Imagine you’re developing a tool that needs to scan for file changes across thousands of project files. Retrieving file attributes efficiently becomes critical in such scenarios. In this article, I’ll demonstrate a technique for getting file attributes that can be more than 50 times faster than standard Windows methods.
Let’s dive in and explore how we can achieve this.
This is a blog post made in collaboration with Bartlomiej Filipek from C++ stories. You can visit his blog here.
The inspiration
The inspiration for this article came from a recent update for Visual Assist, a tool that greatly improves the Visual Studio experience and productivity for C# and C++ developers.
In one of their blog posts, they shared:
The initial parse is 10..15x faster!
After watching the webinar, I noticed some details about efficiently getting file attributes and decided to give it a try on my machine. In other words, I tried to recreate their results.
Disclaimer: Idera, the company behind Visual Assist, helped me write this post and sponsored it.
Understanding File Attribute Retrieval Methods on Windows
On Windows, there are at least a few options to check for a file change:
- FindFirstFile[Ex] (with Basic, Standard and LargeFetch options)
- GetFileAttributesEx
- std::filesystem
- GetFileInformationByHandleEx
Below, you can see the basic usage of each approach:
FindFirstFileEx
FindFirstFileEx is a Windows API function that allows for efficient searching of directories. It retrieves information about files that match a specified file name pattern. The function can be used with different information levels, such as FindExInfoBasic and FindExInfoStandard, to control the amount of file information fetched.
WIN32_FIND_DATA findFileData;
HANDLE hFind = FindFirstFileEx((directory + "\\*").c_str(), FindExInfoBasic, &findFileData, FindExSearchNameMatch, NULL, 0);
if (hFind != INVALID_HANDLE_VALUE) {
    do {
        // Process file information
    } while (FindNextFile(hFind, &findFileData) != 0);
    FindClose(hFind);
}
You can also pass FIND_FIRST_EX_LARGE_FETCH as the additional flag (the last parameter) to indicate that the function should use a larger buffer, which might bring some extra performance.
GetFileAttributesEx
GetFileAttributesEx is another Windows API function that retrieves file attributes for a specified file or directory. Unlike FindFirstFileEx, which is used for searching and listing files, GetFileAttributesEx is typically used for retrieving attributes of a single file or directory.
WIN32_FILE_ATTRIBUTE_DATA fileAttributeData;
if (GetFileAttributesEx((directory + "\\" + fileName).c_str(), GetFileExInfoStandard, &fileAttributeData)) {
    // Process file attributes
}
GetFileInformationByHandleEx
GetFileInformationByHandleEx is a low-level routine that might be tricky to use, but gives us more control over the iteration. The main idea is to fetch a large buffer of data and read it on the application side, rather than rely on sometimes costly kernel/system calls.
Assuming you have an open handle to a directory (hDir) and a byte buffer (buffer) to receive the entries, you can iterate over its children in the following way:
while (true) {
    // Each successful call refills the buffer with the next batch of entries.
    if (!GetFileInformationByHandleEx(
            hDir,
            FileFullDirectoryInfo,
            buffer,
            sizeof(buffer))) {
        DWORD error = GetLastError();
        if (error != ERROR_NO_MORE_FILES) {
            std::wcerr << L"GetFileInformationByHandleEx failed (" << error << L")\n";
        }
        break;
    }
    // Always start from the first entry of the freshly filled buffer.
    auto* pInfo = reinterpret_cast<FILE_FULL_DIR_INFO*>(buffer);
    while (true) {
        if (!(pInfo->FileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
            FileInfo fileInfo;
            fileInfo.fileName = std::wstring(pInfo->FileName, pInfo->FileNameLength / sizeof(WCHAR));
            FILETIME ft{};
            ft.dwLowDateTime = pInfo->LastWriteTime.LowPart;
            ft.dwHighDateTime = pInfo->LastWriteTime.HighPart;
            fileInfo.lastWriteTime = ft;
            files.push_back(fileInfo);
        }
        // NextEntryOffset == 0 marks the last entry in this batch;
        // check it only after the current entry has been processed.
        if (pInfo->NextEntryOffset == 0) {
            break;
        }
        pInfo = reinterpret_cast<FILE_FULL_DIR_INFO*>(
            reinterpret_cast<BYTE*>(pInfo) + pInfo->NextEntryOffset);
    }
}
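The entries in the buffer are variable-length records chained by NextEntryOffset, and the last record has the offset set to 0. The sketch below walks such a chain in portable C++ so the traversal logic can be tried without the Windows API. MockDirEntry, BuildMockBuffer and WalkEntries are hypothetical names made up for this illustration; the real FILE_FULL_DIR_INFO has more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the header part of FILE_FULL_DIR_INFO.
// Only the fields needed to walk the chain are kept.
struct MockDirEntry {
    std::uint32_t NextEntryOffset; // bytes to the next record, 0 for the last one
    std::uint32_t FileNameLength;  // name length in bytes, not characters
    // FileNameLength bytes of UTF-16 name data follow the header
};

// Pack the names into one buffer of 8-byte aligned, chained records.
std::vector<std::uint8_t> BuildMockBuffer(const std::vector<std::u16string>& names) {
    std::vector<std::uint8_t> buf;
    for (std::size_t i = 0; i < names.size(); ++i) {
        const std::size_t nameBytes = names[i].size() * sizeof(char16_t);
        std::size_t recordSize = sizeof(MockDirEntry) + nameBytes;
        recordSize = (recordSize + 7) & ~std::size_t{7}; // pad to 8 bytes
        MockDirEntry header{};
        header.NextEntryOffset =
            (i + 1 < names.size()) ? static_cast<std::uint32_t>(recordSize) : 0;
        header.FileNameLength = static_cast<std::uint32_t>(nameBytes);
        const std::size_t start = buf.size();
        buf.resize(start + recordSize);
        std::memcpy(buf.data() + start, &header, sizeof(header));
        std::memcpy(buf.data() + start + sizeof(header), names[i].data(), nameBytes);
    }
    return buf;
}

// Walk the chain: process the current record first, then decide whether to advance.
std::vector<std::u16string> WalkEntries(const std::vector<std::uint8_t>& buf) {
    std::vector<std::u16string> names;
    if (buf.empty()) return names;
    std::size_t offset = 0;
    while (true) {
        MockDirEntry header;
        assert(offset + sizeof(header) <= buf.size());
        std::memcpy(&header, buf.data() + offset, sizeof(header));
        std::u16string name(header.FileNameLength / sizeof(char16_t), u'\0');
        std::memcpy(name.data(), buf.data() + offset + sizeof(header), header.FileNameLength);
        names.push_back(std::move(name));
        if (header.NextEntryOffset == 0) break; // the record just read was the last
        offset += header.NextEntryOffset;
    }
    return names;
}
```

Checking the offset after handling the record matters: a loop that advances first and then tests the new record's NextEntryOffset silently drops the final entry of every batch.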
std::filesystem
Introduced in C++17, the std::filesystem library provides a modern and portable way to interact with the file system. It includes functions for file attribute retrieval, directory iteration, and other common file system operations.
for (const auto& entry : fs::directory_iterator(directory)) {
    if (entry.is_regular_file()) {
        // Process file attributes
        auto ftime = fs::last_write_time(entry);
        ...
    }
}
The Benchmark
To evaluate the performance of different file attribute retrieval methods, I developed a small benchmark. This application measures the time each method takes to retrieve file attributes for all N files in a specified directory.
Here’s a rough overview of the code:
The FileInfo struct stores the file name and last write time.
struct FileInfo {
    std::wstring fileName;
    std::variant<FILETIME, std::filesystem::file_time_type> lastWriteTime;
};
Each retrieval technique will have to go over a directory and build a vector of FileInfo objects.
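Since the end goal is detecting changed files, it helps to see how a stored FILETIME can be compared. A FILETIME is a 64-bit count of 100-nanosecond ticks since January 1, 1601 (UTC), split into two 32-bit halves. The sketch below uses a hypothetical portable mirror of that layout (FileTimeParts) so it compiles anywhere; the helper names are made up for this example.

```cpp
#include <cstdint>

// Hypothetical portable mirror of the Windows FILETIME layout.
struct FileTimeParts {
    std::uint32_t dwLowDateTime;
    std::uint32_t dwHighDateTime;
};

// Combine both halves into one count of 100 ns ticks since 1601-01-01 UTC.
constexpr std::uint64_t ToTicks(FileTimeParts ft) {
    return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
}

// True when `current` is strictly newer than `cached`, i.e. the file changed.
constexpr bool IsNewer(FileTimeParts current, FileTimeParts cached) {
    return ToTicks(current) > ToTicks(cached);
}

// Convert to whole seconds since the Unix epoch (1970-01-01 UTC).
constexpr std::int64_t ToUnixSeconds(FileTimeParts ft) {
    constexpr std::uint64_t kTicksPerSecond = 10'000'000ULL;
    constexpr std::int64_t kEpochDiffSeconds = 11'644'473'600LL; // 1601 to 1970
    return static_cast<std::int64_t>(ToTicks(ft) / kTicksPerSecond) - kEpochDiffSeconds;
}
```

Comparing the combined 64-bit value avoids the mistake of comparing the low and high halves independently.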
BenchmarkFindFirstFileEx
void BenchmarkFindFirstFileEx(const std::wstring& directory,
                              std::vector<FileInfo>& files,
                              FINDEX_INFO_LEVELS infoLevel,
                              DWORD additionalFlags)
{
    // Wide-character API variants match the std::wstring used by FileInfo and main.
    WIN32_FIND_DATAW findFileData;
    HANDLE hFind = FindFirstFileExW((directory + L"\\*").c_str(),
                                    infoLevel,
                                    &findFileData,
                                    FindExSearchNameMatch, NULL, additionalFlags);
    if (hFind == INVALID_HANDLE_VALUE) {
        std::cerr << "FindFirstFileEx failed ("
                  << GetLastError() << ")\n";
        return;
    }
    do {
        if (!(findFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
            FileInfo fileInfo;
            fileInfo.fileName = findFileData.cFileName;
            fileInfo.lastWriteTime = findFileData.ftLastWriteTime;
            files.push_back(fileInfo);
        }
    } while (FindNextFileW(hFind, &findFileData) != 0);
    FindClose(hFind);
}
BenchmarkGetFileAttributesEx
void BenchmarkGetFileAttributesEx(const std::wstring& directory,
                                  std::vector<FileInfo>& files)
{
    // The directory is still enumerated with FindFirstFile; the attributes
    // are then fetched again with one extra call per file.
    WIN32_FIND_DATAW findFileData;
    HANDLE hFind = FindFirstFileW((directory + L"\\*").c_str(),
                                  &findFileData);
    if (hFind == INVALID_HANDLE_VALUE) {
        std::cerr << "FindFirstFile failed ("
                  << GetLastError() << ")\n";
        return;
    }
    do {
        if (!(findFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
            WIN32_FILE_ATTRIBUTE_DATA fileAttributeData;
            if (GetFileAttributesExW((directory + L"\\" + findFileData.cFileName).c_str(),
                                     GetFileExInfoStandard, &fileAttributeData)) {
                FileInfo fileInfo;
                fileInfo.fileName = findFileData.cFileName;
                fileInfo.lastWriteTime = fileAttributeData.ftLastWriteTime;
                files.push_back(fileInfo);
            }
        }
    } while (FindNextFileW(hFind, &findFileData) != 0);
    FindClose(hFind);
}
BenchmarkStdFilesystem
Next, the most portable technique:
void BenchmarkStdFilesystem(const std::wstring& directory,
                            std::vector<FileInfo>& files)
{
    for (const auto& entry : std::filesystem::directory_iterator(directory)) {
        if (entry.is_regular_file()) {
            FileInfo fileInfo;
            fileInfo.fileName = entry.path().filename().wstring();
            // Stored as file_time_type, the second alternative of the variant.
            fileInfo.lastWriteTime = std::filesystem::last_write_time(entry);
            files.push_back(fileInfo);
        }
    }
}
BenchmarkGetFileInformationByHandleEx
void BenchmarkGetFileInformationByHandleEx(const std::wstring& directory, std::vector<FileInfo>& files) {
    // FILE_FLAG_BACKUP_SEMANTICS is required to open a directory handle.
    HANDLE hDir = CreateFileW(
        directory.c_str(),
        GENERIC_READ,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_BACKUP_SEMANTICS,
        NULL
    );
    if (hDir == INVALID_HANDLE_VALUE) {
        std::wcerr << L"CreateFile failed (" << GetLastError() << L")\n";
        return;
    }
    constexpr DWORD BufferSize = 64 * 1024;
    alignas(FILE_FULL_DIR_INFO) uint8_t buffer[BufferSize];
    while (true) {
        // Each successful call refills the buffer with the next batch of entries.
        if (!GetFileInformationByHandleEx(
                hDir,
                FileFullDirectoryInfo,
                buffer,
                sizeof(buffer))) {
            DWORD error = GetLastError();
            if (error != ERROR_NO_MORE_FILES) {
                std::wcerr << L"GetFileInformationByHandleEx failed (" << error << L")\n";
            }
            break;
        }
        // Always start from the first entry of the freshly filled buffer.
        auto* pInfo = reinterpret_cast<FILE_FULL_DIR_INFO*>(buffer);
        while (true) {
            if (!(pInfo->FileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
                FileInfo fileInfo;
                fileInfo.fileName = std::wstring(pInfo->FileName, pInfo->FileNameLength / sizeof(WCHAR));
                FILETIME ft{};
                ft.dwLowDateTime = pInfo->LastWriteTime.LowPart;
                ft.dwHighDateTime = pInfo->LastWriteTime.HighPart;
                fileInfo.lastWriteTime = ft;
                files.push_back(fileInfo);
            }
            // NextEntryOffset == 0 marks the last entry in this batch;
            // check it only after the current entry has been processed.
            if (pInfo->NextEntryOffset == 0) {
                break;
            }
            pInfo = reinterpret_cast<FILE_FULL_DIR_INFO*>(
                reinterpret_cast<BYTE*>(pInfo) + pInfo->NextEntryOffset);
        }
    }
    CloseHandle(hDir);
}
The Main Function
The main function sets up the benchmarking environment, runs the benchmarks, and prints the results.
std::wstring directory = argv[1];
const auto arg2 = argc > 2 ? std::wstring_view(argv[2]) : std::wstring_view{};

std::vector<std::pair<std::wstring, std::function<void(std::vector<FileInfo>&)>>> benchmarks = {
    {L"FindFirstFileEx (Basic)", [&](std::vector<FileInfo>& files) {
        BenchmarkFindFirstFileEx(directory, files, FindExInfoBasic, 0);
    }},
    {L"FindFirstFileEx (Standard)", [&](std::vector<FileInfo>& files) {
        BenchmarkFindFirstFileEx(directory, files, FindExInfoStandard, 0);
    }},
    {L"FindFirstFileEx (Large Fetch)", [&](std::vector<FileInfo>& files) {
        BenchmarkFindFirstFileEx(directory, files, FindExInfoStandard, FIND_FIRST_EX_LARGE_FETCH);
    }},
    {L"GetFileAttributesEx", [&](std::vector<FileInfo>& files) {
        BenchmarkGetFileAttributesEx(directory, files);
    }},
    {L"std::filesystem", [&](std::vector<FileInfo>& files) {
        BenchmarkStdFilesystem(directory, files);
    }},
    {L"GetFileInformationByHandleEx", [&](std::vector<FileInfo>& files) {
        BenchmarkGetFileInformationByHandleEx(directory, files);
    }}
};

std::vector<std::pair<std::wstring, double>> results;
for (const auto& benchmark : benchmarks) {
    std::vector<FileInfo> files;
    files.reserve(2000); // Reserve space outside the timing measurement
    auto start = std::chrono::high_resolution_clock::now();
    benchmark.second(files);
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = end - start;
    results.emplace_back(benchmark.first, elapsed.count());
}

PrintResultsTable(results);
Performance Results
To measure the performance of each file attribute retrieval method, I executed benchmarks on a directory containing 1000, 2000 or 5000 random text files. The tests were performed on a laptop equipped with an Intel i7 4720HQ CPU and an SSD. I measured the time taken by each method and compared the results to determine the fastest approach.
Each test run consisted of two executions: the first with uncached file attributes and the second likely benefiting from system-level caching.
The speedup factor is the time of the slowest technique in a given run divided by the time of the current technique, so the slowest method always gets 1.000.
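As a rough illustration of that definition, the factors can be computed from the collected (name, seconds) pairs as follows. This is a sketch, not the benchmark's actual PrintResultsTable; the Result alias and the SpeedupFactors function are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Result = std::pair<std::wstring, double>; // method name, elapsed seconds

// For each result: speedup = slowest time in the run / this method's time.
std::vector<double> SpeedupFactors(const std::vector<Result>& results) {
    std::vector<double> factors;
    if (results.empty()) return factors;
    const double slowest = std::max_element(
        results.begin(), results.end(),
        [](const Result& a, const Result& b) { return a.second < b.second; })->second;
    factors.reserve(results.size());
    for (const auto& r : results) {
        // Guard against a zero measurement to avoid dividing by zero.
        factors.push_back(r.second > 0.0 ? slowest / r.second : 0.0);
    }
    return factors;
}
```

Because the baseline is whichever method is slowest in that run, factors from different runs are not directly comparable: a lower factor in a cached run can simply mean the slowest method got faster.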
Directory with 1000 files:
Method                          Time (seconds)  Speedup Factor
FindFirstFileEx (Basic)         0.0014831000    162.868
FindFirstFileEx (Standard)      0.0014817000    163.022
FindFirstFileEx (Large Fetch)   0.0011792000    204.842
GetFileAttributesEx             0.2415497000    1.000
std::filesystem                 0.0609313000    3.964
GetFileInformationByHandleEx    0.0044168000    54.689
// second run:
Method                          Time (seconds)  Speedup Factor
FindFirstFileEx (Basic)         0.0013805000    44.947
FindFirstFileEx (Standard)      0.0011310000    54.863
FindFirstFileEx (Large Fetch)   0.0009071000    68.404
GetFileAttributesEx             0.0616772000    1.006
std::filesystem                 0.0620496000    1.000
GetFileInformationByHandleEx    0.0025246000    24.578
Directory with 2000 files:
Method                          Time (seconds)  Speedup Factor
FindFirstFileEx (Basic)         0.0014455000    150.287
FindFirstFileEx (Standard)      0.0015029000    144.547
FindFirstFileEx (Large Fetch)   0.0012086000    179.745
GetFileAttributesEx             0.2172402000    1.000
std::filesystem                 0.0609186000    3.566
GetFileInformationByHandleEx    0.0025069000    86.657
// second run:
Method                          Time (seconds)  Speedup Factor
FindFirstFileEx (Basic)         0.0012020000    50.908
FindFirstFileEx (Standard)      0.0011614000    52.688
FindFirstFileEx (Large Fetch)   0.0008887000    68.856
GetFileAttributesEx             0.0611920000    1.000
std::filesystem                 0.0611760000    1.000
GetFileInformationByHandleEx    0.0025835000    23.686
Directory with 5000 random, small text files:
Method                          Time (seconds)  Speedup Factor
FindFirstFileEx (Basic)         0.0077623000    84.975
FindFirstFileEx (Standard)      0.0828258000    7.964
FindFirstFileEx (Large Fetch)   0.0144611000    45.612
GetFileAttributesEx             0.6595977000    1.000
std::filesystem                 0.3022779000    2.182
GetFileInformationByHandleEx    0.0051569000    127.906
// second run:
Method                          Time (seconds)  Speedup Factor
FindFirstFileEx (Basic)         0.0069814000    43.844
FindFirstFileEx (Standard)      0.0148472000    20.616
FindFirstFileEx (Large Fetch)   0.0140663000    21.761
GetFileAttributesEx             0.3060932000    1.000
std::filesystem                 0.3011346000    1.016
GetFileInformationByHandleEx    0.0051614000    59.304
In the first (uncached) runs on the 1000- and 2000-file directories, FindFirstFileEx with the Large Fetch flag was the fastest method, reaching speedups of about 205x and 180x over GetFileAttributesEx. In the second (cached) runs, the FindFirstFileEx variants were still roughly 45x to 69x faster than the slowest method. The factors are lower there mainly because the slow baseline itself speeds up once the attributes are cached (for example, GetFileAttributesEx drops from about 0.22 s to 0.06 s on 2000 files), while the fast methods barely change. The Large Fetch flag seems to help on these smaller directories.
In the directory with 5000 files, the picture changes: GetFileInformationByHandleEx takes the crown with about 128x in the first run and 59x in the second, while the best FindFirstFileEx variant (Basic) reaches about 85x and 44x. Notably, std::filesystem performed on par with GetFileAttributesEx in the cached runs and was only about 2x to 4x faster in the uncached ones.
Further Techniques
Getting file attributes is only part of the story, and while important, they may contribute to only a small portion of the overall performance for the whole project. The Visual Assist team, who contributed to this article, improved their initial parse time performance by avoiding GetFileAttributes[Ex], using the same techniques as this article. But Visual Assist also improved performance through further techniques. My simple benchmark showed 50x speedups, but we cannot directly compare it with the final Visual Assist, as the tool does many more things with files.
The main item being optimised was the initial parse, where VA builds a symbol database when a project is opened for the first time. This involves parsing all code and all headers. They decided that it’s a reasonable assumption that headers won’t change while a project is being loaded, and so the file access is cached during the initial parse, avoiding the filesystem entirely. (Changes after a project has been parsed the first time are, of course, still caught.) The combination of switching to a much faster method for checking filetimes and then avoiding file IO completely contributed to the up-to-15-times-faster performance improvement they saw in version 2024.1 at the beginning of this year.
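To make the caching idea concrete, here is a minimal sketch of a per-session timestamp cache. It is only an illustration under the same assumption (files do not change while the cache is alive) and is not Visual Assist's implementation; the WriteTimeCache class and its members are hypothetical.

```cpp
#include <cstddef>
#include <filesystem>
#include <system_error>
#include <unordered_map>

namespace fs = std::filesystem;

// Hypothetical cache: remembers the last write time per path so that repeated
// lookups during one "initial parse" never touch the file system again.
class WriteTimeCache {
public:
    fs::file_time_type Get(const fs::path& p) {
        const auto key = p.native();
        const auto it = cache_.find(key);
        if (it != cache_.end()) {
            ++hits_;
            return it->second;
        }
        std::error_code ec;
        // On failure this overload returns file_time_type::min(), which is cached too.
        const auto written = fs::last_write_time(p, ec);
        cache_.emplace(key, written);
        ++misses_;
        return written;
    }

    // Drop everything once the assumption no longer holds (e.g. parse finished).
    void Clear() { cache_.clear(); }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    std::unordered_map<fs::path::string_type, fs::file_time_type> cache_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

A header included by hundreds of translation units would then cost one file system query instead of hundreds. The trade-off is staleness, which is why the cache should live only as long as the "nothing changes" assumption is reasonable.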
Read further details on their blog Visual Assist 2024.1 release post – January 2024 and Catching up with VA: Our most recent performance updates – Tomato Soup.
Summary
In this article, we went through a benchmark that compares several techniques for fetching file attributes. In short, it’s best to gather attributes while you iterate through the directory, using FindFirstFileEx or GetFileInformationByHandleEx. If you need to perform this operation hundreds of times, measure and choose the best technique for your workload. What’s more, if you expect lots of files in a directory, it’s worth checking techniques that offer larger buffers.
The benchmark also showed one more thing: while C++17 and its filesystem library offer a robust and standardized way to work with files and directories, it can be limited in terms of performance. In many cases, if you need optimal performance, you need to open the hood and work with the specific operating system API.
Back to you
- Do you use std::filesystem for tasks involving hundreds of files?
- Do you know other techniques that offer greater performance when working with files?
Share your comments below. And if you’re using C++, you can also download and try Visual Assist yourself for 30 days for free.