Quote:
Originally Posted by jon_smark
In order to choose the best scheme, I must know more about the scalability of directory lookup under Linux. What's the maximum reasonable number of entries for a directory before lookup becomes too slow? Does someone know exactly what are the limitations of the algorithm used for directory lookup?
It depends on the filesystem. Have you already decided which one to use?
If you use ext4, you can use tune2fs to enable dir_index (hashed B-trees), which speeds up lookups in large directories. I believe that with ext4 and dir_index enabled, a two-level design is probably optimal: say 1000 to 60000 subdirectories, with the files (or actual directories) roughly evenly distributed among them. You should be fine with that up to at least 360 million files/directories. Note that I don't think ext4 imposes any hard limit here; you could put them all in the same directory, although ls and shells would be pretty unhappy with that.
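(If I remember correctly, enabling the feature on an existing ext4 filesystem is tune2fs -O dir_index /dev/XXX, followed by e2fsck -fD on the unmounted filesystem to rebuild the indexes for directories that already exist.)
To make the two-level idea concrete, here's a rough sketch in Python; the /srv/data root, the 4096-bucket count, and the helper names are just made up for illustration. The point is simply to hash each file name to pick one of N subdirectories so the files spread roughly evenly:
Code:
import hashlib
import os

DATA_ROOT = "/srv/data"   # example root; use your actual mount point
NUM_BUCKETS = 4096        # anywhere in the 1000-60000 range mentioned above

def bucket_for(name):
    # Hash the name so files spread roughly evenly across the buckets.
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return format(h % NUM_BUCKETS, "03x")   # e.g. "0a7" -> /srv/data/0a7/

def path_for(name):
    # Two-level layout: /srv/data/<bucket>/<name>
    d = os.path.join(DATA_ROOT, bucket_for(name))
    os.makedirs(d, exist_ok=True)
    return os.path.join(d, name)
Any stable hash works; the only requirement is that a given name always maps to the same subdirectory so you can find the file again without scanning.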
If you intend to create a multi-terabyte filesystem, XFS is a better choice; I believe it behaves roughly the same for large directories.
I'd recommend reading the linux-fsdevel mailing list archives for details.