LinuxQuestions.org
Old 02-28-2011, 11:33 AM   #1
jon_smark
LQ Newbie
 
Registered: Feb 2011
Posts: 2

Scaling of directory entry lookup


I need to create and access a very large number of directories (and by large I mean millions). Each directory's name is a number, which is incremented every time a new directory is created (so there will be directory 1, 2, and so on).

Of course I could just dump all these directories under the same parent directory, but I reckon I would run into filesystem limits. Moreover, I presume that entry lookup is not an O(1) operation, which means that lookup in a single huge directory does not scale well.

One solution is to use some sort of prefix tree for storing the data. In this scheme, the data for directory "1234" would actually be stored in "/1/2/3/4".
This solution has the advantage that no directory ever holds more than 10 entries, but the disadvantage of requiring as many individual lookups as there are digits in the name.

There are also intermediate solutions: with a maximum of 100 entries per directory, "1234" would become "/12/34", for example.
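
To make the two schemes above concrete, here is a minimal sketch in Python of the mapping from a numeric name to a nested path. The function name `sharded_path` and the parameters `digits_per_level` and `width` are made up for illustration. The zero-padding is an assumption added here, not part of the schemes as described: without a fixed width, the directory for "12" would also be an intermediate node on the way to "1234".

```python
def sharded_path(dir_id: int, digits_per_level: int = 1, width: int = 0) -> str:
    """Map a numeric directory name to a nested relative path.

    digits_per_level=1 maps 1234 to "1/2/3/4" (at most 10 entries per level).
    digits_per_level=2 maps 1234 to "12/34" (at most 100 entries per level).
    width (hypothetical, optional) zero-pads the name to a fixed length so
    that every ID ends up at the same depth.
    """
    if dir_id < 0 or digits_per_level < 1:
        raise ValueError("dir_id must be >= 0 and digits_per_level >= 1")
    digits = str(dir_id).zfill(width)
    # Left-pad with zeros so the digits split into equal-sized chunks.
    pad = (-len(digits)) % digits_per_level
    digits = "0" * pad + digits
    chunks = [digits[i:i + digits_per_level]
              for i in range(0, len(digits), digits_per_level)]
    return "/".join(chunks)


if __name__ == "__main__":
    print(sharded_path(1234, 1))           # one digit per level
    print(sharded_path(1234, 2))           # two digits per level
    print(sharded_path(1234, 2, width=8))  # fixed depth of four levels
```

The number of path components, and so the number of per-directory lookups, is the padded length divided by `digits_per_level`, which is the trade-off described above.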

In order to choose the best scheme, I must know more about the scalability of directory lookup under Linux. What's the maximum reasonable number of entries for a directory before lookup becomes too slow? Does anyone know exactly what the limitations of the algorithm used for directory lookup are?

I understand this question may be a bit too technical for this forum. If that is indeed the case, what would be the best place to put it instead?

Thanks in advance,
Jon
 
Old 02-28-2011, 01:30 PM   #2
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Quote:
Originally Posted by jon_smark
In order to choose the best scheme, I must know more about the scalability of directory lookup under Linux. What's the maximum reasonable number of entries for a directory before lookup becomes too slow? Does anyone know exactly what the limitations of the algorithm used for directory lookup are?
It depends on the filesystem. Have you already decided which one to use?

If you use ext4, you can use tune2fs to enable dir_index (hashed B-trees), which speeds up lookups in large directories. I believe that with ext4 and dir_index enabled, a two-level design is probably optimal: say 1000 to 60000 subdirectories, with the files (or actual directories) distributed roughly evenly among them. You should be fine with that up to at least 360 million files/directories. Note that I don't think ext4 has any limits here; you could put them all in the same directory. (ls and shells would be pretty unhappy, though.)
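
As a rough sketch of such a two-level layout, the Python function below picks the subdirectory by taking the numeric name modulo the number of subdirectories. The modulo rule, the name `two_level_path`, and the default of 1000 buckets are assumptions made for illustration; the only requirement stated above is that the entries end up roughly evenly distributed, which sequential numbers satisfy under a modulo.

```python
def two_level_path(entry_id: int, buckets: int = 1000) -> str:
    """Return "<bucket>/<entry_id>" for a two-level directory layout.

    The bucket is entry_id modulo `buckets` (an assumed rule), so sequentially
    numbered entries are spread evenly over the subdirectories. Bucket names
    are zero-padded to a fixed width so they sort and list consistently.
    """
    if entry_id < 0 or buckets < 1:
        raise ValueError("entry_id must be >= 0 and buckets >= 1")
    bucket_width = len(str(buckets - 1))
    bucket = entry_id % buckets
    return f"{bucket:0{bucket_width}d}/{entry_id}"


if __name__ == "__main__":
    for n in (5, 1234, 1001234):
        print(n, "->", two_level_path(n))
```

With 1000 buckets and one million entries, each subdirectory holds about 1000 entries, and every access costs exactly two directory lookups regardless of how large the numbers grow.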

If you intend to create multi-terabyte filesystems, XFS is a better choice. I believe it behaves roughly the same with respect to large directories.

I'd recommend reading the linux-fsdevel mailing list archives for details.

Last edited by Nominal Animal; 02-28-2011 at 01:44 PM.
 
  

