Linux - Software: This forum is for Software issues.
Hiya, I run a few systems that have directories containing 420,000 files. A directory file itself may be as large as 8M.
"time ls" on the largest directory:
real 1m9.674s
user 0m5.700s
sys 0m2.630s
Whereas "time ls" on a smaller directory:
real 0m0.004s
user 0m0.000s
sys 0m0.000s
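For what it's worth, part of what plain "ls" does on a big directory is sort (and possibly stat) every entry, while "ls -f" prints entries in raw directory order. A quick throwaway experiment to see how much of the time is the sort versus the directory itself (the 2000-file demo directory here is just illustrative, not one of my real directories):

```shell
# Build a throwaway directory with many files to compare listing modes.
demo=$(mktemp -d)
for i in $(seq 1 2000); do : > "$demo/file$i"; done

time ls "$demo"    > /dev/null   # sorted listing (what the numbers above measure)
time ls -f "$demo" > /dev/null   # unsorted; skips the sort entirely

rm -rf "$demo"
```

If "ls -f" is dramatically faster, the apps themselves (which don't sort) may be hurting less than "time ls" suggests.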
I could be a Department Hero (TM) if I found a way to improve the performance of some of our apps, because they run on hundreds of machines, and I'm thinking these huge directories aren't helping any. Most of these boxes are still running Red Hat 7.2 because a custom kernel was written for them long ago and images were created for these identical systems.
Grouping the files into subdirectories would help, but that would require changing/reconfiguring the applications if they are expecting the files in a fixed location.
Deleting/archiving files that are not actively needed would be an even better idea.
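A rough sketch of the archive-then-delete idea, assuming age (mtime) is a reasonable proxy for "not actively needed" -- the 90-day cutoff and all paths here are just examples, and the demo ages its files artificially so the snippet is self-contained:

```shell
# Sketch: archive files untouched for 90+ days, then delete the originals.
bigdir=$(mktemp -d)
for i in $(seq 1 5); do : > "$bigdir/old$i"; done
touch "$bigdir/fresh"

# Age the "old" files artificially for this demo (normally mtime does this).
touch -t 200001010000 "$bigdir"/old*

archive=$(mktemp -u).tar.gz
find "$bigdir" -maxdepth 1 -type f -mtime +90 -print \
    | tar czf "$archive" -T - 2>/dev/null
find "$bigdir" -maxdepth 1 -type f -mtime +90 -exec rm -f {} \;
```

Obviously you'd want to verify the tarball before the rm pass on a production box.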
You could break the directory up into subdirectories, with the name of each subdirectory itself conveying information. For example, you could apply a hash function to the file name and save each file in the subdirectory whose name matches the hash value. This is a standard technique for organizing very large lists in programs, reducing the time a program takes to do insertions, deletions, and sorting, and it should cut the time needed to find a particular file as well. It does mean, however, that the hash function will have to be added to your program(s) to retrieve and save files, and you may also want to write your own script utilities to list or delete files. That would be the Departmental Hero part.
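A minimal sketch of that bucketing idea in shell. The two-hex-digit md5 prefix (256 buckets, roughly 1,600 files each at 420,000 files total) is just one illustrative choice of hash; the directory names here are throwaway temp dirs, not anything your apps use today:

```shell
# Sketch: bucket files into subdirectories by a hash of the file name.
srcdir=$(mktemp -d)
dstdir=$(mktemp -d)
for i in $(seq 1 20); do : > "$srcdir/file$i"; done

for f in "$srcdir"/*; do
    name=$(basename "$f")
    # Bucket = first two hex digits of md5 of the name (256 buckets).
    bucket=$(printf '%s' "$name" | md5sum | cut -c1-2)
    mkdir -p "$dstdir/$bucket"
    mv "$f" "$dstdir/$bucket/$name"
done

# To read a file back, any program recomputes the same hash:
# bucket=$(printf '%s' "file7" | md5sum | cut -c1-2)
# cat "$dstdir/$bucket/file7"
```

The key point is that the lookup path is computed from the name alone, so retrieval never has to scan one giant directory.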
More research: they say ReiserFS may be better at handling huge directories. Some new hash-based directory indexing has also been added to ext3 recently (as of the 2.6 kernel?), but I am still looking into it.
Breaking the directories down so the files hash into subdirectories is a good approach, I agree! It may be the only way, but I'm just checking whether there's a quicker fix.
Years ago, my boss had a problem with her computer after installing some software. It turned out a bad installer program had filled up the inf directory until it hit the maximum 30,000 entries that Win98's FAT32 filesystem allows.
Until a certain point, Red Hat didn't enable the dir_index feature of ext3 by default. More recent releases therefore have this performance enhancement, but my old 7.2 boxes don't, even though their kernel supports it.
dir_index adds a hashed b-tree index to directories. I haven't added it to any systems yet, but you use tune2fs -O dir_index to set the feature flag, and then run e2fsck -D to rebuild the existing directories.
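For the record, the sequence I'm looking at is something like the following. The device name is just an example (use whichever ext3 filesystem holds the big directories), and e2fsck needs the filesystem unmounted, so plan downtime:

```shell
# Example only -- /dev/sda3 stands in for your ext3 filesystem.
tune2fs -O dir_index /dev/sda3   # set the feature flag (affects new dirs)
umount /dev/sda3                 # e2fsck must not run on a mounted fs
e2fsck -fD /dev/sda3             # -D optimizes/rebuilds existing directories
mount /dev/sda3

# Verify the flag stuck:
tune2fs -l /dev/sda3 | grep -i features
```

This is an offline admin procedure against a real block device, so treat it as a sketch and try it on a scratch filesystem first.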
If anyone has any experience doing this, and might have a hint as to what kinds of performance improvements there are in it, add it to the thread? Thanks!