Old 04-29-2007, 01:46 PM   #1
Undertoad
LQ Newbie
 
Registered: Apr 2007
Posts: 4

Rep: Reputation: 0
Huge directory files and performance


Hiya, I run a few systems that have directories containing 420,000 files. A directory file itself may be as large as 8M.

"time ls" on the largest directory:

real 1m9.674s
user 0m5.700s
sys 0m2.630s

Whereas "time ls" on a smaller directory:

real 0m0.004s
user 0m0.000s
sys 0m0.000s

I could be a Department Hero (TM) if I found a way to improve the performance of some of our apps, because they run on hundreds of machines. I'm thinking these huge directories aren't helping any. Most of these boxes are running Red Hat 7.2, because a custom kernel was written (long ago) and images were created for these identical systems.
 
Old 04-29-2007, 02:06 PM   #2
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 344
Grouping the files into subdirectories would help, but that would require changing/reconfiguring the applications if they are expecting the files in a fixed location.

Deleting/archiving files that are not actively needed would be an even better idea.
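
For the archiving route, something like this might work as a starting point (a rough sketch only; the 90-day cutoff and the /data/hugedir and /archive paths are placeholders to adjust):

Code:
# Move files untouched for 90+ days into a dated, compressed tar
# archive, deleting them from the directory as they are archived.
cd /data/hugedir
find . -maxdepth 1 -type f -mtime +90 -print0 |
    tar -czf /archive/old-files-$(date +%Y%m%d).tar.gz --null -T - --remove-files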
 
Old 04-29-2007, 02:59 PM   #3
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
You could break up the directory into subdirectories. Think of the name of the subdirectory itself as conveying information. For example, you could have a hash function based on the file name and save each file in the subdirectory whose name matches the hash value. This is one technique for organizing very large lists in programs, to reduce the time insertions, deletions and sorting take. It should help reduce the time spent finding a particular file as well. However, it does mean that the hash function will have to be added to your program(s) to retrieve and save files. You may also want to write your own script utilities to list or delete files. This would be the Departmental Hero part.
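
A rough sketch of the idea in shell (the /data path, the two-character bucket, and the choice of md5sum are assumptions to illustrate the technique, not a fixed recipe):

Code:
# Store a file under a subdirectory named after the first two hex
# digits of the md5 of its name, e.g. foo.dat -> /data/ab/foo.dat
store() {
    name=$(basename "$1")
    bucket=$(printf '%s' "$name" | md5sum | cut -c1-2)  # 256 buckets
    mkdir -p "/data/$bucket" && mv "$1" "/data/$bucket/$name"
}

# Retrieval recomputes the same hash to find the right subdirectory
fetch() {
    bucket=$(printf '%s' "$1" | md5sum | cut -c1-2)
    cat "/data/$bucket/$1"
}

With 420,000 files, two hex digits (256 buckets) leaves roughly 1,600 files per directory; three digits (4,096 buckets) brings it down near 100.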
 
Old 04-29-2007, 03:16 PM   #4
Undertoad
LQ Newbie
 
Registered: Apr 2007
Posts: 4

Original Poster
Rep: Reputation: 0
More research: they say ReiserFS may be better at handling huge directories.

Some new hash function has also been added to ext3 recently (as of the 2.6 kernel?), but I am still looking into it.

Breaking the directories down so they hash in some way is a good approach, I agree! It may be the only way, but I'm checking first to see if there's a quicker fix.

Last edited by Undertoad; 04-29-2007 at 03:17 PM.
 
Old 04-29-2007, 03:28 PM   #5
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Years ago, my boss had a problem with her computer after installing some software. I found out that it was caused by a bad installer program filling up the inf directory until it hit the 30,000-entry maximum that Win98's FAT32 filesystem allows.
 
Old 05-01-2007, 10:09 AM   #6
Undertoad
LQ Newbie
 
Registered: Apr 2007
Posts: 4

Original Poster
Rep: Reputation: 0
I think I've found it!

Before a certain point, Red Hat didn't enable the dir_index feature of ext3 by default. More recent releases therefore have this performance enhancement, but my old 7.2 boxes don't, although their kernel supports it.

dir_index adds a hashed b-tree index to directories. I haven't enabled it on any systems yet, but you set it with tune2fs -O, and then run e2fsck -D to convert the existing directories.
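
For anyone following along, the sequence would presumably look like this (untested here; /dev/sda1 is a placeholder for the real device, and the filesystem should be unmounted before fsck touches it):

Code:
# Turn on the dir_index feature flag for the filesystem
tune2fs -O dir_index /dev/sda1

# Rebuild/index the existing directories; -f forces the check
# even if the filesystem is marked clean
e2fsck -fD /dev/sda1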

If anyone has experience doing this, and has a hint as to what kind of performance improvement it brings, please add it to the thread. Thanks!
 
  

