Old 07-10-2009, 01:05 AM   #1
stanga
LQ Newbie
 
permanently caching content of specific directory


Is it possible to force Linux (Debian) to cache a directory's contents indefinitely and keep the cache updated to reflect changes in that directory?

Why?

I have a dir that needs to be listed pretty regularly (via FTP) but holds a lot of files, anywhere from 20,000 to 100,000. The filesystem is ext3, and of course listing takes forever (minutes) unless the contents of the directory are already cached.

Is there a method to:

1. Keep the directory I specify cached indefinitely, and
2. Keep the cache up to date with the real-time contents of that directory.

The only thing I thought of was running a bash script that constantly lists the contents of the directory, thus ensuring it is always cached by the OS.
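
Just to make the idea concrete, this is roughly what I had in mind (the path is only an example):

Code:
#!/bin/sh
# Crude cache-warmer: keep re-reading the directory so its entries
# stay in the kernel's dentry/inode caches. The path is a placeholder.
while true; do
    ls -la /srv/ftp/bigdir > /dev/null 2>&1
    sleep 60
done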

I'm hoping there is a tool or a simple way of reserving part of the system memory to keep this specific directory's contents cached at all times.

thanks
 
Old 07-10-2009, 01:30 AM   #2
xeleema
Member
 
Greetingz!

I'm going to assume there's a reason why 20,000 to 100,000 files need to be in one directory without any directory-based organization.

Is a "filelist.txt" file an option? You could just set up a cronjob to "ls -la > ./filelist.txt" and direct individuals to that. (I've seen similar sites do that.)

Is this FTP directory accessed by people or applications most of the time?

Also, what FTP server are you using?

Last edited by xeleema; 07-10-2009 at 01:32 AM.
 
Old 07-10-2009, 01:42 AM   #3
stanga
LQ Newbie
 
Original Poster
Quote:
Originally Posted by xeleema
Greetingz!

I'm going to assume there's a reason why 20,000 to 100,000 files need to be in one directory without any directory-based organization.
Believe it or not, that actually is part of the directory-based organization.

Quote:

Is a "filelist.txt" file an option? You could just set up a cronjob to "ls -la > ./filelist.txt" and direct individuals to that. (I've seen similar sites do that.)
Not an option, unfortunately; the directory is mostly accessed by applications.

Quote:

Also, what FTP server are you using?
The issue is really server-independent. The ls command takes just as long from a shell as via FTP. If I can make ls list faster (via caching or otherwise), that will fix the issue.

Thanks for your input, by the way (do you guys thank each other, or is that assumed?).
 
Old 07-10-2009, 02:08 AM   #4
xeleema
Member
 
FTP Servers

Hello again!

It's unfortunate that that's part of the directory structure, but some facets of some problems just can't be helped.

However, if applications are doing a directory listing, then we might be able to speed things along.

Quote:
The issue is really server-independent. The ls command takes just as long from a shell as via FTP. If I can make ls list faster (via caching or otherwise), that will fix the issue.
The downside is that an "ls" done in a terminal session and an "ls" done from an FTP session are not one and the same in most cases.

FTP commands such as "ls", "cd", "dir", "put", "mput", etc. are not run from the server's /usr/bin directory. They are functions of the FTP server daemon, which in turn makes the necessary library and system calls (like fopen()). However, the details depend on which FTP daemon you're using.

For example, if you're using the stock FTP daemon from a Red Hat-based Linux distribution, you're probably using vsftpd. That particular FTP daemon has been touted by many as the "fastest", but that's in terms of data throughput and simultaneous users.

There's also ProFTPD, an FTP daemon with a modular design that allows for "plug-ins". Its configuration is very similar to Apache's, so it can throw off those of us who aren't web developers.

Both of these FTP daemons are Free / Open Source, so there's no harm in giving them a shot.


But I digress. If you're looking to speed up directory listings of your data, you have to keep in mind that it's the FTP daemon itself that is responsible for reading/displaying the contents. So in order to speed up that process, you have to think "under" the FTP daemon. Down to the filesystem level.

Is the data hosted on a RAID 5 array? (RAID 5 offers fast reads, but only so-so write speeds.)

How about moving this directory to a RAM disk? (That's right, dedicate a chunk of your RAM to a filesystem.) This option would require some way of syncing the contents back to the main hard disk(s), but it would definitely improve the read performance.
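
A rough sketch of the idea, assuming the data lives somewhere like /srv/ftp/bigdir and fits comfortably in RAM (all paths and sizes here are placeholders, and you'd want the sync job to be much more robust than this):

Code:
#!/bin/sh
# Serve the directory from tmpfs (RAM) and sync changes back to disk.
# WARNING: tmpfs contents vanish on reboot or power failure, so anything
# written to the RAM copy and not yet synced back would be lost.
mkdir -p /mnt/ftpcache
mount -t tmpfs -o size=512m tmpfs /mnt/ftpcache

# Seed the RAM disk from the on-disk copy, then point the FTP daemon at it
rsync -a /srv/ftp/bigdir/ /mnt/ftpcache/

# Sync the RAM copy back to disk periodically, e.g. from cron:
#   */5 * * * * root rsync -a --delete /mnt/ftpcache/ /srv/ftp/bigdir/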

However, there's also a bit of thinking to do "above" the FTP daemon: your clients.

Are they connecting with Active or Passive sessions?

In active mode, the server opens the data connection back to the client from its data port (20), which can feel slightly snappier but is easily blocked by client-side firewalls and NAT.

In passive mode, the client opens the data connection itself, to a high-numbered port that the server announces over the control connection (port 21). That slips through restrictive firewalls a lot more easily, at the cost of needing a range of data ports open on the server side.
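
If you do end up on vsftpd, by the way, pinning the passive data ports to a fixed range makes it much easier to let passive mode through a firewall. Roughly like this in vsftpd.conf (the file's location varies by distro, and the port range here is just an example):

Code:
# vsftpd.conf (excerpt) -- example values only
pasv_enable=YES
pasv_min_port=50000
pasv_max_port=50100
# ...then open 50000-50100/tcp (plus 21/tcp for the control channel) on the firewall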

Quote:
Thanks for your input, by the way (do you guys thank each other, or is that assumed?).
It's always nice when someone clicks the little "thumbs-up" icon at the bottom of our posts. A dose of the warm fuzzies encourages techie ramblings more often.

I hope I've given you a little FTP-based food for thought, and I'm sure someone else will see this thread and chime in. After all, the FTP protocol has been around since the early 1970s if I remember correctly, and only a few features have been added since then (an RFC from 2007 was the last one, I think).

All in all, it's still one of the simplest, fastest ways to transfer files.

Have a good one!
 
Old 07-13-2009, 02:25 AM   #5
di11rod
Member
 
Quote:
Originally Posted by stanga
The ls command takes just as long from a shell as via FTP. If I can make ls list faster (via caching or otherwise), that will fix the issue.
Just for grins, you might get better performance out of your 'ls' command if you tighten it down with glob patterns and throw it in the background. On a multi-processor system, I believe this should farm the work out across the processors. I know the bottleneck here is likely the hard drive access, but I've also seen certain shell commands blow up when given too many files to work with, especially 'rm' (the classic "argument list too long" error).

So, you might create a script that looks something like this:

Code:
#!/bin/sh -vx
# Remove results from any previous run
rm -f /home/username/foo_lower.txt /home/username/foo_upper.txt /home/username/foo2.txt
# List lower- and upper-case names in parallel, each into its own temp file
# (-d keeps ls from descending into any directories the globs happen to match)
ls -d /home/username/[a-z]* > /home/username/foo_lower.txt &
ls -d /home/username/[A-Z]* > /home/username/foo_upper.txt &
# Wait for both background listings to finish, then merge, sort and de-duplicate
wait
sort -u /home/username/foo_lower.txt /home/username/foo_upper.txt > /home/username/foo2.txt
di11rod
 
Old 07-15-2009, 01:35 AM   #6
xeleema
Member
 
di11rod,
That's a great idea; however, the "ls" binary isn't thread-aware. As it runs, it scrapes the directory and only displays the requested contents. Typically the only speed increase from narrowing it down with glob patterns is that the clock cycles spent on formatting and piping the unwanted entries aren't spent; the shell still has to read every entry in the directory to expand the pattern.

There are only a few exceptions to this, one being the recursive option that "ls" packs with it: a recursive listing walks the whole directory tree under the starting point, and how costly that is depends on the filesystem (ext2 vs. ext3, or ReiserFS, for example).

*Great* line of thinking, though!
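
As a side note, if raw listing speed from a shell is the goal, most of the time on a directory that size goes into sorting the names and stat()ing every entry for the long format. A quick way to see that (the path is just an example):

Code:
# Full, sorted, long listing vs. an unsorted, names-only listing
time ls -la /srv/ftp/bigdir > /dev/null
time ls -f  /srv/ftp/bigdir > /dev/null   # -f: don't sort; without -l, ls can skip stat()ing each file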
 
  

