LinuxQuestions.org


stanga 07-10-2009 01:05 AM

permanently caching content of specific directory
 
Is it possible to force Linux (Debian) to cache a directory's contents indefinitely and keep that cache updated to reflect changes in the directory?

Why?

I have a dir that needs to be listed pretty regularly (via FTP) but holds a lot of files, anywhere from 20,000 to 100,000. The filesystem is ext3, and of course listing takes forever (minutes) unless the directory's contents are already cached.

Is there a method to:

1. Keep the directory I specify cached indefinitely?
2. Keep the cache up to date with the real-time contents of that directory?

The only thing I thought of was running some bash script that just constantly lists the contents of the directory, thus ensuring the OS always keeps it cached.
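
Something like this crude warm-up loop is what I have in mind (an untested sketch; /data/bigdir and the 60-second interval are just placeholders):

Code:

#!/bin/sh
# Crude cache-warmer: re-read the directory every minute so the kernel's
# dentry/inode cache stays hot for it. Path and interval are placeholders.
DIR=/data/bigdir
while true
do
    ls -f "$DIR" > /dev/null 2>&1
    sleep 60
done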

I'm hoping there is a tool or a simple way of reserving part of the system memory to keep this specific directory's contents cached at all times.

thanks

xeleema 07-10-2009 01:30 AM

Greetingz!

I'm going to assume there's a reason why 20,000 to 100,000 files need to be in one directory without any directory-based organization.

Is a "filelist.txt" file an option? You could just set up a cronjob to "ls -la > ./filelist.txt" and direct individuals to that. (I've seen similar sites do that.)

Is this FTP directory accessed by people or applications most of the time?

Also, what FTP server are you using?

stanga 07-10-2009 01:42 AM

Quote:

Originally Posted by xeleema (Post 3602976)
Greetingz!

I'm going to assume there's a reason why 20,000 to 100,000 files need to be in one directory without any directory-based organization.

Believe it or not, that actually is part of the directory-based organization.

Quote:


Is a "filelist.txt" file an option? You could just set up a cronjob to "ls -la > ./filelist.txt" and direct individuals to that. (I've seen similar sites do that.)

Not an option, unfortunately; the directory is mostly accessed by applications.

Quote:


Also, what FTP server are you using?
The issue is really server-independent. The ls command takes just as long to list from the shell as via FTP. If I can make ls list faster (via cache or otherwise), that will fix the issue.

Thanks for your input, by the way (do you guys thank each other, or is that assumed?) :)

xeleema 07-10-2009 02:08 AM

FTP Servers
 
Hello again!

It's unfortunate that that's part of the directory structure; however, some facets of some problems just can't be helped. :)

However, if applications are doing a directory listing, then we might be able to speed things along.

Quote:

The issue is really server-independent. The ls command takes just as long to list from the shell as via FTP. If I can make ls list faster (via cache or otherwise), that will fix the issue.
The downside is that an "ls" done in a terminal session and an "ls" done from an FTP session are not one and the same in most cases.

FTP commands such as "ls", "cd", "dir", "put", "mput", etc. are not run from the server's /usr/bin directory. They are functions of the FTP server daemon, which in turn makes the necessary system calls (such as open() and readdir()) on its own. Exactly how it does that depends on which FTP daemon you're using.

For example, if you're using the stock FTP daemon from a Red Hat-based Linux distribution, you're probably using vsftpd. That particular daemon has been touted as the "fastest" by many; however, that claim is about data throughput and simultaneous users, not directory listings.

There's also ProFTPd, an FTP daemon with a modular design that allows for "plug-ins". Its configuration is very similar to Apache's, so it can throw off those of us who aren't web developers.

Both of these FTP daemons are Free / Open Source, so there's no harm in giving them a shot.


But I digress. If you're looking to speed up directory listings of your data, keep in mind that it's the FTP daemon itself that is responsible for reading and displaying the contents. So in order to speed up that process, you have to think "under" the FTP daemon, down at the filesystem level.

Is the data hosted on a RAID5 array? (RAID5 offers fast reads but only middling write speeds because of the parity overhead.)

How about moving this directory to a RAM disk? (That's right, dedicate a chunk of your RAM to a filesystem.) This option would require something to sync the contents back to the main hard disk(s), but it would definitely boost the read performance for that directory.
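
A rough sketch of that idea using tmpfs and rsync (all of the paths and the size are made up for illustration, and the sync-back job is not optional, since tmpfs contents vanish on reboot):

Code:

# Carve 512 MB of RAM out as a filesystem and mount it over the FTP directory.
mount -t tmpfs -o size=512m tmpfs /srv/ftp/bigdir

# Seed it from the persistent on-disk copy kept alongside it.
rsync -a /srv/ftp/bigdir.disk/ /srv/ftp/bigdir/

# Then, from cron, push changes back so nothing is lost on reboot or power failure.
rsync -a --delete /srv/ftp/bigdir/ /srv/ftp/bigdir.disk/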

However, there's a bit of thinking to do "above" the FTP daemon as well: your clients.

Are they connecting with Active or Passive sessions?

Active sessions can be a touch quicker to set up, because the server opens a dedicated data connection (traditionally from port 20) back to the client, but they tend to get blocked by client-side firewalls and NAT.

Passive sessions have the client open the data connection to a high port on the server instead, which slips through restrictive firewalls a lot easier.
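
If you do end up going passive behind a restrictive firewall, most daemons let you pin the data connections to a known port range you can open up. A rough sketch with vsftpd (the range is arbitrary; ProFTPd has an equivalent PassivePorts directive):

Code:

# /etc/vsftpd.conf -- pin passive-mode data connections to a fixed range
pasv_enable=YES
pasv_min_port=64000
pasv_max_port=64100

# Then open that range (plus port 21 for the control channel) in the firewall.
iptables -A INPUT -p tcp --dport 21 -j ACCEPT
iptables -A INPUT -p tcp --dport 64000:64100 -j ACCEPT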

Quote:

Thanks for your input, by the way (do you guys thank each other, or is that assumed?)
It's always nice when someone clicks the little "thumbs-up" icon at the bottom of our posts. A dose of the warm-fuzzies encourages techi-ramblings more often :)

I hope I've given you a little FTP-based food for thought, and I'm sure someone else will see this thread and chime in. After all, the FTP protocol has been around since the early 70's if I remember correctly, and only a few features have been added since then (some RFC from 2007 was the last time, I think).

All in all, it's still one of the simplest, fastest ways to transfer files.

Have a good one!

di11rod 07-13-2009 02:25 AM

Quote:

Originally Posted by stanga (Post 3602984)
The ls command takes just as long to list from the shell as via FTP. If I can make ls list faster (via cache or otherwise), that will fix the issue.

Just for grins, you might get better performance on your 'ls' command if you narrow it down with glob patterns and throw the pieces into the background. On a multi-processor system, I believe this should spread the work across the processors. I know the bottleneck here is likely the hard drive access, but I've also seen certain shell commands blow up when given too many files to work with, especially the 'rm' command.

So, you might create a script that looks something like this:

Code:

#!/bin/sh -vx
rm -f /home/username/foo-az.txt /home/username/foo-AZ.txt /home/username/foo2.txt
# Run the two listings in parallel, each into its own file so they cannot clobber each other.
ls /home/username/[a-z]* > /home/username/foo-az.txt &
ls /home/username/[A-Z]* > /home/username/foo-AZ.txt &
# Wait for both background jobs to finish before merging, or the merge sees partial output.
wait
sort -u /home/username/foo-az.txt /home/username/foo-AZ.txt > /home/username/foo2.txt

di11rod

xeleema 07-15-2009 01:35 AM

di11rod,
That's a great idea; however, the "ls" binary isn't multi-threaded, and the shell expands those patterns (they're globs rather than regular expressions) before "ls" ever runs. It still has to read the whole directory and then display only the requested entries, so typically the only speed gain from narrowing the pattern is the clock cycles saved on formatting and piping output. It will still touch everything in the directory listing.

There are only a few exceptions to this, one of them being the recursive option "ls" packs with it: a recursive listing walks the whole subtree under the directory, and how expensive that is depends a lot on the filesystem (ext2, or ext3 without the dir_index feature, scans large directories linearly, while ReiserFS, for example, handles them better).

*Great* line of thinking, though!
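
If you want to see how much of the wait is sorting and per-file stat() calls rather than raw directory reads, a quick comparison like this can be telling (a rough sketch; /path/to/bigdir is a placeholder):

Code:

# Long listing: sorts every entry and stat()s each file.
time ls -l /path/to/bigdir > /dev/null

# -f: no sorting and no per-entry stat(), just raw directory order (implies -a).
time ls -f /path/to/bigdir > /dev/null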

