Permanently caching the contents of a specific directory
Is it possible to force Linux (Debian) to cache a directory's contents indefinitely, and to keep that cache updated to reflect changes in the directory?
Why? I have a directory that needs to be listed pretty regularly (via FTP) but holds a lot of files, anywhere from 20,000 to 100,000. The filesystem is ext3, and of course listing takes forever (minutes) unless the contents of the directory are already cached. Is there a method to: 1. keep the directory I specify cached indefinitely, and 2. keep the cache up to date with the real-time contents of that directory? The only thing I've thought of is running a bash script that just constantly lists the contents of the directory, thus ensuring it's always cached by the OS. I'm hoping there is a tool or a simple way of reserving part of the system memory to keep this specific directory's contents cached at all times. Thanks |
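The "constantly list it" idea from the question can be sketched as a small shell script. This is a minimal illustration, not something from the thread; the paths and the 60-second interval are my own assumptions:

```shell
#!/bin/sh
# One pass: list the directory so the kernel pulls its dentries and inodes
# into the cache. "ls -f" skips sorting, which matters on huge directories.
warm_once() {
    ls -f "$1" > /dev/null
}

# When given a directory argument, loop forever, re-warming every 60 seconds.
if [ -n "$1" ]; then
    while :; do
        warm_once "$1"
        sleep 60
    done
fi
```

Note that this only encourages the normal page/dentry cache; under memory pressure the kernel is still free to evict those entries. Purpose-built tools such as vmtouch go further and can actually lock files into memory.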
Greetingz!
I'm going to assume there's a reason why 20,000 to 100,000 files need to be in one directory without any directory-based organization. Is a "filelist.txt" file an option? You could just set up a cronjob to "ls -la > ./filelist.txt" and direct individuals to that. (I've seen similar sites do that.) Is this FTP directory accessed by people or applications most of the time? Also, what FTP server are you using? |
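The cron idea above could look something like this. It's only a sketch; the function name, paths, and schedule are illustrative assumptions:

```shell
#!/bin/sh
# Regenerate a static filelist.txt so FTP clients fetch one small file
# instead of triggering a live scan of a 100K-entry directory.
make_filelist() {
    # $1: the FTP directory to index. Write to a temp file first, then
    # rename, so clients never see a half-written list.
    ls -la "$1" > "$1/.filelist.tmp" && mv "$1/.filelist.tmp" "$1/filelist.txt"
}

if [ -n "$1" ]; then
    make_filelist "$1"
fi

# Example crontab entry, rebuilding the list every 5 minutes:
#   */5 * * * * /usr/local/bin/make-filelist.sh /srv/ftp/pub
```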
Thanks for your input, by the way (do you guys thank each other, or is that assumed? :) |
FTP Servers
Hello again!
It's unfortunate that's part of the directory structure, but some facets of some problems just can't be helped. :) However, if applications are doing the directory listing, then we might be able to speed things along.
FTP commands such as "ls", "cd", "dir", "put", "mput", etc. are not called from the server's /usr/bin directory. They are functions of the FTP server daemon, which in turn executes the necessary system calls (open(), readdir(), and so on). However, it is rather dependent on which FTP daemon you're using. For example, if you're using the stock FTP daemon from a Red Hat-based Linux distribution, you're probably using vsftpd. That particular FTP daemon has been touted as the "fastest" by many, although that's in terms of data throughput and simultaneous users. There's also ProFTPD, an FTP daemon with a modular design that allows for "plug-ins". Its configuration is very similar to Apache's, so it can throw off those of us who aren't web developers. Both of these FTP daemons are Free / Open Source, so there's no harm in giving them a shot.

But I digress. If you're looking to speed up directory listings of your data, keep in mind that it's the FTP daemon itself that is responsible for reading and displaying the contents. So in order to speed up that process, you have to think "under" the FTP daemon, down at the filesystem level. Is the data hosted on a RAID 5 array? (That offers fast reads, but only okay speed on writes.) How about moving this directory to a RAM disk? (That's right: dedicate a chunk of your RAM to a filesystem.) This option would require some way to sync the contents back to the main hard disk(s), but it would definitely improve read performance.

There's also a bit of thinking to do "above" the FTP daemon: your clients. Are they connecting with Active or Passive sessions? Active sessions can be slightly faster, since the server opens a dedicated data connection back to the client. Passive sessions have the client open the data connection to a server-chosen port instead, which slips through restrictive firewalls a lot easier.
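The RAM-disk idea above can be done with tmpfs, which needs no special tooling. A rough sketch, assuming root access; the mount point, size, and source path are made-up examples, and remember that tmpfs contents vanish on reboot, hence the sync-back:

```shell
#!/bin/sh
# Dedicate up to 512 MB of RAM to a filesystem and serve the hot directory from it.
mkdir -p /mnt/ftpcache
mount -t tmpfs -o size=512m tmpfs /mnt/ftpcache

# Seed the RAM disk from the on-disk copy.
rsync -a /srv/ftp/bigdir/ /mnt/ftpcache/

# Point the FTP daemon at /mnt/ftpcache, then sync changes back to the
# real disk periodically, e.g. from cron:
#   */10 * * * * rsync -a --delete /mnt/ftpcache/ /srv/ftp/bigdir/
```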
I hope I've given you a little FTP-based food for thought, and I'm sure someone else will see this thread and chime in. After all, the FTP protocol has been around since the early 70's if I remember correctly, and only a few features have been added since then (some RFC from 2007 was the last, I think). All in all, it's still one of the simplest, fastest ways to transfer files. Have a good one! |
So, you might create a script that looks something like this: Code:
#!/bin/sh -vx
# re-list the directory on a loop so its entries stay in the kernel's cache
while :; do
    ls /path/to/ftp/dir > /dev/null
    sleep 60
done |
di11rod,
That's a great idea; however, the "ls" binary isn't thread-aware. As it runs, it scrapes the directories and only displays the requested contents. Typically the only speed increase from using wildcard patterns with it is that the clock cycles spent displaying or piping data aren't spent. It'll still touch everything in the directory listing. There are only a few exceptions to this, one being the recursive option that "ls" ships with. Recursive searches hit the whole filesystem in some cases; in others, they just do branch searches (depending on whether it's ext2, ext3, or ReiserFS, for example). *Great* line of thinking, though! |