I have a similar issue at work and this is what I do. Keep in mind that my issue is that I have thousands of subfolders for a particular application's cache, but the same approach could be applied to thousands of files.
If your server is decently fast, just use something like this to fork off multiple rm's that work in parallel:
Code:
for list in `ls -d /path/to/cache/*`; do rm -Rf "${list}" & done
I find that much faster than a straight-up rm; there are thousands of subdirectories and even more files within them. This will clear 200 gigs in less than half an hour, whereas without the parallelism it takes hours.
Notice the 'ls -d' there, which lists the directory entries themselves. From what I gather, in your case you'll just want 'ls', or rather 'ls -1', to get a simple list of files.
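For plain files the same trick looks roughly like this. A minimal sketch; /path/to/cache is a placeholder, and I've used a shell glob instead of parsing ls output so filenames with spaces don't break it. Also keep in mind it backgrounds one rm per file, so with many thousands of files you may spawn more processes than you want:
Code:
for f in /path/to/cache/*; do rm -f "$f" & done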
Other options:
Create and dedicate a partition to the cache and mount it on the cache directory. Then cron a script to unmount it, rebuild the filesystem, and remount it. I think wiping the filesystem and making a new one from scratch would be quicker than walking the existing filesystem and marking each inode deleted; lots of tiny files are murder on performance for that kind of traversal.
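The cron script for that option boils down to three commands. A very rough sketch, assuming the cache lives on /dev/sdb1 formatted ext4 and mounted on /path/to/cache (all placeholders), and that nothing is writing to the cache while it runs:
Code:
umount /path/to/cache || exit 1   # bail out if something still has the cache open
mkfs.ext4 -q /dev/sdb1            # recreate the filesystem from scratch
mount /dev/sdb1 /path/to/cache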
You could also use find with -exec:
Code:
find /path/to/cache -mindepth 1 -maxdepth 1 -exec rm -Rf {} \;
Personally I think the for loop I provided is better than using find, solely because of the parallel behavior. find would still just iterate through a sequential list, and I don't see the benefit of doing that vs. rm -Rf /path/to/cache.
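That said, if you did want parallelism out of find, you could feed its output to xargs with -P. A sketch, assuming GNU find and xargs; the path and the -P 8 process count are placeholders:
Code:
find /path/to/cache -mindepth 1 -maxdepth 1 -print0 | xargs -0 -r -P 8 -n 1 rm -Rf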
In the end I think destroying and recreating the partition would be the fastest, but the issue there is that you cannot do it live: how will the cache get written to while the partition is offline and being rebuilt?
Similarly, my for loop doesn't account for files created while the loop is running, which isn't really safe either. What you'd likely want to do is write a script that lists entries (ls or ls -d) whose modification dates are older than a day, an hour, whatever, and then forks off a pool of rm's to remove only those. That's the safest approach.
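Here's a rough sketch of that, using find's -mmin instead of parsing ls dates, with a small pool of background rm's. The path, the one-hour cutoff, and the pool size of 8 are placeholders; it assumes GNU find and bash:
Code:
#!/bin/bash
CACHE=/path/to/cache
MAX_JOBS=8

# delete only top-level cache entries not modified in the last hour
while IFS= read -r -d '' entry; do
    # throttle: keep at most MAX_JOBS rm processes in flight
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
        sleep 1
    done
    rm -Rf "$entry" &
done < <(find "$CACHE" -mindepth 1 -maxdepth 1 -mmin +60 -print0)
wait    # let the last rm's finish before the script exits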