LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   Quicker way to delete folders than rm -r folder_name (https://www.linuxquestions.org/questions/linux-newbie-8/quicker-way-to-delete-folders-than-rm-r-folder_name-680598/)

nbdr 11-02-2008 03:31 AM

Quicker way to delete folders than rm -r folder_name
 
Hello,

I need to delete 100,000 files in a massive directory tree daily.

Using rm -r folder_name takes hours.

Is there another Linux command that deletes folder trees faster?

Thanks,
nbdr

ilikejam 11-02-2008 06:06 AM

Hi.

Nope. rm's as good as it gets. You might want to look into using a different filesystem, though.

Dave.

syg00 11-02-2008 06:20 AM

Which can be read (at least) two ways.
Me, I might put them in a separate partition, and just do a mkfs on it. Bet it wouldn't take too long ...
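
For instance, if the cache lived on its own small partition, the nightly cleanup could be a reformat instead of a recursive delete. A rough sketch, where /dev/sdb1 and /var/cache/web are just placeholder names:

Code:

umount /var/cache/web
mkfs.ext3 -q /dev/sdb1            # recreating the fs is far cheaper than unlinking every file
mount /dev/sdb1 /var/cache/web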

pixellany 11-02-2008 07:37 AM

Quote:

Originally Posted by nbdr (Post 3328915)
Hello,

I need to delete 100,000 files in a massive directory tree daily.

Using rm -r folder_name takes hours.

Is there another Linux command that deletes folder trees faster?

Thanks,
nbdr

Welcome to LQ!!

What you need is some way of defining which folders to delete. This could be something simple like having them all in one place, or it could be a unique string in the filename or extension.

Take a look at the "find" command, with all its options.
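
For example (the path and age are only placeholders), GNU find can unlink matching entries itself instead of spawning a separate rm for each one:

Code:

# delete everything under the cache directory that is older than one day;
# -delete implies -depth, so subdirectories are emptied before being removed
find /path/to/cache -mindepth 1 -mtime +0 -delete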

nbdr 11-02-2008 12:21 PM

Quote:

Originally Posted by pixellany (Post 3329057)
Welcome to LQ!!

What you need is some way of defining which folders to delete. This could be something simple like having them all in one place, or it could be a unique string in the filename or extension.

Take a look at the "find" command, with all its options.


Thanks for the answers.

I know which folders to delete; they are all under the 'cache' folder. What can I do that is faster than rm -r cache/ ?

-Nbdr

i92guboj 11-02-2008 12:26 PM

Nothing.

If rm could be faster, it would be faster. It has had many decades to improve.

Most of the work is done by the fs driver, so the only way to improve performance would be to look into the filesystem itself (change filesystems, tune it, change options when formatting, and so on).

john test 11-02-2008 12:50 PM

Probably
Code:

rm -rf /cache
would be somewhat faster.

Quakeboy02 11-02-2008 12:54 PM

Quote:

Originally Posted by nbdr (Post 3328915)
I need to delete 100,000 files in a massive directory tree daily.

If it's just junk, why are you even saving it? What is it, anyway, if I can ask?

H_TeXMeX_H 11-02-2008 01:11 PM

What filesystem are you using? If it's ext3 then I can see why it takes hours; I bet the same would take only minutes with JFS (the fastest delete speed) or XFS.

i92guboj 11-02-2008 02:12 PM

Quote:

Originally Posted by H_TeXMeX_H (Post 3329286)
What filesystem are you using? If it's ext3 then I can see why it takes hours; I bet the same would take only minutes with JFS (the fastest delete speed) or XFS.

That's not my experience with ext3 at all.

Code:

$ time for dir1 in $(seq 1 1000); do mkdir $dir1; for dir2 in $(seq 1 100); do mkdir $dir1/$dir2; done; done

real    5m14.906s
user    1m19.725s
sys    2m56.671s
$ time rm -rf *

real    0m9.218s
user    0m0.584s
sys    0m6.224s

Under ten seconds for 100,000 directories. This is a Sempron 3000+ (a relatively old machine) with a SATA disk (not SATA 2). The filesystem is ext3, formatted with -O dir_index, though. Creating the directories is not that fast, but that's to be expected.

Out of curiosity, I also tested a loopback fs formatted and mounted with the standard options (no dir_index), just to be fair. The results are very similar.

Code:

real    5m55.984s
user    1m44.103s
sys    3m1.987s
$ time rm -rf *

real    0m10.178s
user    0m0.583s
sys    0m7.564s

A really small difference.

In my experience, ext3 is a very stable and fast filesystem overall, even if people often don't like to admit it for whatever reason. Sure, some other filesystems do a given thing better, but they also do other things *much* worse. I find that ext3 does everything adequately.
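
As a side note, the dir_index feature mentioned above does not require reformatting; it can be switched on for an existing ext3 filesystem, roughly like this (the device name is a placeholder, and the fs should be unmounted first):

Code:

tune2fs -O dir_index /dev/sdb1    # enable hashed b-tree directory indexes
e2fsck -fD /dev/sdb1              # rebuild and optimize the existing directories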



If the OP really finds that deleting 100,000 files takes that long, there are a number of probable causes:
  • A defective or experimental fs (not ext3), like reiser4 or ext4. I don't know whether reiserfs (3.x) has problems with this, but I know first hand that it does have serious problems with fragmentation.
  • Defective hardware; look in the dmesg output for I/O errors when doing fs operations (see the quick checks below).
  • Your CPU is being hogged by something else. Check top or htop.

There might be other possible problems. But rm is not one of them.
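
A quick, hedged sketch of how the last two points could be checked from the shell (the grep pattern is only an example):

Code:

dmesg | grep -iE 'i/o error|ata[0-9]|sector'   # kernel-logged disk errors
top -b -n 1 | head -n 20                       # anything hogging the CPU or memory?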

chrism01 11-02-2008 08:10 PM

As mentioned, rm is pretty quick. What might be slowing it down is updating the directory metadata as it goes. Try re-mounting with the noatime option.
Alternatively, as said, make it a separate partition and use mkfs.
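
A minimal sketch of the remount idea, assuming the cache sits under a mount point of its own (the names are placeholders):

Code:

mount -o remount,noatime /var/cache/web
# or make it permanent in /etc/fstab:
# /dev/sdb1  /var/cache/web  ext3  defaults,noatime,nodiratime  0  2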

syg00 11-02-2008 09:38 PM

Hadn't seen post by i92guboj - I did some tests too. I just created 100000 copies of a small (few hundred bytes) file. Took just less than 10 and a half minutes.
Rebooted and ran "rm -rf ..." - less than 10 seconds.

Hardware RAID5 on an old idle quad (P-III based) Xeon server. EXT3 mounted noatime, nodiratime - because I always have them that way.

nbdr 11-03-2008 12:27 AM

Thank you all
 
It's the cache of a website that is hosted on a shared server.

I think the files are stored on a storage cluster; I don't have any control over the filesystem or anything like that. The problem is that I exceed the 500,000-file limit every few days and have to delete the files manually until I optimize the caching.

Thanks again,
Nbdr

jay73 11-03-2008 12:31 AM

So if you do that daily, wouldn't it make sense to set up a cron job or something that takes care of it for you?

i92guboj 11-03-2008 01:32 AM

Quote:

Originally Posted by jay73 (Post 3329741)
So if you do that daily, wouldn't it make sense to set up a cron job or something that takes care of it for you?

++

That's the way to go. Just create a cron job. He might consider using a higher niceness so it doesn't hit the CPU so badly, though, honestly, in a cluster I don't think the CPU is the problem. I am rather inclined to think it's something to do with the fs or the hardware.
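
Something along these lines would do it (the path is a placeholder, and ionice only helps if the kernel's I/O scheduler supports it):

Code:

# crontab entry: wipe the cache every night at 03:00 at low CPU/IO priority
0 3 * * *  nice -n 19 ionice -c3 rm -rf /path/to/cache/*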

