Linux - Software
08-22-2015, 09:18 PM | #1 | ramsforums | Member | Registered: Jun 2007 | Posts: 66
Deleting a Directory with millions of files and subdirectories
Greetings, everyone.
My laptop was reporting low storage space, so I installed BleachBit and ran it with root privileges with all options enabled, including wipe. It ran overnight for 12 hours and the application became unresponsive, so I held the power button for 10 seconds to power the laptop down.
Upon booting I found a directory named T6o1L9lGg- containing a huge number of files. I am unable to list it using the ls command.
I tried to delete it using
sudo rm -r T6o1L9lGg-
but that did not work; it runs forever.
I then tried the following commands:
mkdir empty
sudo rsync -a --delete ./empty/ ./T6o1L9IGg-/
That did not work either; it ran for over two hours and kept running.
Searching around, I came across the following thread:
http://www.stevekamerman.com/2008/03...#comment-16588
in which a David Villegas posted:
Code:
David Villegas
September 1st, 2013 at 1:03 am
After a nightmare on a server with free space but out of inodes, because of spam attack on a misconfigured postfix… i found this blog post!, in my case was 8 million files on /var/spool/postfix/maildrop folder.
The only useful formula that works on my case was the Mikhus script on php.
find, rsync, rm fail with that ammount of files
but i wrote a similar c++ script, to speed up the delete.
HOW:
just copy and paste this code to /var/spool/postfix/mydel.c
and compile with g++ -o mydelexec mydel.c
CODE:
//COMPILE WITH: g++ -o mydelexe mydel.c
#include
#include
int main() {
struct dirent *d;
DIR *dir;
char buf[256];
int i;
printf(“***mydelexe***\n”);
dir = opendir(“maildrop”);
while( d = readdir(dir) ) {
i++;
if(i%100==0) {
printf(“%s %i deleted!\n”,d->d_name,i);
}
sprintf(buf, “%s/%s”, “maildrop”, d->d_name);
remove(buf);
}
return 0;
}
This c do the same of the php, but a bit faster… for my 8 million files gone after few hours
When I tried to compile it, I got the following errors:
Code:
rama@develop ~ $ g++ -o mydelexec mydel.c
mydel.c:2:9: error: #include expects "FILENAME" or <FILENAME>
#include
^
mydel.c:3:9: error: #include expects "FILENAME" or <FILENAME>
#include
^
mydel.c:13:2: error: stray ‘\342’ in program
printf(“***mydelexe***\n”);
^
mydel.c:13:2: error: stray ‘\200’ in program
mydel.c:13:2: error: stray ‘\234’ in program
mydel.c:13:2: error: stray ‘\’ in program
mydel.c:13:2: error: stray ‘\342’ in program
mydel.c:13:2: error: stray ‘\200’ in program
mydel.c:13:2: error: stray ‘\235’ in program
mydel.c:14:2: error: stray ‘\342’ in program
dir = opendir(“maildrop”);
^
mydel.c:14:2: error: stray ‘\200’ in program
mydel.c:14:2: error: stray ‘\234’ in program
mydel.c:14:2: error: stray ‘\342’ in program
mydel.c:14:2: error: stray ‘\200’ in program
mydel.c:14:2: error: stray ‘\235’ in program
mydel.c:19:4: error: stray ‘\342’ in program
printf(“%s %i deleted!\n”,d->d_name,i);
^
mydel.c:19:4: error: stray ‘\200’ in program
mydel.c:19:4: error: stray ‘\234’ in program
mydel.c:19:4: error: stray ‘\’ in program
mydel.c:19:4: error: stray ‘\342’ in program
mydel.c:19:4: error: stray ‘\200’ in program
mydel.c:19:4: error: stray ‘\235’ in program
mydel.c:21:2: error: stray ‘\342’ in program
sprintf(buf, “%s/%s”, “maildrop”, d->d_name);
^
mydel.c:21:2: error: stray ‘\200’ in program
mydel.c:21:2: error: stray ‘\234’ in program
mydel.c:21:2: error: stray ‘\342’ in program
mydel.c:21:2: error: stray ‘\200’ in program
mydel.c:21:2: error: stray ‘\235’ in program
mydel.c:21:2: error: stray ‘\342’ in program
mydel.c:21:2: error: stray ‘\200’ in program
mydel.c:21:2: error: stray ‘\234’ in program
mydel.c:21:2: error: stray ‘\342’ in program
mydel.c:21:2: error: stray ‘\200’ in program
mydel.c:21:2: error: stray ‘\235’ in program
mydel.c: In function ‘int main()’:
mydel.c:9:1: error: ‘DIR’ was not declared in this scope
DIR *dir;
^
mydel.c:9:6: error: ‘dir’ was not declared in this scope
DIR *dir;
^
mydel.c:13:15: error: ‘mydelexe’ was not declared in this scope
printf(“***mydelexe***\n”);
^
mydel.c:13:27: error: ‘n’ was not declared in this scope
printf(“***mydelexe***\n”);
^
mydel.c:13:31: error: ‘printf’ was not declared in this scope
printf(“***mydelexe***\n”);
^
mydel.c:14:19: error: ‘maildrop’ was not declared in this scope
dir = opendir(“maildrop”);
^
mydel.c:14:30: error: ‘opendir’ was not declared in this scope
dir = opendir(“maildrop”);
^
mydel.c:16:24: error: ‘readdir’ was not declared in this scope
while( d = readdir(dir) ) {
^
mydel.c:19:14: error: expected primary-expression before ‘%’ token
printf(“%s %i deleted!\n”,d->d_name,i);
^
mydel.c:19:15: error: ‘s’ was not declared in this scope
printf(“%s %i deleted!\n”,d->d_name,i);
^
mydel.c:19:35: error: invalid use of incomplete type ‘struct main()::dirent’
printf(“%s %i deleted!\n”,d->d_name,i);
^
mydel.c:8:8: error: forward declaration of ‘struct main()::dirent’
struct dirent *d;
^
mydel.c:21:18: error: expected primary-expression before ‘%’ token
sprintf(buf, “%s/%s”, “maildrop”, d->d_name);
^
mydel.c:21:19: error: ‘s’ was not declared in this scope
sprintf(buf, “%s/%s”, “maildrop”, d->d_name);
^
mydel.c:21:21: error: expected primary-expression before ‘%’ token
sprintf(buf, “%s/%s”, “maildrop”, d->d_name);
^
mydel.c:21:45: error: invalid use of incomplete type ‘struct main()::dirent’
sprintf(buf, “%s/%s”, “maildrop”, d->d_name);
^
mydel.c:8:8: error: forward declaration of ‘struct main()::dirent’
struct dirent *d;
^
mydel.c:21:53: error: ‘sprintf’ was not declared in this scope
sprintf(buf, “%s/%s”, “maildrop”, d->d_name);
^
mydel.c:22:12: error: ‘remove’ was not declared in this scope
remove(buf);
^
I changed the #include lines to #include <studio.h> etc., but it still does not work.
1) How do I delete the directory?
2) How do I make the code work?
Thanks
08-22-2015, 11:59 PM | #2 | norobro | Member | Registered: Feb 2006 | Distribution: Debian Sid | Posts: 792
Quote:
Originally Posted by ramsforums
I changed the #include lines to #include <studio.h> etc., but it still does not work.
Did you mean <stdio.h> ?
You also need to include two additional headers (from man opendir or man readdir):
Code:
#include <sys/types.h>
#include <dirent.h>
The "error: stray ..." errors are caused by unicode characters copied from your browser. Notice the quotation marks are slanted (”)? Changing them to regular quotes (") should fix those errors.
HTH
08-23-2015, 07:07 PM | #3 | ramsforums (Original Poster) | Member | Registered: Jun 2007 | Posts: 66
Quote:
Originally Posted by norobro
Did you mean <stdio.h> ?
You also need to include two additional headers (from man opendir or man readdir):
Code:
#include <sys/types.h>
#include <dirent.h>
The "error: stray ..." errors are caused by unicode characters copied from your browser. Notice the quotation marks are slanted (”)? Changing them to regular quotes (") should fix those errors.
HTH
Thanks norobro. I was able to compile it. I modified the code as follows. When I run the application it executes, but it does not delete the files or the directory. What could be wrong?
Code:
rama@develop ~ $ cat mydel.c
//COMPILE WITH: g++ -o mydelexe mydel.c
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>
int main(int argc, char *argv[])
{
struct dirent *d;
DIR *dir;
char buf[256];
int i;
if (argc < 1)
{
printf("*** Usage: mydelexe [folder] *** \n\r");
return(1);
}
printf("***mydelexe %s ***\n\r", argv[1]);
dir = opendir(argv[1]);
while( d = readdir(dir) ) {
i++;
if(i%100==0) {
printf("%s %i deleted!\n\r",d->d_name,i);
}
sprintf(buf, "%s/%s \n\r", argv[1], d->d_name);
remove(buf);
}
return 0;
}
08-23-2015, 08:48 PM | #4 | rknichols | Senior Member | Registered: Aug 2009 | Distribution: Rocky Linux | Posts: 4,789
You call a function like remove() without checking the return value and errno, and then ask someone to tell you why it didn't work. Really?
Code:
if(remove(buf) < 0) perror(buf);
You'll need "#include <errno.h>" to let that compile.
For that matter, the return values from opendir() and readdir() should also be checked.
Last edited by rknichols; 08-23-2015 at 08:49 PM.
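A version of the OP's program with that checking folded in might look like the sketch below (untested, and only an illustration: with perror() in place, it would report that the OP's sprintf() appends " \n\r" to each path, so remove() is being handed a filename that does not exist).
Code:
/* COMPILE WITH: gcc -o mydelexe mydel.c (g++ works too) */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <dirent.h>

int main(int argc, char *argv[])
{
	struct dirent *d;
	DIR *dir;
	char buf[4096];
	long i = 0;			/* initialize the counter */

	if (argc < 2) {			/* argv[0] is the program name itself */
		fprintf(stderr, "Usage: %s <folder>\n", argv[0]);
		return 1;
	}
	dir = opendir(argv[1]);
	if (dir == NULL) {		/* check opendir(), as suggested above */
		perror(argv[1]);
		return 1;
	}
	while ((d = readdir(dir)) != NULL) {
		if (strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0)
			continue;	/* never try to remove . or .. */
		snprintf(buf, sizeof(buf), "%s/%s", argv[1], d->d_name);
		if (remove(buf) < 0)	/* report failures instead of guessing */
			perror(buf);
		if (++i % 100 == 0)
			printf("%s %ld deleted\n", d->d_name, i);
	}
	closedir(dir);
	return 0;
}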
08-23-2015, 10:09 PM | #5 | jpollard | Senior Member | Registered: Dec 2012 | Location: Washington DC area | Distribution: Fedora, CentOS, Slackware | Posts: 4,912
The problem is the directory with a huge number of files.
The easiest way to deal with that is to let a "/usr/bin/rm -rf <directorypath>" run as long as it takes. (I suggest using a virtual terminal for this; you could do a "nohup /usr/bin/rm -rf <directorypath> >/tmp/nohup.out 2>&1 </dev/null &".)
The problem is the way rm works and how it interacts with certain filesystems. Filesystems that use a btree structure for the directory are fastest, but most just use a linear list.
When a file gets deleted, the kernel has to copy the rest of the entries up one place in the directory file, then repeat for the next file. rm starts with the very first file - thus, the worst-case delete.
You CAN make it run faster, but it depends on reading the directory list into memory, then reversing the order and deleting each file. This is much faster because you avoid the kernel having to copy the remaining list of files.
Note: doing this with a million files takes a fairly large amount of memory. When I did it, I used perl. As I recall it went something like:
Code:
$some_dir = "directory with lots of files"; # this can be "." if you first do a cd to the directory...
opendir(my $dh, $some_dir) || die;
@list = reverse readdir($dh);                   # slurp every name, then reverse the order
closedir($dh);
while (defined ($f = shift(@list))) {
    next if (-d $f);                            # skip directories, including . and ..
    unlink($f) or die "can't delete $f - $!\n"; # note: -d and unlink resolve $f against the cwd
}
Note - I have not tested this, it is from memory. How well it works depends on the filesystem. I'm not sure that it will help on btrfs (it uses a btree for the directory, so it should be fast anyway).
This particular bit of code will not delete directories (so . and .. will be left alone).
Last edited by jpollard; 08-23-2015 at 10:12 PM.
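For anyone who would rather stay in C, a rough equivalent of that reverse-order idea might look like this (an untested sketch with the same caveats: the whole name list is held in memory, and subdirectories are skipped):
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	const char *path = (argc > 1) ? argv[1] : ".";
	DIR *dir = opendir(path);
	struct dirent *d;
	char **names = NULL;
	size_t n = 0, cap = 0;

	if (!dir) { perror(path); return 1; }
	/* Pass 1: read every name into memory. */
	while ((d = readdir(dir)) != NULL) {
		if (strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0)
			continue;
		if (n == cap) {				/* grow the list as needed */
			cap = cap ? cap * 2 : 1024;
			names = realloc(names, cap * sizeof *names);
			if (!names) { perror("realloc"); return 1; }
		}
		names[n++] = strdup(d->d_name);
	}
	closedir(dir);

	/* Pass 2: delete in reverse order, skipping subdirectories. */
	for (size_t i = n; i-- > 0; ) {
		char buf[4096];
		struct stat st;
		snprintf(buf, sizeof(buf), "%s/%s", path, names[i]);
		if (lstat(buf, &st) == 0 && !S_ISDIR(st.st_mode)) {
			if (unlink(buf) < 0)
				perror(buf);
		}
		free(names[i]);
	}
	free(names);
	return 0;
}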
08-23-2015, 10:35 PM | #6 | rknichols | Senior Member | Registered: Aug 2009 | Distribution: Rocky Linux | Posts: 4,789
Quote:
Originally Posted by jpollard
When a file gets deleted, the kernel has to copy the rest of the entries up one place in the directory file, then repeat for the next file. rm starts with the very first file - thus, the worst-case delete.
I don't know of any filesystem that works that way. Can you give an example of one? For all the filesystems I know with simple, unordered linear mapping, removing an entry leaves a hole and no moving of other entries is done. If you are scanning through the directory with readdir(), it would appear that the directory had been re-packed, but that's just because readdir() doesn't show you the empty spaces.
Really, starting with the last file would be the worst approach because when given a name, the kernel does have to perform a linear search from the beginning of the directory. If it's the last file of millions, that's going to take a while. Now, for a really primitive directory like the FAT variants (prior to EXFAT, at least), the unused entries have to be skipped over one-by-one, but for ext2/3/4 consecutive free areas are coalesced into a single block, so the skip just requires a single seek().
08-23-2015, 11:19 PM | #7 | LQ Veteran | Registered: Aug 2003 | Location: Australia | Distribution: Lots ... | Posts: 21,237
The rsync trick has worked for me - not with multi-millions, tho'.
Reading the entire list into memory is why ls and rm take so long; this seems the best approach.
08-24-2015, 12:40 AM | #8 | LQ Veteran | Registered: Aug 2003 | Location: Australia | Distribution: Lots ... | Posts: 21,237
Quote:
Originally Posted by ramsforums
My laptop was reporting low storage space, so I installed BleachBit and ran it with root privileges with all options enabled, including wipe. It ran overnight for 12 hours and the application became unresponsive, so I held the power button for 10 seconds to power the laptop down.
This is never a good idea - especially with a program that is potentially bypassing the filesystem and writing directly to disk.
Is the filesystem clean? Will fsck run - and will it finish? How do you know you don't have recursive entries?
All the solutions offered presume a valid filesystem that just happens to have a boatload of file/directory entries. If the filesystem is innately broken, the only solution might be mkfs. And that of course means you have to somehow get your good data off first.
I suppose expecting a good, recent backup that you could simply restore from would be unrealistic?
1 member found this post helpful.
08-24-2015, 05:55 AM | #9 | jpollard | Senior Member | Registered: Dec 2012 | Location: Washington DC area | Distribution: Fedora, CentOS, Slackware | Posts: 4,912
Quote:
Originally Posted by rknichols
I don't know of any filesystem that works that way. Can you give an example of one? For all the filesystems I know with simple, unordered linear mapping, removing an entry leaves a hole and no moving of other entries is done. If you are scanning through the directory with readdir(), it would appear that the directory had been re-packed, but that's just because readdir() doesn't show you the empty spaces.
Really, starting with the last file would be the worst approach because when given a name, the kernel does have to perform a linear search from the beginning of the directory. If it's the last file of millions, that's going to take a while. Now, for a really primitive directory like the FAT variants (prior to EXFAT, at least), the unused entries have to be skipped over one-by-one, but for ext2/3/4 consecutive free areas are coalesced into a single block, so the skip just requires a single seek().
Nearly all of them. The problem is that a directory entry is always variable length. Leaving holes in a directory causes searching problems in that you are constantly seeking the next entry. You have to search to find a hole big enough for a new entry (plus the garbage handling for splitting/merging).
A directory search is far faster than trying to handle the garbage involved. This is why btree structures are being used to make it manageable. I did forget about hash-tree directories; those are another case this isn't helpful for, though it shouldn't hurt.
The only thing this method is trying to avoid is the inherent garbage collection for a directory. It cannot avoid the search - that will happen no matter what method is used.
Remember, the garbage is in memory (and on disk, though the disk doesn't get involved as much). The constant rearranging of the directory in memory is what is being avoided. Deleting the last entry causes the least rearranging. A btree list still has balancing actions, so the method is less helpful there. This is why I also gave the simple approach - it will work in all situations, though for some it is very slow. The reverse-order delete takes a bit more setup time (reversing the order) but will work everywhere, and it is most helpful on linear directories.
08-24-2015, 09:59 AM | #10 | rknichols | Senior Member | Registered: Aug 2009 | Distribution: Rocky Linux | Posts: 4,789
Quote:
Originally Posted by jpollard
Nearly all of them.
That is not an example of one.
Quote:
The problem is that a directory entry is always variable length. Leaving holes in a directory causes searching problems in that you are constantly seeking the next entry. You have to search to find a hole big enough for a new entry (plus the garbage handling for splitting/merging).
Nonetheless, that's how it works, at least in ext2/3/4. Here are some dumps of a directory on an ext2 filesystem built without the "dir_index" option.
First, after "touch filename1 longerfilename2 file3"
Code:
00000000 02 00 00 00 0C 00 01 02 2E 00 00 00 02 00 00 00 ................
00000010 0C 00 02 02 2E 2E 00 00 0B 00 00 00 14 00 0A 02 ................
00000020 6C 6F 73 74 2B 66 6F 75 6E 64 00 00 0C 00 00 00 lost+found......
00000030 14 00 09 01 66 69 6C 65 6E 61 6D 65 31 00 00 00 ....filename1...
00000040 0D 00 00 00 18 00 0F 01 6C 6F 6E 67 65 72 66 69 ........longerfi
00000050 6C 65 6E 61 6D 65 32 00 0E 00 00 00 A8 03 05 01 lename2.........
00000060 66 69 6C 65 33 00 00 00 00 00 00 00 00 00 00 00 file3...........
Next, "rm filename1; touch longerfilename4"
Code:
00000000 02 00 00 00 0C 00 01 02 2E 00 00 00 02 00 00 00 ................
00000010 0C 00 02 02 2E 2E 00 00 0B 00 00 00 28 00 0A 02 ............(...
00000020 6C 6F 73 74 2B 66 6F 75 6E 64 00 00 00 00 00 00 lost+found......
00000030 14 00 09 01 66 69 6C 65 6E 61 6D 65 31 00 00 00 ....filename1...
00000040 0D 00 00 00 18 00 0F 01 6C 6F 6E 67 65 72 66 69 ........longerfi
00000050 6C 65 6E 61 6D 65 32 00 0E 00 00 00 10 00 05 01 lename2.........
00000060 66 69 6C 65 33 00 00 00 0C 00 00 00 98 03 0F 01 file3...........
00000070 6C 6F 6E 67 65 72 66 69 6C 65 6E 61 6D 65 34 00 longerfilename4.
Note that all that has happened to the "filename1" entry is that its inode number has been zeroed. The new "longerfilename4" would not fit in that space, so it is added at the end.
Now, "rm longerfilename2; touch longerfilename5"
Code:
00000000 02 00 00 00 0C 00 01 02 2E 00 00 00 02 00 00 00 ................
00000010 0C 00 02 02 2E 2E 00 00 0B 00 00 00 14 00 0A 02 ................
00000020 6C 6F 73 74 2B 66 6F 75 6E 64 00 00 0D 00 00 00 lost+found......
00000030 2C 00 0F 01 6C 6F 6E 67 65 72 66 69 6C 65 6E 61 ,...longerfilena
00000040 6D 65 35 00 18 00 0F 01 6C 6F 6E 67 65 72 66 69 me5.....longerfi
00000050 6C 65 6E 61 6D 65 32 00 0E 00 00 00 10 00 05 01 lename2.........
00000060 66 69 6C 65 33 00 00 00 0C 00 00 00 98 03 0F 01 file3...........
00000070 6C 6F 6E 67 65 72 66 69 6C 65 6E 61 6D 65 34 00 longerfilename4.
Note that the space for the two deleted entries of length 20 (0x14) and 24 (0x18) has been combined into a single entry of length 44 (0x2c). The characters of the "longerfilename2" entry still remain, but they are within that same 44-byte entry and follow the terminal NUL of the "longerfilename5" name, which was able to fit in that combined space. The subsequent entries in the directory remain in exactly the same locations as before.
08-24-2015, 01:54 PM | #11 | jpollard | Senior Member | Registered: Dec 2012 | Location: Washington DC area | Distribution: Fedora, CentOS, Slackware | Posts: 4,912
It is still having to do garbage collection to merge empty spaces, plus the I/O...
Most of what I tried to avoid is the in-memory scrambling that goes on. Starting at the end makes it simple: all that is needed is to merge into one big block.
Last edited by jpollard; 08-24-2015 at 01:56 PM.
08-24-2015, 02:35 PM | #12 | jefro | Moderator | Registered: Mar 2008 | Posts: 22,110
I might boot to a live CD and check the filesystem. (We'd like to know what filesystem it is, by the way.)
Then I'd just delete it from the live boot.
08-24-2015, 02:36 PM | #13 | rknichols | Senior Member | Registered: Aug 2009 | Distribution: Rocky Linux | Posts: 4,789
Starting at the end means that for every name you pass in an unlink() system call the kernel has to start at the beginning of the directory and do a string compare on every one of the millions of entries until it gets to the matching name. If you start at the beginning, then the first name it tests will match. Tell me again which of those is faster.
And I found one place I was wrong earlier. The merging of successive deleted entries is done only when the kernel is trying to find space to add a new entry. If all you are doing is deleting, you get successive deleted entries that have to be skipped over individually. That's a matter of seeing that the inode number is zero and then reading the rec_len field from the next 16 bits -- painful, but still not as bad as doing a string compare on the name field.
As to why things are left in such an inefficient state, I suspect the answer is, "You should be using dir_index."
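To make that concrete, this is (approximately) the on-disk record being walked; the field names follow ext2_dir_entry_2 in the kernel sources, and the hexdumps in post #10 can be read against it (4 bytes of inode, 2 bytes of rec_len, then name_len, file_type, and the name):
Code:
#include <stdint.h>

/* Approximate on-disk layout of an ext2/3/4 directory entry (integers
 * are little-endian on disk). A deleted entry keeps its rec_len, so a
 * scan hops over it with:
 *   next = (struct ext2_dirent *)((char *)cur + cur->rec_len);        */
struct ext2_dirent {
	uint32_t inode;     /* inode number; zeroed when the entry is deleted */
	uint16_t rec_len;   /* bytes from this entry to the start of the next */
	uint8_t  name_len;  /* length of name[]                               */
	uint8_t  file_type; /* 1 = regular file, 2 = directory, ...           */
	char     name[];    /* the name itself, not NUL-terminated on disk    */
};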
08-24-2015, 02:40 PM | #14 | rknichols | Senior Member | Registered: Aug 2009 | Distribution: Rocky Linux | Posts: 4,789
Quote:
Originally Posted by jefro
I might boot to a live CD and check the filesystem. (We'd like to know what filesystem it is, by the way.)
Then I'd just delete it from the live boot.
Sometimes it's best to just save everything you want to keep and then use mkfs to blow away the rest. Of course if you've got terabytes of data that you would have to save and restore, that's not a good solution either.
08-24-2015, 06:21 PM | #15 | Member | Registered: Jul 2005 | Location: Montreal, Canada | Distribution: Fedora 31 and Tumbleweed (Gnome versions) | Posts: 311
Quote:
Originally Posted by rknichols
Starting at the end means that for every name you pass in an unlink() system call the kernel has to start at the beginning of the directory and do a string compare on every one of the millions of entries until it gets to the matching name. If you start at the beginning, then the first name it tests will match. Tell me again which of those is faster.
And I found one place I was wrong earlier. The merging of successive deleted entries is done only when the kernel is trying to find space to add a new entry. If all you are doing is deleting, you get successive deleted entries that have to be skipped over individually. That's a matter of seeing that the inode number is zero and then reading the rec_len field from the next 16 bits -- painful, but still not as bad as doing a string compare on the name field.
As to why things are left in such an inefficient state, I suspect the answer is, "You should be using dir_index."
If the midpoint of the directory is not valid, I would guess that the system search algorithm would creep up to the next recognized entry and continue with some form of binary search (My BS).