LinuxQuestions.org

hydraMax 01-20-2012 05:57 PM

c library: recursive delete of directories
 
Is there a C library out there (unix only is fine) designed to safely handle the recursive deletion of non-empty directories? I found several posts on the Web from people asking the same question, and they were all told that they would either have to make an external call to "rm -rf", or implement the whole idea from scratch (say, with nftw()), or use some hacked-together function some guy wrote in five minutes but is afraid to use himself.

But it just seems hard to believe that it's been over forty years since Unix came along and there isn't some standard, safe, commonly accepted way of doing this. Is there something I've overlooked?

Dark_Helmet 01-20-2012 08:40 PM

I don't know that a recursive delete of non-empty directories is really something that needs to be implemented as a library call. My understanding of libraries is that they provide basic building blocks that a programmer then assembles to perform top-level specific tasks. What you describe is asking for a top-level specific task to be part of the library. In other words, I do not see a recursive-delete-of-non-empty-directories as a useful building block to some other task.

Be that as it may, there's no reason to rely on the code "some guy" wrote in five minutes. The code for rm and its recursive option is available for everyone and anyone to inspect and/or incorporate into their software. The rm command is part of the GNU Coreutils package. Download, extract, and have a little party! :)

hydraMax 01-20-2012 11:29 PM

Actually, I already have extracted the binutils code. The code for the rm program depends on remove.h and remove.c, which provide a rm() function and associated structures, but I do not know how easy it would be to cut and paste into my program: remove.c itself seems to be dependent on at least ten other source files from within the binutils package, most of which link to other headers and some of which are filled beginning to end with macro tests I've never even heard of. Many of the headers used do not even exist in the source tree, but are generated at the beginning of the make process.

No, I wouldn't be able to cut and paste from it, even if that was a good idea; somebody who knows how would have to turn the code into a separate library that I could link to.

Dark_Helmet 01-21-2012 12:05 AM

You do not necessarily have to copy-paste wholesale. My idea was to look at the code and incorporate the basic flow of the logic.

If that's a time-consuming process (more than you can afford), then perhaps write a simple recursive function with calls to rmdir().

Off-hand, the system functions the algorithm might use:
opendir() (man 3 opendir)
readdir() (man 3 readdir)
stat() (man 2 stat)
rmdir() (man 2 rmdir)
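
A minimal sketch of such a recursive function, built from the calls listed above. The name remove_tree() is hypothetical, and two choices here are assumptions rather than anything stated in this thread: it uses lstat() instead of stat() so that symbolic links are removed rather than followed, and it calls unlink() for anything that is not a directory. It is path-based, so it can be confused if another process renames or replaces directories while it runs.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: remove `path` and everything below it.
 * Returns 0 on success, -1 on the first error (errno is set). */
int remove_tree(const char *path)
{
    struct stat st;

    /* lstat() so that a symlink is unlinked, not followed. */
    if (lstat(path, &st) == -1)
        return -1;
    if (!S_ISDIR(st.st_mode))
        return unlink(path);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int result = 0;
    struct dirent *entry;
    /* errno is cleared before each readdir() to tell end-of-directory
     * (NULL, errno == 0) apart from a read error (NULL, errno != 0). */
    while (result == 0 && (errno = 0, entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        char child[PATH_MAX];
        int n = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof child) {
            errno = ENAMETOOLONG;
            result = -1;
        } else {
            result = remove_tree(child);   /* depth-first: children first */
        }
    }
    if (result == 0 && errno != 0)
        result = -1;                       /* readdir() itself failed */

    int saved = errno;
    closedir(dir);
    if (result == -1) {
        errno = saved;
        return -1;
    }
    return rmdir(path);                    /* the directory is now empty */
}
```

A call such as remove_tree("/tmp/scratch") would then remove the whole tree, stopping at the first failure.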

I know you were hoping for some library that would have all this done for you, but I'm not aware of one. Though, obviously if someone does know of one and comes by this thread, I would be happy for them to correct me.

As a side note, you said you found rm in the binutils package. It was my impression that GNU Binutils was for manipulation/information of binary executables. I would hate to think that there's a binutils-rm and a coreutils-rm floating around out there.

hydraMax 01-21-2012 05:23 AM

Err... meant to type coreutils. (was looking at binutils earlier in the day so the name stuck in my mind...)

Well, since this thread went nowhere, I guess I'm back where I started. Either reinvent the wheel, or continue making external calls to "rm".

Maybe for my next C project, I'll try to port that Coreutils function into a separate library. Or maybe I'll join the mailing list and beg them to do it. (♪ All I have to do, is dre-e-e-e-eammm... ♫)

Nominal Animal 01-21-2012 01:02 PM

Actually, the situation is pretty interesting right now.

Linux kernels starting from version 2.6.16 provide the syscall openat(fd,dirname,O_RDONLY|O_DIRECTORY), which can be used to open a descriptor to a subdirectory using a descriptor to the parent directory and only the subdirectory name, and unlinkat(fd,name,flags), where flags is 0 for normal files and AT_REMOVEDIR for empty directories. For the current directory, you can use AT_FDCWD for fd.

Using the above, a very simple and robust algorithm will only use as many descriptors as the depth of the deepest subdirectory, but will be totally immune to hard link and rename issues. In particular, you can rename one of the directories being deleted without the algorithm getting confused. It will only depend on the current working directory for the very first unlink/opendir, and it will be perfectly thread-safe.
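
A rough sketch of that descriptor-based algorithm, under some assumptions: the names remove_tree_at() and empty_dir() are hypothetical, O_NOFOLLOW is added so that symbolic links are unlinked rather than followed, and fdopendir()/readdir() are used to list each directory. It holds one open descriptor per level of depth. It does not include the retry loop or the thread pool discussed in this post.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int remove_tree_at(int parentfd, const char *name);

/* Hypothetical helper: delete every entry in the directory open on dfd.
 * Takes ownership of dfd; it is closed by closedir(). */
static int empty_dir(int dfd)
{
    DIR *dir = fdopendir(dfd);
    if (dir == NULL) {
        int saved = errno;
        close(dfd);
        errno = saved;
        return -1;
    }

    int result = 0;
    struct dirent *entry;
    /* Clear errno so end-of-directory and read errors can be told apart. */
    while (result == 0 && (errno = 0, entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        /* Names are resolved relative to dfd, never via a full path. */
        result = remove_tree_at(dfd, entry->d_name);
    }
    if (result == 0 && errno != 0)
        result = -1;

    int saved = errno;
    closedir(dir);                 /* also closes dfd */
    errno = saved;
    return result;
}

/* Hypothetical entry point: remove `name`, looked up relative to parentfd
 * (use AT_FDCWD for the current directory). Returns 0 or -1 with errno. */
int remove_tree_at(int parentfd, const char *name)
{
    int fd = openat(parentfd, name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
    if (fd < 0) {
        /* Not a directory (or a symlink): remove it as a plain entry. */
        if (errno == ENOTDIR || errno == ELOOP)
            return unlinkat(parentfd, name, 0);
        return -1;
    }
    if (empty_dir(fd) == -1)
        return -1;
    return unlinkat(parentfd, name, AT_REMOVEDIR);
}
```

The first call would look like remove_tree_at(AT_FDCWD, "some/dir"); every deeper step uses only a parent descriptor and a single name, which is what makes renames of the directories higher up harmless.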

You might wish to use a loop to delete the contents of a directory recursively, until the directory itself can be removed. This is in case somebody is creating new files while you're trying to remove the tree. Because subdirectory deletion is not dependent on the current working directory, you could farm each subdirectory out to a separate thread from a thread pool, removing subdirectories in parallel.

To do the same in a portable manner, you need to do the tree removal in a child process, using fchdir() to descend into and go back up the tree, because the current working directory is common to all threads in the process. For the same reason, you cannot use more than one thread.

Finally, /bin/rm is a reliable workhorse for this. If you create a function which forks a child process and returns the child process pid, and in the child process, redirects standard input, output and error to /dev/null and calls execl("/bin/rm","rm","-rf",thing,(char *)NULL); you have an asynchronous tree deletion function done. The caller can go on doing something else productive, while the files are being removed. The caller can call a helper function, supplying the pid, to wait until the deletion is complete.
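
As a sketch only, the fork-and-exec approach could look like the following. The function names rm_rf_start() and rm_rf_wait() are hypothetical, and the "--" argument is an added assumption: it keeps rm from treating a name that begins with "-" as an option.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start "rm -rf -- path" in a child process.
 * Returns the child's pid, or -1 if fork() failed. */
pid_t rm_rf_start(const char *path)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                 /* parent, or fork() failure (-1) */

    /* Child: point stdin, stdout and stderr at /dev/null. */
    int devnull = open("/dev/null", O_RDWR);
    if (devnull == -1)
        _exit(127);
    if (dup2(devnull, STDIN_FILENO) == -1 ||
        dup2(devnull, STDOUT_FILENO) == -1 ||
        dup2(devnull, STDERR_FILENO) == -1)
        _exit(127);
    if (devnull > STDERR_FILENO)
        close(devnull);

    /* The argument list must end with a null pointer. */
    execl("/bin/rm", "rm", "-rf", "--", path, (char *)NULL);
    _exit(127);                     /* only reached if execl() failed */
}

/* Hypothetical helper: wait for the child started above.
 * Returns 0 if rm exited with status 0, otherwise -1. */
int rm_rf_wait(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);   /* retry if interrupted */

    if (r == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The caller keeps the pid returned by rm_rf_start(), does other work, and later passes it to rm_rf_wait() to learn whether the deletion succeeded.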

I wouldn't mind writing the removal as a simple library, but I just cannot decide which of the three above approaches makes most sense. I personally prefer the first, but it is Linux-specific, and will only work with kernels 2.6.16 and later.

hydraMax 01-21-2012 03:15 PM

Quote:

Originally Posted by Nominal Animal (Post 4580821)
Finally, /bin/rm is a reliable workhorse for this. If you create a function which forks a child process and returns the child process pid, and in the child process, redirects standard input, output and error to /dev/null and calls execl("/bin/rm","rm","-rf",thing,(char *)NULL); you have an asynchronous tree deletion function done. The caller can go on doing something else productive, while the files are being removed. The caller can call a helper function, supplying the pid, to wait until the deletion is complete.

This is the path I decided to take (already implemented). Writing my own recursive delete code sounds interesting as a project in and of itself, but I don't want to make it part of the current project. Thanks for the interesting information about the openat and unlinkat syscalls, though.

If I implement my own simple library, or use somebody else's, it will have to meet my criteria for portability, and definitely not be Linux-only. Currently I am restricting myself to _XOPEN_SOURCE 700 features (POSIX.1-2008, SUSv4), and I want my code to compile (ideally) on any *nix system which meets those standards. Of course, there could be conditional preprocessor code for specific OSes.

hydraMax 02-02-2012 06:05 PM

Now that my other project has reached beta, I intend to create a simple library that does recursive deletion, hopefully in a reliable and safe manner. Does anyone have any additional insights or suggestions for me? To be honest, I am not yet sure what approach to take. The AT functions, as Nominal mentioned, look quite useful, though I am also wondering if this should be based around a file tree walk with ftw/nftw using a postorder traversal.

I tried to look through the remove.c code in coreutils, but it seems rather complicated and is intended to do more than what I am aiming for here (e.g., interactive deletions).

hydraMax 02-10-2012 03:09 PM

I just wanted to mention (for anyone that might read this thread in the future) I was able to create a small library with a simple function for recursive deletion of a file hierarchy:

Recursive Remove Library

