[SOLVED] testing if a directory is empty

Skaperen · 03-06-2020, 09:11 PM

i would like to test if a directory is empty. the obviously simple way is to read the list of names in the directory, skipping . and .. if they are there. but this forces another block to be read. i'd like to know if this can be determined from the inode, much like looking at the link count can tell you if it has any subdirectories (you can skip it if only listing directories). can this be done from stat() data?

the purpose of this is to acquire all the names in a directory tree as fast as possible (the fewest total I/O operations).

rknichols · 03-06-2020, 09:41 PM

The directory's inode can't tell you much. The minimum space allocated to a directory is 4096 bytes (1 block), and that will be the same whether the directory is empty or has a few files in it. Also, a directory can expand, but does not automatically shrink when files are removed (there's an option in fsck to accomplish that), so a large size for a directory just means that at one time it contained many files, whereas it might now be empty.
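One way to check this on your own system is to compare the stat() data of a directory before and after a file is added. The following is only a sketch in Python 3 (the language of the project being discussed); `stat_summary` is a made-up helper name, and the numbers it prints depend on the filesystem.

```python
import os
import tempfile

def stat_summary(path):
    """Return (st_size, st_blocks, st_nlink) for path."""
    st = os.stat(path)
    return st.st_size, st.st_blocks, st.st_nlink

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("empty:   ", stat_summary(d))
        # add one regular file and look at the same fields again
        with open(os.path.join(d, "file"), "w"):
            pass
        print("one file:", stat_summary(d))
```

On ext4 both lines typically show the same size, which is why these fields cannot separate the empty case from the nearly empty one. Other filesystems (tmpfs, for example) may behave differently.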

The find command does have a "-empty" test that will return "true" for an empty file or directory. Whether that is considered a "simple way" depends on the individual and the situation.

syg00 · 03-06-2020, 09:58 PM

The days of being able to predict if you can save (real) I/O have long gone. There is so much caching going on you can't even presume to be able to replicate test results.
You are probably worrying about the wrong thing in the overall scheme of things.

Skaperen · 03-07-2020, 12:23 AM

the purpose is simply to increase efficiency in a file scan generator, to avoid trying to read the list of names if there are none. it appears that the filesystem code or kernel reads at least one empty 4k block from the directory when trying to read names. that is probably good evidence that there is no way to determine that, at least for filesystems i have tried (ext2,ext3,ext4,btrfs,reiserfs). it's not a critical need. i can just go ahead and read the list of names and see if it is empty, or just not deal with being empty.

this project is a generator in python3 that yields each path in name sorted order with the file type (regular file vs directory, etc) included in the yielded tuple.
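A generator along these lines might look like the sketch below. It is not the poster's code: `walk_sorted` and the `kind` strings are invented for illustration, and the order is "sorted by name within each directory", which is an assumption about what name-sorted means here. `os.scandir()` is used because it can often report the file type from the directory entry itself, without a separate stat() per name.

```python
import os
import tempfile

def walk_sorted(top):
    """Yield (path, kind) for everything below top.

    Entries are sorted by name within each directory.
    kind is "dir", "file", "symlink" or "other".
    """
    with os.scandir(top) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_symlink():
            kind = "symlink"
        elif entry.is_dir(follow_symlinks=False):
            kind = "dir"
        elif entry.is_file(follow_symlinks=False):
            kind = "file"
        else:
            kind = "other"
        yield entry.path, kind
        if kind == "dir":
            # an empty directory simply yields nothing here
            yield from walk_sorted(entry.path)
```

Note that an empty directory needs no special case: the recursive call just produces no items.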

syg00 · 03-07-2020, 02:51 AM

How much I/O (and time) is consumed starting python? I think you need to get things in perspective.

BW-userx · 03-07-2020, 10:54 AM

the only way to see if something is in something else without asking someone else is to look for yourself. this is a basic truth applied in all areas of life.

rknichols · 03-07-2020, 01:00 PM

Quote:

Originally Posted by Skaperen

it appears that the filesystem code or kernel reads at least one empty 4k block from the directory when trying to read names.

It is never empty. At a minimum, it contains the entries for "." and ".." .

dugan · 03-07-2020, 02:19 PM

Quote:

Originally Posted by Skaperen

i would like to test if a directory is empty. the obviously simple way is to read the list of names in the directory, skipping . and .. if they are there. but this forces another block to be read. i'd like to know if this can be determined from the inode, much like looking at the link count can tell you if it has any subdirectories (you can skip it if only listing directories). can this be done from stat() data?

the purpose of this is to acquire all the names in a directory tree as fast as possible (the fewest total I/O operations).

I got to the last sentence, and, I, uh...

If the point is to acquire all the names in a directory tree, then the fastest and most efficient way is to query for all the names in the directory directly. Adding a guard to check for the special case where the directory is empty is just going to waste time. The check, no matter how efficient, is not free.

And the information you want wouldn't be in the inode. It would be in the directory entry ("dirent"). It looks to me like the structure that provides access to the directory entry's children is intentionally not part of the public API, and you have to call readdir to get them. So the fastest way to check if a directory is empty is indeed to list it.
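In Python 3, "list it and see" can stop at the first entry. A minimal sketch, with `is_empty_dir` as a hypothetical helper name:

```python
import os
import tempfile

def is_empty_dir(path):
    """True if path has no entries; os.scandir() already omits "." and ".."."""
    with os.scandir(path) as it:
        # ask for just one entry; None means there was nothing to read
        return next(it, None) is None
```

This still costs an open and at least one directory read, so in a full tree scan it saves nothing over simply iterating the directory.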

MadeInGermany · 03-07-2020, 05:58 PM

Trace a find -empty
I guess it does a readdir().

The . and .. links are present in a Unix-like filesystem or if the kernel driver presents it Unix-style.
Assuming this is always true, you can see if it has subdirectories (links > 2) or not (links = 2). But for seeing files you need readdir().
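The link-count shortcut could be sketched in Python 3 as below. `may_have_subdirs` is a made-up name, and the function deliberately answers "no" only when the count is exactly 2, on the assumption that a filesystem which does not follow the Unix convention may report some other value (such as 1) for every directory.

```python
import os
import tempfile

def may_have_subdirs(path):
    """False only when the link count is exactly 2 ("." plus the parent's entry).

    Any other value is treated as "possibly has subdirectories", so the
    caller falls back to reading the directory.
    """
    return os.stat(path).st_nlink != 2
```

As noted above, this says nothing about regular files, so it can only help when subdirectories are all you are listing.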

GazL · 03-08-2020, 10:06 AM

Example program (I was curious):

Code:

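#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main( int argc, char *argv[] )
{
    int count = -2 ;    /* start at -2 so "." and ".." are not counted */
    struct dirent *dent;
    DIR *d;

    if ( argc < 2 ) {
        fprintf(stderr, "usage: %s directory\n", argv[0]);
        return 1;
    }

    d = opendir(argv[1]);
    if ( d == NULL ) {
        perror(argv[1]);
        return 1;
    }

    /* readdir() returns one entry per call, NULL at end of directory */
    while ( (dent = readdir(d)) != NULL )
        count++;

    if ( count > 0 )
        printf("Count %d\n", count);
    else
        puts("empty");

    return 0;
}

(please excuse the minimal error/argument checking, I couldn't be bothered with more).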

strace of ./a.out /var/empty:

Code:

openat(AT_FDCWD, "/var/empty", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
brk(NULL)                               = 0x215a000
brk(0x217b000)                          = 0x217b000
getdents64(3, /* 2 entries */, 32768)   = 48
getdents64(3, /* 0 entries */, 32768)   = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}) = 0
write(1, "empty\n", 6)                  = 6

strace of ./a.out /var/tmp (somewhere that's not empty):

Code:

openat(AT_FDCWD, "/var/tmp", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0
brk(NULL)                               = 0xa68000
brk(0xa89000)                           = 0xa89000
getdents64(3, /* 22 entries */, 32768)  = 712
getdents64(3, /* 0 entries */, 32768)   = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}) = 0
write(1, "Count 20\n", 9)               = 9

Conclusion:

The glibc implementation of readdir(3) uses getdents64(2) internally with a buffer size of 32768. What this essentially means is that any number of readdir(3) calls that don't exceed that buffer size will not result in any additional I/O operations or additional context switches (due to syscalls).

As shown above, there are only two getdents64() calls in both cases. It looks like you always get one additional getdents64() call when trying to read past the last directory entry with readdir(3).

Even if all the file names are approaching NAME_MAX you'd still need over a hundred of them to exceed this buffer and result in additional I/O OPs: assuming VFS cache hasn't already cached them of course, which it probably has.

So, as others have said, not worth worrying about.
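The "over a hundred" figure can be checked with a little arithmetic. The sketch below assumes the Linux `linux_dirent64` record layout (8-byte inode, 8-byte offset, 2-byte record length, 1-byte type, NUL-terminated name, record padded to a multiple of 8 bytes); `dirent64_reclen` is just an illustrative name.

```python
def dirent64_reclen(name_len):
    """Size in bytes of one getdents64 record for a name of name_len bytes."""
    header = 8 + 8 + 2 + 1           # d_ino, d_off, d_reclen, d_type
    raw = header + name_len + 1      # +1 for the terminating NUL
    return (raw + 7) // 8 * 8        # round up to a multiple of 8

BUF_SIZE = 32768   # buffer size seen in the strace output
NAME_MAX = 255     # usual Linux limit on a file name, in bytes

print(dirent64_reclen(1) + dirent64_reclen(2))   # "." and ".." together
print(BUF_SIZE // dirent64_reclen(NAME_MAX))     # max-length names per buffer
```

The first line prints 48, which matches the return value of the first getdents64() call on the empty directory. The second prints 117, so roughly that many maximum-length names fit before a second buffer fill is needed.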

petelq · 03-08-2020, 05:33 PM

Maybe you can work with something like

Code:

if [ $(ls -A)=0 ]; then echo " empty"
else
echo "files"
fi

You could, perhaps, build in a directory variable as a parameter but the bottom line is, I think dugan is right in his post above.

MadeInGermany · 03-09-2020, 02:42 AM

Quote:

Originally Posted by petelq

Maybe you can work with something like

Code:

if [ $(ls -A)=0 ]; then echo " empty"
else
echo "files"
fi

You could, perhaps, build in a directory variable as a parameter but the bottom line is, I think dugan is right in his post above.

That needs a small correction

Code:

if [ -z "$(ls -A)" ]; then

ondoho · 03-11-2020, 02:11 AM

^ what about https://mywiki.wooledge.org/ParsingLs ?
Maybe something like

Code:

for i in * .*; ....

would be better?

Or, how about

Code:

stat .

It offers some information that seems to hint at a directory being filled with stuff, or empty, like 'Size' or 'Blocks' or 'Links'.

rnturn · 03-11-2020, 09:25 AM

Quote:

Originally Posted by syg00

How much I/O (and time) is consumed starting python?

~15ms on my older (G3440-based) system. Looking at an empty subdirectory can't take very long.

IMHO, if directory scanning is taking too long, we need to find an easy way to detect and skip browser cache and thumbnail directories. :^)

petelq · 03-11-2020, 12:41 PM

Quote:

Originally Posted by MadeInGermany

That needs a small correction

Code:

if [ -z "$(ls -A)" ]; then

I did a brief test with "$(ls -A)=0" before my original post and it worked. But your way's good also.