I'm drawing a blank: shell "one-liner" to find "n" largest files in a directory tree

sundialsvcs · 04-12-2022, 09:48 AM

Oh, I could figure this out for myself, but maybe it's faster to just ask ...

I need a "shell one-liner" that will tell me, say, the 10 largest files anywhere in a directory or any of its subdirectories. (The 10 largest files with their locations, no matter where they are.)

teckk · 04-12-2022, 09:54 AM

Something like this, or a variant.

Code:

find . -type f -print0 | xargs -0 du | sort -n | tail -20

Edit:
Or jam in as many pipes as you wish:

Code:

find . -type f -print0 | xargs -0 du | sort -n | tail -20 | cut -f2- | xargs -d '\n' -I{} du -sh {}

boughtonp · 04-12-2022, 10:03 AM

You don't need xargs, but you do need --all to stop du from grouping files into directories?

Code:

du --all --files0-from=<(find ./directory -type f -print0) | sort -nr | head -n10
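To try the process-substitution version on a throwaway tree, it can be wrapped in a small function. This is only a sketch, assuming bash (for <(...)) and GNU du (for --files0-from and -b, which reports apparent sizes in bytes so the ranking doesn't depend on the filesystem's block size). The name largest_files is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the du --files0-from one-liner.
# Assumes bash (<(...)) and GNU du (--files0-from, -b).
largest_files() {
  local dir=${1:-.} n=${2:-10}
  # find emits NUL-terminated paths, so names with spaces or newlines survive;
  # du reads that list from the process substitution and prints size<TAB>path;
  # -b = apparent size in bytes.
  du -b --files0-from=<(find "$dir" -type f -print0) | sort -nr | head -n "$n"
}
```

For example, largest_files ~/Downloads 10 should list the ten largest files under ~/Downloads, biggest first.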

teckk · 04-12-2022, 10:22 AM

Oh, OK. You can get rid of the pipes too, if you want.

Code:

head -n10 <<< "$(sort -nr <<< "$(du --all --files0-from=<(find . -type f -print0))")"

teckk · 04-12-2022, 10:28 AM

And to answer the OP, you would search the whole file tree. You'll need to be root to read some of it.

Code:

head -n10 <<< "$(sort -nr <<< "$(du --all --files0-from=<(find / -type f -print0))")"
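One caveat with starting at /: find also walks pseudo-filesystems such as /proc and /sys, plus anything mounted below / (network shares, USB disks). A hedged variant adds find's -xdev, which keeps the search on the filesystem it starts on, and discards the "Permission denied" noise. It assumes bash, GNU find and GNU du; largest_files_xdev is a hypothetical name, and -b (apparent size in bytes) can be dropped to rank by disk usage like the commands above.

```shell
#!/usr/bin/env bash
# Sketch for a whole-system scan. Assumes bash, GNU find and GNU du.
# -xdev: do not descend into other filesystems (/proc, /sys, other mounts).
# 2>/dev/null: hide permission errors and files that vanish mid-scan.
largest_files_xdev() {   # hypothetical helper name
  local start=${1:-/} n=${2:-10}
  du -b --files0-from=<(find "$start" -xdev -type f -print0 2>/dev/null) 2>/dev/null |
    sort -nr | head -n "$n"
}
```

From a root shell, largest_files_xdev / 10 lists the ten largest files on the root filesystem only; if /home or /var live on separate filesystems, run it again with those as the starting point (or drop -xdev).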

Turbocapitalist · 04-12-2022, 10:44 AM

I'd add that sort has a -h option that understands the human-readable sizes printed by du's own -h option.
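Put together, that gives human-readable output end to end. A sketch, assuming bash plus GNU du and GNU sort (sort -h orders values such as 4.0K, 12K and 2.0M by their suffix); largest_files_h is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: human-readable sizes from du -h, ordered by sort -h (reversed).
# Assumes bash (<(...)), GNU du (--files0-from, -h) and GNU sort (-h).
largest_files_h() {   # hypothetical helper name
  local dir=${1:-.} n=${2:-10}
  du -h --files0-from=<(find "$dir" -type f -print0) | sort -rh | head -n "$n"
}
```

For example, largest_files_h . 10 prints the ten largest files under the current directory, with sizes such as 12K or 2.0M.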

pan64 · 04-12-2022, 01:25 PM

Quote:

Originally Posted by teckk

Oh ok, you can get rid of the pipes too if one wanted.

Code:

head -n10 <<< "$(sort -nr <<< "$(du --all --files0-from=<(find . -type f -print0))")"

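No, that is not true. As has been discussed here on LQ before, this is exactly the same thing, just with quite different syntax: the command substitutions and the process substitution still connect the commands through pipes (and bash may use temporary files for the here-strings).
If you really want fewer pipes, you can do something like this (using only one pipe):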

Code:

du <arguments> | awk/perl/python 'collect data/sort/print first 20 lines'
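Filling in the awk slot of that template, a single-pipe sketch might look like the following. awk keeps only the N largest lines in a small array ordered by size (one insertion step per input line), so no separate sort or head is needed. Assumptions: GNU du for -b (apparent size in bytes), POSIX awk for the rest, and find ... -exec du -b -- {} + in place of plain du so that only regular files are listed (du -a would also print directory totals). top_n_one_pipe is a hypothetical name.

```shell
#!/usr/bin/env bash
# Single-pipe sketch: find runs du on regular files only (no pipe there),
# and awk keeps the n largest "size<TAB>path" lines, largest first.
# Assumes GNU du (-b); the awk program itself is plain POSIX awk.
top_n_one_pipe() {   # hypothetical helper name
  local dir=${1:-.} n=${2:-10}
  find "$dir" -type f -exec du -b -- {} + | awk -F'\t' -v n="$n" '
    {
      size = $1 + 0
      if (cnt < n) cnt++                 # still room: grow the list by one slot
      else if (size <= s[cnt]) next      # not bigger than the current n-th entry
      i = cnt                            # start at the last slot (dropping it if full)
      while (i > 1 && s[i-1] < size) {   # shift smaller entries down one slot
        s[i] = s[i-1]; line[i] = line[i-1]; i--
      }
      s[i] = size; line[i] = $0
    }
    END { for (i = 1; i <= cnt; i++) print line[i] }'
}
```

For example, top_n_one_pipe . 20 prints the 20 largest files under the current directory with their sizes in bytes.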

To completely avoid pipes, you need to implement the directory scan within that Python/Perl script. It's not that complicated, but those are not one-liners, although you can put them into a script and use it as a single command.
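Staying in shell rather than Perl or Python, one pipe-free variant trades the pipes for a temporary file: no |, no $(...), no <(...) and no <<<. This is not the in-script directory scan described above (find still does the walking), just a sketch assuming GNU find (-printf, where %s is the file size in bytes) and bash (for $'\t'); top_n_no_pipes and the temporary file name are made up.

```shell
#!/usr/bin/env bash
# Pipe-free sketch: find writes "size<TAB>path" lines to a temporary file,
# then sort rewrites that file in place and head reads its first n lines.
# Assumes GNU find (-printf) and bash ($'\t').
top_n_no_pipes() {   # hypothetical helper name
  local dir=${1:-.} n=${2:-10}
  # Predictable name keeps the sketch free of $(...); prefer mktemp in real scripts.
  local tmp="${TMPDIR:-/tmp}/top_n_no_pipes.$$"
  find "$dir" -type f -printf '%s\t%p\n' > "$tmp"
  sort -t $'\t' -k1,1nr -o "$tmp" "$tmp"   # POSIX sort allows -o to name an input file
  head -n "$n" -- "$tmp"
  rm -f -- "$tmp"
}
```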

!!! · 04-12-2022, 05:01 PM

https://askubuntu.com/questions/1292...es-in-a-folder

boughtonp · 04-12-2022, 05:20 PM

On that note, GNU Awk has interesting array sorting...

Code:

cat results-of-du.txt | awk -F'\t' '{a[$2]=$0}END{PROCINFO["sorted_in"]="@val_num_desc";x=0;for (i in a){print(a[i]);if (++x>=10)exit;}}'

(Of course, that's not really a one-liner: sure, there are no newlines, but if a command is longer than 80 characters, it's probably disqualified?)

Anyhow, here's the same thing, more readably...

Code:

cat results-of-du.txt | awk '{a[$0]=$1}
END {
 PROCINFO["sorted_in"] = "@val_num_desc";
 x=0;
 for ( i in a )
 {
  print(i);
  if (++x>=10)
   exit;
 }
}'

i.e. the value of the global variable PROCINFO["sorted_in"] defines the order of loop iteration, so no explicit sort call is needed: we store the sizes as the array values, the loop then iterates in descending numeric order of those values, and we print the indexes (which are size+filename; with a tab field separator, -F'\t', you could use "a[$2]=$1" to omit the size).
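Note that PROCINFO["sorted_in"] is a GNU Awk extension: in other awks (mawk, BusyBox awk) PROCINFO is just an ordinary array, so the for-in order is unspecified. A sketch of the same idea as a reusable filter, calling gawk explicitly and splitting on tabs so paths containing spaces stay intact; top_n_gawk is a hypothetical name, and it expects du-style "size<TAB>path" lines on stdin.

```shell
#!/usr/bin/env bash
# Sketch: the PROCINFO["sorted_in"] trick as a filter over du output.
# Calls gawk explicitly because sorted_in only works in GNU Awk.
top_n_gawk() {   # hypothetical helper name; reads "size<TAB>path" lines on stdin
  gawk -F'\t' -v n="${1:-10}" '
    { a[$0] = $1 }                                # index: whole line, value: size
    END {
      PROCINFO["sorted_in"] = "@val_num_desc"     # iterate by value, numeric, descending
      for (i in a) { print i; if (++x >= n) exit }
    }'
}
```

For example, top_n_gawk 10 < results-of-du.txt prints the ten largest entries from a saved du listing.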

pan64 · 04-13-2022, 12:28 AM

Quote:

Originally Posted by boughtonp

On that note, GNU Awk has interesting array sorting...

Code:

cat results-of-du.txt | awk -F'\t' '{a[$2]=$0}END{PROCINFO["sorted_in"]="@val_num_desc";x=0;for (i in a){print(a[i]);if (++x>=10)exit;}}'

UUoC (useless use of cat); you only need:

Code:

f='{a[$2]=$0}END{PROCINFO["sorted_in"]="@val_num_desc";x=0;for (i in a){print(a[i]);if (++x>=10)exit;}}'
awk -F'\t' "$f" results-of-du.txt

boughtonp · 04-13-2022, 08:00 AM

Quote:

Originally Posted by pan64

UUoC, you only need:

Yes, I know!

I was maintaining the structure of post #7 whilst providing a runnable command (and deliberately not using newlines).

sundialsvcs · 04-13-2022, 09:55 AM

Thanks, folks. (I didn't mean to create a "Name That Tune" competition!)