LinuxQuestions.org - [SOLVED] Need help with script that finds the 10 largest files

amber1

03-25-2013 10:09 PM

Need help with script that finds the 10 largest files

How can I write a script that finds the 10 largest files in a filesystem? The script has to display the filenames and sizes in descending order, largest first. I would appreciate it if someone could help.
Thanks

nicksu

03-25-2013 10:37 PM

Quote:

Originally Posted by amber1 (Post 4918879)

Can this work?
du -ha | sort -hr | head -n 10

shivaa

03-25-2013 11:19 PM

You will need the following commands: find, du, sort, and head.

Well, it's not a good idea to hand you a ready-made script. Instead, you can build it yourself with the help of the following commands:

Code:

find / -exec du -sk {} \; 2>/dev/null| gawk '{gsub(/K/,"",$1); print $0}' > /tmp/files.txt

sort -nr -k1 /tmp/files.txt | head -10

\rm /tmp/files.txt

nicksu

03-26-2013 12:26 AM

Quote:

Originally Posted by shivaa (Post 4918902)

Hi, but your find command also writes the current path (.) into files.txt. Can you show how to avoid that using find itself?
The only way I know is to filter it out with grep, as in grep -v ".$", so that the . does not end up in files.txt.

shivaa

03-26-2013 12:43 AM

@nicksu:
Searching from the / directory writes the absolute (complete) path of each file to /tmp/files.txt, which is the only way to search the whole file system.
Please explain your problem a little more. What exactly do you mean? Otherwise, the following script runs fine:

Code:

#!/bin/bash

find / -exec du -sk {} \; 2>/dev/null| gawk '{gsub(/K/,"",$1); print $0}' > /tmp/files.txt

sort -nr -k1 /tmp/files.txt | head -10

\rm /tmp/files.txt

@amber1:
Run the script as root or with sudo, because it searches for files starting from the / directory.

nicksu

03-26-2013 12:54 AM

Quote:

Originally Posted by shivaa (Post 4918938)

Oh, my mistake. I used . instead of / in the find command.

nicksu

03-26-2013 01:54 AM

Quote:

Originally Posted by amber1 (Post 4918879)

Hi, thanks to shivaa's help, I put together the script below. I hope it helps.

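#!/bin/bash
# Ask for the directory to search.
echo "Please type in the path in which you want to find the largest files"
read -r path
# Sizes of all regular files, sorted largest first; keep the paths of the top 10.
a=`find "$path" -type f -exec du -h {} \; | sort -hr | head -n 10 | awk '{print $2}'`
for i in $a
do
ls -hld "$i" 2>/dev/null
done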
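
One caveat with a loop like this: `for i in $a` splits on whitespace, so a path containing spaces is broken into pieces. A minimal sketch of one way around that, assuming GNU sort (for -h) and paths without embedded newlines; the function name show_largest is made up for illustration:

```shell
#!/bin/bash
# show_largest DIR: hypothetical helper. Lists the 10 largest regular files
# under DIR (default: current directory) in ls -l style.
show_largest() {
    find "${1:-.}" -type f -exec du -h {} \; 2>/dev/null |
        sort -hr |
        head -n 10 |
        cut -f2- |                      # du separates size and path with a tab
        while IFS= read -r file; do     # read one whole line per path
            ls -hld -- "$file"
        done
}

# Example call: show_largest /var/log
```

Reading line by line keeps each path in one piece, and cut -f2- keeps everything after the tab, so spaces inside a name survive.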

shivaa

03-26-2013 04:30 AM

@nicksu:
Please go through this guide once. And note that:
1. There's no need to use a for loop; the -exec option will do that.
2. Do not use "head -10" directly on the find output, because it will then print only the first 10 files, not all files.

Code:

#!/bin/bash

echo "Please type in the path in which you want to find the largest files"; read -r path

# Size in KB and path of everything under $path, saved to a temporary file
find "$path" -exec du -sk {} \; 2>/dev/null | gawk '{gsub(/K/,"",$1); print $0}' > /tmp/files.txt

# Numeric sort on the size column, largest first; keep the top 10
sort -nr -k1 /tmp/files.txt | head -10

\rm /tmp/files.txt
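
A side note on -exec: the `\;` form starts one du process per file. find also accepts `+` in place of `\;`, which hands many paths to each du call. A rough sketch under the assumption that only regular files matter (so du -s and the temporary file are not needed); the function name top_by_du is made up for illustration:

```shell
#!/bin/bash
# top_by_du DIR [COUNT]: hypothetical helper. Prints "<size in KB><tab><path>"
# for the COUNT largest regular files under DIR (defaults: "." and 10).
top_by_du() {
    local dir="${1:-.}"
    local count="${2:-10}"
    # "{} +" batches many paths into each du invocation.
    find "$dir" -type f -exec du -k {} + 2>/dev/null |
        sort -k1,1nr |
        head -n "$count"
}

# Example call: top_by_du /home 10
```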

nicksu

03-26-2013 05:10 AM

Quote:

Originally Posted by shivaa (Post 4919034)

Wow, what a guide. Thank you for sharing it.
As for your point 2, I am not so clear. I use "head -10" to print the first 10 lines because I have already run "sort -hr", which sorts the find result; "head -10" then fetches the first 10 lines. Why is that wrong?

colucix

03-26-2013 05:28 AM

Please use a descriptive title for your thread excluding words like 'urgent' or 'help'. Using a proper title makes it easier for members to help you. This thread has been reported for title modification. Please do not add replies that address the thread title.

shivaa

03-26-2013 07:14 AM

Quote:

@nicksu: ...as for your point 2, I am not so clear. I use "head -10" to print the first 10 lines because I have already run "sort -hr", which sorts the find result; "head -10" then fetches the first 10 lines. Why is that wrong?

It's not wrong, but sorting directly on the output of the find command isn't good. You also haven't specified a sort key field (i.e. -k1), which you should.
Second, making your script unnecessarily long is not a good idea when the work can be done in a few commands, so the for loop is not needed either.

However, please refer to the guide I mentioned, so that your doubts can be cleared up.
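
To illustrate the sort key point, here is a small sketch with made-up sample lines in the shape du -h prints. It assumes GNU sort, whose -h option understands the K/M/G suffixes; the variable name sample is only for illustration:

```shell
#!/bin/bash
# Made-up "size path" lines, similar to du -h output.
sample='4.0K ./notes.txt
1.5G ./disk.img
200M ./video.mp4'

# Key on field 1 only, plain numeric, reversed: only the leading number
# is compared, so 200M ends up above 1.5G.
printf '%s\n' "$sample" | sort -k1,1nr

# Key on field 1 only, human-numeric, reversed: 1.5G comes first.
printf '%s\n' "$sample" | sort -k1,1hr
```

So -n fits sizes printed in a single unit (du -k, or bytes), while -h is needed when the sizes carry unit suffixes.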

pan64

03-26-2013 08:00 AM

I do not like those chains of commands like find, du, grep, sed, awk, head, cut ... (not to mention additional loops). Keep it simple:

Code:

find . -type f -exec du -sb {} \; | perl -e '
    my %h;
    while (<>) { my @f = split; $h{$f[1]} = $f[0] }   # path => size in bytes
    my $i = 2;
    for my $k (sort { $h{$b} <=> $h{$a} } keys %h) {  # largest first
        print "$k $h{$k}\n";
        last if $i++ > 3;
    }
'

# the last 3 (after $i++ >) means print 3 lines, so you need to modify it....

# also the last print can be replaced to give formatted output:

printf("%-30s  %10d\n", $k, $h{$k})

rigor

03-28-2013 12:35 PM

amber1,

The commands du and find both descend through all files and directories beneath the starting point you give them, and so will cross file system boundaries unless you tell them not to. That is, they are not necessarily limited to a single file system. If the file system on which you start has other file systems mounted within it, the result would be the largest files on any of those file systems, not the largest files on that one file system.

Also, du attributes the space used by all files and directories under a directory to the directory itself. So it would show the root of any directory tree as owning all the space used below it, misleadingly reporting the top directory as the largest "file".

Although the -exec option is a powerful way to extend the usefulness of find, having find repeatedly execute another command, such as du, when it isn't necessary can be rather slow. A command sequence such as the following limits find to a single file system (-xdev) and uses find's own -printf to print each size in bytes (%s) and path (%p), only then sorting the result and grabbing the first ten lines.

Code:

find / -xdev -printf "%s %p\n" | sort -rn | head -10

You indicated you wanted the 10 largest files. If you meant regular files only, so that directories need to be excluded from consideration, you could do the following.

Code:

find / -xdev -type f -printf "%s %p\n" | sort -rn | head -10
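
Since the original question asked for a script, here is one way the command above might be wrapped up. This is only a sketch, assuming GNU find (for -printf) and file names without embedded newlines; the function name largest_files and its arguments are made up for illustration:

```shell
#!/bin/bash
# largest_files DIR [COUNT]: hypothetical helper. Prints "<size in bytes> <path>"
# for the COUNT largest regular files under DIR, largest first, without
# crossing into other mounted file systems (defaults: "/" and 10).
largest_files() {
    local dir="${1:-/}"
    local count="${2:-10}"
    if [ ! -d "$dir" ]; then
        echo "largest_files: $dir is not a directory" >&2
        return 1
    fi
    # Unreadable directories are skipped silently via 2>/dev/null.
    find "$dir" -xdev -type f -printf '%s %p\n' 2>/dev/null |
        sort -rn |
        head -n "$count"
}

# Example call (run as root to see every file): largest_files / 10
```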

All times are GMT -5.