LinuxQuestions.org

Harju 03-24-2011 10:36 AM

listing files recursively, sorted by time, limited head
 
Hi there!

I'm trying to write a shell script that lists the 50 newest files in a directory tree with several subdirectories. I've tried the find command with no luck, so now I figure I should probably use ls.

The problem is that "ls -lRt | head -50" handles one directory at a time. It does not build the full list first and then sort it: it prints all items in the first directory, sorted, then the next directory, sorted, and so on. So I figure I have to sort the whole output of ls before I limit it with head. This is where I am now:

ls -lRt | sort <something clever here> | head -50

A plain "| sort |" will sort by name, if I understand it right, and I don't know how to get around that. Here is also my first attempt, in case it is of any interest or help. It selected files by status change time (so some lists got very large). The lists did not get sorted by time, and I could not find any way to do that.

find "$ftpDir" -ctime "$time" -type f -print > "$ftpFileLs"

Any help would be appreciated, since I'm sort of stuck now. I've read the manuals for all the options I can think of, and it's still just a big blur in my head.

colucix 03-24-2011 10:54 AM

You can use find with the -printf predicate to list the path of each file along with its last modification time. Choose a timestamp format whose numeric order matches chronological order, then simply use sort, tail, cut and so on. Example:
Code:

find . -type f -printf "%TY%Tm%Td%TH%TM%TS %p\n" | sort -k1n | tail -50 | cut -d' ' -f2-
Test the pipe one step at a time and see if the results satisfy your requirement. Hope this helps.
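
A minimal sketch for trying that pipeline on throwaway data, assuming GNU find (for -printf) and a touch that accepts -t. The directory layout, the file names and the helper name newest are made up for illustration:

```shell
#!/bin/sh
# Build a small tree with known modification times (names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch -t 200809011230 "$demo/old.txt"
touch -t 201001050045 "$demo/a/new.txt"
touch -t 200906010800 "$demo/a/b/mid.txt"

# newest DIR N: print the N most recently modified files below DIR, oldest first.
newest() {
    ( cd "$1" && find . -type f -printf '%TY%Tm%Td%TH%TM%TS %p\n' \
        | sort -k1n | tail -n "$2" | cut -d' ' -f2- )
}

newest "$demo" 2
# prints:
# ./a/b/mid.txt
# ./a/new.txt
```

Because cut keeps everything after the first space, paths containing spaces should come through intact; names containing newlines would still confuse the line-based tools.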

EricTRA 03-24-2011 11:07 AM

Hello,

I recently had to come up with something very similar to put in a Troubleshooting Guide for a helpdesk of ours. I needed the 20 biggest files when searching through a directory recursively. This is what I came up with:
Code:

find . -type f | xargs ls -lSrh | tail -n 20 | awk -F" " '{ print $5,$9 }'
This gives me the 20 biggest files, sorted by size (-S for ls), however deep they sit below the directory where you ran the command. With awk I print only the size and the path. Here's an example of the output in my case:
Code:

2.0M ./ISOPROD/VT/libvirt-0.6.3-33.el5.x86_64.rpm
2.4M ./ISOPROD/P2PCDN/extra/vlc-ffmpeg-0.5.2-2.el5.rf.x86_64.rpm
2.9M ./ISOPROD/P2PCDN/latest/contentpublisher-1.0-20.x86_64.rpm
3.1M ./ISOPROD/P2PCDN/extra/vlc-dirac-1.0.2-1.el5.rf.x86_64.rpm
3.8M ./ISOPROD/P2PCDN/latest/cdndns-1.0-12.x86_64.rpm
4.7M ./ISOPROD/P2PCDN/latest/libtracker_ch-1.0-4.x86_64.rpm
6.5M ./ISOPROD/Extra/python26-2.6.5-3.el5.x86_64.rpm
7.6M ./ISOPROD/P2PCDN/latest/topology-1.0-14.x86_64.rpm
7.6M ./ISOPROD/VT/etherboot-zroms-5.4.4-13.el5.x86_64.rpm
7.6M ./ISOPROD/VT/etherboot-roms-5.4.4-13.el5.x86_64.rpm
7.7M ./ISOPROD/isolinux/initrd.img
8.0M ./ISOPROD/P2PCDN/latest/entrypoint-1.0-23.5.x86_64.rpm
8.3M ./ISOPROD/P2PCDN/latest/livesplitter-1.0-10.x86_64.rpm
8.5M ./ISOPROD/P2PCDN/latest/pdnsmanager-1.0-13.x86_64.rpm
9.4M ./ISOPROD/P2PCDN/latest/cdnnode-1.0-74.x86_64.rpm
15M ./ISOPROD/P2PCDN/latest/tracker-1.0-30.x86_64.rpm
20M ./ISOPROD/Extra/jre-6u21-linux-x64-rpm.bin
26M ./ISOPROD/Cluster/luci-0.12.2-12.el5.x86_64.rpm
70M ./ISOPROD/Extra/WowzaMediaServer-2.2.2.noarch.rpm
80M ./ISOPROD/Extra/jdk-6u21-linux-x64.bin

I'm sure if you 'analyze' this command, you'll come up with exactly what you need. Hope it helps.

Kind regards,

Eric

Harju 03-24-2011 02:29 PM

Quote:

Originally Posted by EricTRA (Post 4301823)

Thank you so much, I tried yours first since it looked more familiar to me. The awk command gave a very neat print as well and I love it :) I also think I can use this to develop the print further by adding links to the lines at some later point.

I am also pleased that I get the ./ printout instead of the full path when using "." in the find command. Lots of nice new tricks for my very limited shell scripting vault. Here's my final result, by the way:

Code:

find . -type f | xargs ls -Rtlh | tail -n 50 | awk -F" " '{ print $6,$7,$9 }'

EricTRA 03-24-2011 02:42 PM

Hi,

Glad you got a solution! And pleased that you learned something from it. That's what Linux is all about: sharing and learning from each other. I've learned from the command posted by colucix too, so take a look at his proposal if you have the time. The combination of these two commands gives you lots of possibilities for future problems.

Forgot to mention. If you consider your question/problem solved then please mark it as such using the Thread Tools.

Kind regards,

Eric

Harju 03-24-2011 03:18 PM

Yeah, now it's marked as solved, I didn't know there was such an option :) . I ran into a new problem when I tried it over SSH, though: the guy who needs it uses characters like ' " $ or # in his filenames for some unknown reason. xargs returns the error "unmatched single quote", but I think I'm onto a solution with grep -v or sed to filter those names out before the xargs. I'll return with the solution when I've managed it. So far I get "ls:*:No such file or directory" with the methods I've tried.
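
A minimal sketch of one way around the quoting error, assuming a find with -print0 and an xargs with -0 (GNU and BSD versions both have them). The file name is made up to include the characters mentioned above:

```shell
#!/bin/sh
demo=$(mktemp -d)
# A hypothetical name containing ' " $ and #.
touch "$demo/it's a \"test\" \$1 #2.txt"

# Plain xargs interprets quotes and whitespace in its input, so a pipeline like
#   find "$demo" -type f | xargs ls -ld
# stops with an unmatched-quote error on this name.

# NUL-separated names are handed to ls untouched.
count=$(find "$demo" -type f -print0 | xargs -0 ls -ld | wc -l)
echo "files listed: $count"
```

This only addresses the quoting. xargs may still split a long list into several ls runs, in which case each run is sorted separately rather than the whole list.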

EricTRA 03-24-2011 03:21 PM

Hi,

Can you post a literal example including those characters so that we could have a look at it? Also, might be worth mentioning what distro the remote machine is running. Normally you can go a long way by 'escaping' the special characters but it would be handy to have a real life example.

Kind regards,

Eric

Harju 03-24-2011 04:28 PM

ok, so this is the line and output I'm working on now.
Code:

NAS:/media/# find . -type d | xargs ls -Rtlh | tail -n 50 | awk -F" " '{ print $6,$7,$9 }'
But while writing this message I've moved to the -exec option of the find command, like this:
Code:

find . -type f -exec ls -Rtlh {} \; | tail -n 50 | awk -F" " '{ print $6,$7,$9 }'
This is now working as intended (I think) but there's an error in the print. It will not print the full path or filenames. The print will break on the whitespace that comes in "$9". How can I make it print "$9 to newline"?

I have also tried to add -0 to xargs and "find . -print0" but this will give the error "too long command line" since it will place everything in one line (and there's a whole lot of files in these folders).

edit: btw, it's some NAS from Netgear, running Red Hat I think, and for some reason the disks are on a Windows-based system. It's a big mess really. I installed SSH on it so I could manage this from home, because his PC creeps me out. I'm not even used to Linux; I use OS X myself nowadays.
Linux version 2.6.17.14ReadyNAS (gcc version 3.3.5 (Infrant 3.3.5-1))

colucix 03-24-2011 05:20 PM

Quote:

Originally Posted by Harju (Post 4302228)
How can I make it print "$9 to newline"?

By means of
Code:

awk '{print substr($0,index($0,$9))}'
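
A short sketch of what that one-liner does, using made-up ls -l lines. One caveat to be aware of: index() returns the first place where the text of $9 occurs, so a name that also appears in an earlier column is cut at the wrong spot. The helper from_ninth_field is a hypothetical alternative that strips the first eight fields instead:

```shell
#!/bin/sh
# Hypothetical ls -l line with spaces in the path.
line='-rw-r--r-- 1 colucix users 0 Sep  1  2008 ./my dir/file one.txt'
printf '%s\n' "$line" | awk '{ print substr($0, index($0, $9)) }'
# prints: ./my dir/file one.txt

# A file called "users" collides with the group column.
clash='-rw-r--r-- 1 colucix users 0 Sep  1  2008 users'
printf '%s\n' "$clash" | awk '{ print substr($0, index($0, $9)) }'
# prints: users 0 Sep  1  2008 users

# Alternative: remove the first eight space-separated fields, keep the rest.
from_ninth_field() {
    awk '{ s = $0; for (i = 1; i < 9; i++) sub(/^ *[^ ]+ +/, "", s); print s }'
}
printf '%s\n' "$clash" | from_ninth_field
# prints: users
```

The alternative still assumes the eight-column ls -l layout shown here and would drop leading spaces from a file name.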
Quote:

Originally Posted by Harju (Post 4302228)
This is now working as intended (I think) but there's an error in the print. It will not print the full path or filenames. The print will break on the whitespace that comes in "$9". How can I make it print "$9 to newline"?

I'm not sure it's working as desired, since the syntax
Code:

-exec command {} \;
executes the command once for every item found, that is, many separate times. Since you're looking for files with -type f, the -R option in the ls command is superfluous. Moreover, the find command already performs the recursion. Maybe you meant to use the -r option to reverse the order while sorting in conjunction with -t. In any case the -t option has no effect here, since each ls command is actually executed on a single file.

That said, I would stick with the -printf predicate to print out the desired information, despite the presence of special characters in file names. If you need some clarification about its usage, feel free to ask.
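
The difference between the two -exec terminators can be seen with a small counting sketch. It assumes a find that supports the "{} +" form (POSIX and current GNU find do; an old embedded find might not), and sh -c 'echo run' is just a stand-in for ls:

```shell
#!/bin/sh
demo=$(mktemp -d)
touch "$demo/file1" "$demo/file2" "$demo/file3"

# "\;" starts the command once per file: each run sees a single name,
# so sort options such as -t or -r have nothing to reorder.
per_file=$(find "$demo" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# "+" passes as many names as fit on one command line to a single run.
batched=$(find "$demo" -type f -exec sh -c 'echo run' sh {} + | wc -l)

echo "per-file runs: $per_file, batched runs: $batched"
```

With three files this should report three per-file runs and one batched run. On a very large tree even the "+" form is split into several runs, so it does not guarantee one globally sorted listing either.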

Harju 03-24-2011 07:23 PM

Quote:

Originally Posted by colucix (Post 4302292)


Yeah, you're right, I don't need the -R flag on my ls command. Thanks, it seems to be working a lot faster now as well :D
I exec ls because I could not find a way to have the find command sort the files by ctime, which is what I want. This is my final piece of code, and now it's all perfect except for the fact that it seems to run very slowly. That's OK though, since it will run at night on a server that doesn't do much else.

This is what it looks like now. I figured I could just as well include the 8th column; it looks better and does no harm:
Code:

find . -type f -exec ls -rcl {} \; | tail -n 50 | awk -F" " '{print substr($0,index($0,""$6""))}'
Thank you all for all the help!

colucix 03-25-2011 04:51 AM

I'm sorry, but I still don't believe the files are sorted by ctime. The problem is that -exec executes the ls command on every single file, so the -r option has no effect. Moreover, according to the ls man page:
Code:

-c    with -lt: sort by, and show, ctime (time of last modification of file status information)
      with -l: show ctime and sort by name
      otherwise: sort by ctime

This means that in conjunction with -l the files are sorted by name and not by ctime.

Let me do a little example to demonstrate (based on modification time, but the result is the same):
Code:

$ touch -t 200809011230 file1
$ touch -t 200708151100 file2
$ touch -t 201001050045 file3
$ ls -l
-rw-r--r-- 1 colucix users 0 Sep  1  2008 file1
-rw-r--r-- 1 colucix users 0 Aug 15  2007 file2
-rw-r--r-- 1 colucix users 0 Jan  5  2010 file3

$ ls -lrt
-rw-r--r-- 1 colucix users 0 Aug 15  2007 file2
-rw-r--r-- 1 colucix users 0 Sep  1  2008 file1
-rw-r--r-- 1 colucix users 0 Jan  5  2010 file3

$ find . -type f -exec ls -lrt {} \;
-rw-r--r-- 1 colucix users 0 Sep  1  2008 ./file1
-rw-r--r-- 1 colucix users 0 Jan  5  2010 ./file3
-rw-r--r-- 1 colucix users 0 Aug 15  2007 ./file2

As you can see, whereas the ls -lrt command alone sorts the files by mtime in reverse order, the find command executing the same ls -lrt does not sort them. Actually it executes the following three different commands:
Code:

$ ls -lrt ./file1
$ ls -lrt ./file3
$ ls -lrt ./file2

Instead, using -printf I can print out the timestamp in a format suitable for numerical sorting and let the sort command do the trick:
Code:

$ find . -type f -printf "%TY%Tm%Td%TH%TM%TS %p\n"
20080901123000 ./file1
20100105004500 ./file3
20070815110000 ./file2

$ find . -type f -printf "%TY%Tm%Td%TH%TM%TS %p\n" | sort -n
20070815110000 ./file2
20080901123000 ./file1
20100105004500 ./file3

You can do the same using the change time:
Code:

$ find . -type f -printf "%CY%Cm%Cd%CH%CM%CS %p\n" | sort -n
You can choose any format for the timestamp, but to sort correctly it must have Year, Month, Day, Hour, Minutes and Seconds in this order. Example:
Code:

$ find . -type f -printf "%CY-%Cm-%Cd %CH:%CM:%CS %p\n" | sort -n
Taking advantage of my privileges of LQ mod, I unmark this thread as SOLVED, so that someone else may come in and shed some more light. If you don't mind, of course! :)
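
A small sketch of sorting on the dashed format, using made-up sample lines instead of real find output. As far as I can tell, sort -n parses only the leading number (the year) here and the rest is ordered by sort's fallback whole-line comparison, so a plain byte-wise sort gets there more directly:

```shell
#!/bin/sh
# Hypothetical lines in the "%CY-%Cm-%Cd %CH:%CM:%CS %p" layout.
sample='2010-01-05 00:45:00 ./file3
2008-09-01 12:30:00 ./file1
2008-09-01 09:15:00 ./file4
2007-08-15 11:00:00 ./file2'

# Zero-padded fields ordered from most to least significant sort
# chronologically under a plain byte-wise comparison.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort)
printf '%s\n' "$sorted"

# Keep the two newest and drop the date and time fields, leaving the path.
newest_two=$(printf '%s\n' "$sorted" | tail -n 2 | cut -d' ' -f3-)
printf '%s\n' "$newest_two"
# prints:
# ./file1
# ./file3
```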

Harju 03-25-2011 10:45 AM

Quote:

Originally Posted by colucix (Post 4302714)

You're right, and the -printf option works a lot better. The reason I didn't try it at first is that the option does not exist on my OS X system (Snow Leopard 10.6.6, using the MacPorts package handler). It's not even in the manual for find.
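
For systems whose find lacks -printf, a possible fallback is to let stat print the timestamp. This is only a sketch: it assumes that GNU stat accepts -c '%Y %n' and BSD/OS X stat accepts -f '%m %N' (not verified on Snow Leopard here), it uses modification time rather than ctime, and the function names are made up:

```shell
#!/bin/sh
# Print "epoch-mtime path" for every regular file below a directory.
list_with_mtime() {
    if stat -c '%Y' / >/dev/null 2>&1; then
        # GNU-style stat
        find "$1" -type f -exec stat -c '%Y %n' {} +
    else
        # BSD-style stat (assumed format letters)
        find "$1" -type f -exec stat -f '%m %N' {} +
    fi
}

# The 50 most recently modified files, oldest first.
newest_portable() {
    list_with_mtime "$1" | sort -n | tail -n 50 | cut -d' ' -f2-
}
```

Calling newest_portable . in the directory of interest should then give the same kind of list as the -printf pipeline.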

I tried these lines on the system that it's intended for and it seems to work just right. The "exec ls" method was not functioning properly while also taking a lot of time to complete. I will complete my script and test it properly before I mark the thread as solved this time.

Once again, thanks a lot! :)

colucix 03-25-2011 10:51 AM

You're welcome! :)


All times are GMT -5.