[SOLVED] Looking for the last five files created on the hard disk.

stf92 · 08-04-2010, 11:36 PM

Kernel 2.6.21.5, GNU (Slackware 12.0).
Bash 3.1.17.

Hi:
I want to search an entire subtree of / for all files with the extension html, and of these I only want the last five created.

I think I could split the problem into two parts: (a) Forget about the last condition. Then this is a job for the find command. (b) Sort the output of find using the date as the key, then use 'head' to print the desired output.

But even two such simple steps are enough to justify the writing of a shell script. And here lies my weakness. My script writing knowledge is rudimentary.

What's the final purpose? Well, I lately saved four or five LQ pages onto disk containing information I consider valuable to me. But I don't exactly remember where on the disk. So...

Then: either the problem posed is really of a very simple nature or it is not, in the latter case a script being mandatory. Any suggestion will be welcome. Thank you for reading.

EDIT: one drawback of the algorithm described above is that find may run for a long time. My machine's resources are scarce (little RAM, slow CPU) and there are possibly a large number of HTML files on the disk.

sag47 · 08-05-2010, 12:17 AM

Try this command (I'm away from my Linux machine until tomorrow, so this is off the top of my head). I'll be able to give you a more correct command tomorrow if this one is wrong...

Code:

find / -type f -name "*.htm*" -print0 | xargs -0 ls -t | head

Better to run it as root with the su or sudo command.

SAM

stf92 · 08-05-2010, 12:37 AM

Hi:
and thanks. It began outputting file names until it was so interrupted:
xargs: ls: terminated by signal 13.

I see the option 't' given to 'ls' is a key piece of the command. Unfortunately signals are still beyond the scope of my knowledge (Linux/Unix). Regards.

EDIT: I ran it as root.
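
For readers puzzled by the same message: on Linux, signal 13 is SIGPIPE. head exits after printing its ten lines, the pipe closes, and ls is stopped the next time it tries to write, so the message is usually harmless. A minimal sketch of the same situation, using yes as a stand-in for the long-running ls (the variable names are only for illustration):

```shell
# 'yes' writes "y" forever; 'head -n 1' reads one line and exits.
# Once head is gone the writer is stopped (normally by SIGPIPE)
# instead of running forever.
first=$(yes 2>/dev/null | head -n 1)
echo "$first"

# Translate the signal number from the xargs message into a name.
signame=$(kill -l 13)
echo "$signame"
```

On Linux the second line printed should be PIPE.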

Guttorm · 08-05-2010, 02:54 AM

Hi

Try this:

find / -name "*.html" -type f -printf "%C+ %P\n" |sort |tail -n 5
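
A small sketch of how this pipeline orders its output, run against a throwaway directory instead of /. Note that %C+ prints the status-change time (ctime), which is the closest thing find offers to a creation time on most Linux file systems. Because ctime cannot be set by hand, the sketch swaps in %T+ (modification time) so the example is repeatable. The directory and file names are made up, and GNU find and GNU touch are assumed.

```shell
# Scratch directory with three files whose modification times we control.
demo=$(mktemp -d)
touch -d '2010-08-01 10:00:00' "$demo/old.html"
touch -d '2010-08-03 10:00:00' "$demo/mid.html"
touch -d '2010-08-04 10:00:00' "$demo/new.html"

# Prefix each name with its timestamp, sort by that prefix, and keep
# the newest two. %T+ is mtime; the command in the thread uses %C+ (ctime).
newest=$(find "$demo" -name "*.html" -type f -printf "%T+ %P\n" | sort | tail -n 2)
echo "$newest"

# Drop the timestamp column to leave only the file names.
names=$(echo "$newest" | cut -d' ' -f2-)
echo "$names"

rm -r "$demo"
```

Because the timestamp is written year first, a plain text sort puts the lines in chronological order, which is why no special sort key is needed.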

stf92 · 08-05-2010, 03:16 AM

Hi:
and welcome. I tried it and it worked fine. Although it won't show all the usefulness it is capable of until I understand what is the argument of find's option 'name'. Is it a regexp? The manual does not say. So it is left for the shell alone to expand the argument. Or perhaps both things happen, one after the other. It's always been a mystery to me. Thanks and regards.

colucix · 08-05-2010, 07:07 AM

Actually it is not a regexp, nor is it expanded by the shell. The double quotes protect the asterisk from the shell, so that it is passed literally to the find command. Find interprets it in a way similar to the shell, matching any name that ends with the suffix .html.

Instead in a regexp the asterisk means any number of repetitions (0, 1 or more) of the previous expression, but in this case there isn't any previous expression.

By contrast, if you let the shell expand it (no quotes and no backslash escape), the pattern is replaced by the names of the html files (if any) in the current working directory. If there are no html files there, the asterisk is passed literally. If there is exactly one, find searches only for files with that specific name. With two or more, it can result in a syntax error.
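
The cases above can be seen with echo, which prints exactly the arguments the shell hands to a command. This is only a sketch in a scratch directory, with invented file names, and it assumes the shell's default globbing settings:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# No match in the current directory: the pattern is passed through unchanged.
none=$(echo *.html)
echo "$none"

# One match: the shell replaces the pattern with that single name.
touch a.html
one=$(echo *.html)
echo "$one"

# Two matches: the command now receives two separate arguments.
touch b.html
two=$(echo *.html)
echo "$two"

# Quoted: the pattern reaches the command untouched, whatever the directory holds.
quoted=$(echo "*.html")
echo "$quoted"

cd / && rm -r "$demo"
```

Replacing echo with find's -name shows why the quotes matter: only in the quoted case does find always receive the pattern itself.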

sag47 · 08-05-2010, 11:27 AM

I know this is already solved but since you were specifically looking for linuxquestions.org files here is a nice search string which will show you all files that contain one or more instances of the word you are looking for. I'll break the command down for you since you seem to be new to Unix/Linux shell. In the future just replace "linuxquestions" with the search term you want. If you want to search more than just html files then you have to modify *.htm* next to iname.

Code:

find -iname "*.htm*" -type f -print0 | xargs -0 grep -iH "linuxquestions" | cut -d: -f1 | sort -u

The pipe (|) character sends the output of one command as input to the next. You can chain more than two commands this way. Just remember the Unix tool philosophy: make a tool that does one thing, and does it well, then pipe several tools together to do what you want.

find -iname "*.htm*" -type f -print0 (for more information use "man find" in terminal)

Find finds files based on arguments you give it. If you give it no arguments then it outputs every file from the current directory and all sub directories (including hidden files).
-iname keeps only the files whose names match the pattern which follows it, ignoring case. In this case all .htm and .html files ("*.htm*")
-type f shows only files (not directories)
-print0 uses null characters to separate the results instead of new lines. This way processed output can contain spaces and other special characters which will be handled by xargs.

xargs -0 grep -iH "linuxquestions" (for more information use "man xargs" and "man grep" in terminal)

xargs processes input for use as input for another command.
-0 tells xargs that input items are separated by null characters instead of whitespace and new lines. This way file names containing spaces, new lines, and other special characters are passed on intact.
grep OPTIONS PATTERN FILE: xargs appends the file paths it reads to the end of the grep command, so grep searches the contents of each of those files.
grep -iH: the -i option makes the search PATTERN case insensitive. The -H option tells grep to print the path of the file in which the search pattern is found. The separator between the file name and the contents of the line where the pattern is found is a colon ( : ).

cut -d: -f1 (for more information use "man cut" in terminal)

cut is for splitting up a string or line of output. Similar to str.split() in other programming languages.
-d: tells cut to use a colon ( : ) as a delimiter and split the contents.
-f1 tells cut to only show the first field of the split. In our case it is just the file name and not the contents found by grep since that is what is useful in our case.

sort -u (for more information use "man sort" in terminal)

sort does what it says. It sorts a list alphabetically by default.
-u means sort as a unique list. This way when there's a hundred instances of our search term found in a file we just see a single file name.
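
To watch each stage do its work, the pipeline can be run against a scratch directory. This is a sketch with invented file names and contents, assuming GNU find, xargs and grep:

```shell
demo=$(mktemp -d)
# page1 mentions the term on two lines, "page 2" once in a different case,
# page3 never; notes.txt is not an HTML file and must be skipped.
printf 'linuxquestions thread\nmore linuxquestions\n' > "$demo/page1.html"
printf 'LinuxQuestions.org\n' > "$demo/page 2.HTM"
printf 'nothing here\n' > "$demo/page3.html"
printf 'linuxquestions\n' > "$demo/notes.txt"

# After grep: one "path:matching line" per hit, so page1 is counted twice.
hits=$(find "$demo" -iname "*.htm*" -type f -print0 | xargs -0 grep -iH "linuxquestions" | wc -l)
echo "$hits"

# After cut and sort -u: each matching file is listed once.
files=$(find "$demo" -iname "*.htm*" -type f -print0 | xargs -0 grep -iH "linuxquestions" | cut -d: -f1 | sort -u)
echo "$files"

rm -r "$demo"
```

The file name with a space in it survives because of -print0 and -0. As an aside, grep's -l option prints only the names of matching files, so grep -il would probably do the job of the cut and sort -u stages here.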

sag47 · 08-05-2010, 07:18 PM

Quote:

Originally Posted by stf92

Hi:
and welcome. I tried it and it worked fine. Although it won't show all the usefulness it is capable of until I understand what is the argument of find's option 'name'. Is it a regexp? The manual does not say. So it is left for the shell alone to expand the argument. Or perhaps both things happen, one after the other. It's always been a mystery to me. Thanks and regards.

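Not quite: as colucix explained, -name takes a shell-style pattern (a glob), not a regular expression. Regular expressions are what grep uses (find has a separate -regex option for them), and they are similar to Perl's, though not identical. Refer to this link...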
http://www.grymoire.com/Unix/Regular.html

Here's a grep example testing if a line starts with an IP address.

Code:

#IP shows up in output because the line starts with it
echo "192.168.1.1" | grep "^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}"

#IP does not show up because the line doesn't start with it
echo "hello 192.168.1.1" | grep "^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}"

So if you know your way around Perl regex then you should be right at home with grep.

Here are two ways of using the find command to search case-insensitively for Windows (or windows or WINDOWS)...

Code:

#case insensitive argument in the find command
find -iname "*windows*"

#using bracket expressions in the pattern to get case insensitivity.
find -name "*[Ww][Ii][Nn][Dd][Oo][Ww][Ss]*"
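
A sketch contrasting the two pattern languages, with invented file names and GNU find assumed. -name and -iname match the file's base name against a shell-style pattern, while the separate -regex and -iregex tests match the whole path against a regular expression.

```shell
demo=$(mktemp -d)
touch "$demo/Windows.txt" "$demo/my_windows_notes" "$demo/linux.txt"

# Shell-style pattern: * stands for any run of characters.
glob=$(find "$demo" -type f -iname "*windows*" | sort)
echo "$glob"

# Regular expression: "any run of characters" is written .* and the
# expression must match the entire path, not just the file name.
regex=$(find "$demo" -type f -iregex ".*windows.*" | sort)
echo "$regex"

# The same text used as a -iname pattern means something else:
# a literal dot, anything, "windows", a literal dot, anything.
asglob=$(find "$demo" -type f -iname ".*windows.*")
echo "$asglob"

rm -r "$demo"
```

The first two searches should list the same two files, and the third should print nothing, since no file name here starts with a dot.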

stf92 · 08-05-2010, 10:35 PM

Thanking everybody for their useful information and the explanation about the argument 'name' of 'find'. I can't leave the thread without saying this. Trondheim was mentioned in Harold Foster's Prince Valiant and you, colucix, shouldn't put Italy after the name of Bologna, whose university once was illustrious in Europe.