LinuxQuestions.org
Old 10-05-2006, 04:54 AM   #1
simba_cubs
LQ Newbie
 
Registered: Jul 2006
Posts: 10

Rep: Reputation: 0
Help! Need to find certain files from 47,000


Hi,

I need to find all files between 08:00 - 12:00 yesterday morning.
The files are emails and the directories are named in date format: each day a new directory is created, named 20061004 for example.

In the 20061004 directory there are 47,000 files. I need to extract all files that contain a user's name from those 47,000.

I've tried the following...

grep "tom.thumb" *

This returns a "bash: /bin/grep: Argument list too long" error.

I then tried

find * -newer 8amoct4_06 ! 11_59amoct4_06 -print

That returned the following "bash: /usr/bin/find: Argument list too long" error.

Could someone tell me where I'm going wrong please.

Many thanks
 
Old 10-05-2006, 06:06 AM   #2
budword
Member
 
Registered: Apr 2003
Location: Wisconsin
Distribution: Switched to regular Ubuntu, because I don't like KDE4, at all. Looks like vista on crack.....
Posts: 675

Rep: Reputation: 31
I might be wrong, but I think the find command just finds files by name. If you are looking for a certain string (a person's name) inside the files, find on its own won't help.

Check out the bottom of this page. http://www.computerhope.com/unix/ugrep.htm

Looks like the following command might work
grep -ir tom.thumb .

Let me know if that helps....

Best of luck

David
 
Old 10-05-2006, 06:20 AM   #3
kstan
Member
 
Registered: Sep 2004
Location: Malaysia, Johor
Distribution: Dual boot MacOS X/Ubuntu 9.10
Posts: 851

Rep: Reputation: 31
Have you tried something like this?
$ grep -rn "tom.thumb" .
to get all the file names?

Last edited by kstan; 10-05-2006 at 06:26 AM.
 
Old 10-05-2006, 06:40 AM   #4
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Slackware 10.1/10.2/12, Ubuntu 12.04, Crunchbang Statler
Posts: 3,786

Rep: Reputation: 282Reputation: 282Reputation: 282
The problem with grep is that the dot is interpreted as a special character.
Code:
grep "tom\.thumb" *
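
A small sketch of what the escaped dot changes, using two made-up sample lines rather than the real mail files. The pattern is kept in single quotes so the shell leaves the backslash alone; the -F variant is just another way to get a literal dot.

```shell
# Two sample lines: the real name, and one that only matches a wildcard dot.
sample=$(printf 'tom.thumb\ntomXthumb\n')

# Unescaped dot: matches any single character, so both lines are counted.
loose=$(printf '%s\n' "$sample" | grep -c 'tom.thumb')

# Escaped dot inside single quotes: only the literal "tom.thumb" is counted.
strict=$(printf '%s\n' "$sample" | grep -c 'tom\.thumb')

# Fixed-string mode (-F) also treats the dot literally.
fixed=$(printf '%s\n' "$sample" | grep -cF 'tom.thumb')

echo "loose=$loose strict=$strict fixed=$fixed"
```

The unescaped pattern still finds tom.thumb; it just finds a little more than intended.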
 
Old 10-05-2006, 07:36 AM   #5
olaola
Member
 
Registered: Aug 2006
Location: Italy
Distribution: Fedora
Posts: 41

Rep: Reputation: 15
The problem is the number of files you are exploring (Argument list too long).
Using the "*" you are passing to the command (grep or find or something else) a list of files. When this list is too long you get an error.

Try to restrict the list using something like "A*"...
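
Another way around the limit, as a rough sketch: let find walk the directory itself so the shell never has to expand a "*". The scratch directory and the two sample messages below are made up for the demonstration; on the real data you would point find at the mail directory instead.

```shell
# Scratch directory standing in for the real mail directory (hypothetical data).
demo=$(mktemp -d)
printf 'To: tom.thumb@example.com\n' > "$demo/msg1"
printf 'To: someone.else@example.com\n' > "$demo/msg2"

# No shell wildcard is involved: find lists the files itself, and
# "-exec ... {} +" hands them to grep in batches that fit the argument limit.
# grep -l prints only the names of the files that contain a match.
matches=$(find "$demo" -type f -exec grep -l 'tom\.thumb' {} +)

echo "$matches"
rm -rf "$demo"
```

Ending the -exec with \; instead of + runs grep once per file, which also avoids the error but is slower with tens of thousands of files.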
 
Old 10-05-2006, 07:51 AM   #6
simba_cubs
LQ Newbie
 
Registered: Jul 2006
Posts: 10

Original Poster
Rep: Reputation: 0
Many thanks to you all for your quick response - much appreciated.
 
Old 10-05-2006, 10:13 AM   #7
budword
Member
 
Registered: Apr 2003
Location: Wisconsin
Distribution: Switched to regular Ubuntu, because I don't like KDE4, at all. Looks like vista on crack.....
Posts: 675

Rep: Reputation: 31
The * at the end just tells grep to conduct the regex search at the current directory. It's not a globbing or regex wildcard in that context. I left the . in the tom.thumb regex because I thought it was supposed to be in there, to help find all the instances of tom?thumb. Please correct me if I got anything wrong.

Thanks much....

David
 
Old 10-05-2006, 11:19 AM   #8
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 332Reputation: 332Reputation: 332Reputation: 332
The find command can be used to select the files based on time. This list can then be fed to the grep command to find the files that contain the character string. The problem with the find command as it is written in the initial post is that there is a * following the command. The first term following the find command is the directory to search. The following example expects 8amoct4_06 to be a file that exists, not just a file specification. Try this.
Code:
find 20061004 -newer 20061004/8amoct4_06 -a ! -newer 20061004/11_59amoct4_06 -exec grep -H tom.thumb {} \;
The part of the command that starts with -exec is where we feed the output of the find command to the grep command. The -H in the grep command tells grep to print the name of the file in front of each matching line. If you want this list in a file then you can redirect the output of this using the > operator as in > result.txt.
Code:
find 20061004 -newer 20061004/8amoct4_06 -a ! -newer 20061004/11_59amoct4_06 -exec grep -H tom.thumb {} \; > result.txt
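
Since -newer compares against a file that must already exist, here is a sketch of how such reference files could be made with touch -t. Everything lives in a scratch directory, and the names start, end, early, inside and late are made up for the demonstration.

```shell
demo=$(mktemp -d)
mkdir "$demo/20061004"

# Reference files whose modification times mark the start and end of the window.
touch -t 200610040800 "$demo/start"
touch -t 200610041200 "$demo/end"

# Three sample messages: before, inside, and after the window.
touch -t 200610040514 "$demo/20061004/early"
touch -t 200610040930 "$demo/20061004/inside"
touch -t 200610041530 "$demo/20061004/late"

# Keep files modified after the start marker but not after the end marker.
selected=$(find "$demo/20061004" -type f -newer "$demo/start" ! -newer "$demo/end")

echo "$selected"
rm -rf "$demo"
```

Keeping the reference files outside the directory being searched means they never show up in the results themselves. Note that -newer is a strict comparison, so a file stamped exactly 08:00:00 would be left out.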

Last edited by stress_junkie; 10-05-2006 at 11:29 AM.
 
Old 10-06-2006, 03:40 AM   #9
simba_cubs
LQ Newbie
 
Registered: Jul 2006
Posts: 10

Original Poster
Rep: Reputation: 0
The filenames appear as "1GUy9T-0006gz-Ne-H"....

Sorry I may not have been clear. I thought I would be able to search based on the time stamp on the file
ls -la of the directory...

-rw-rw---- 1 Debian-exim Debian-exim 2979 2006-10-04 05:14 1GUy9T-0006gz-Ne-H

I was under the impression I would be able to search against the 2006-10-04 05:14 ?

Sorry if I was or am unclear I am fairly new to linux

Thanks for your help so far
 
Old 10-06-2006, 01:13 PM   #10
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 332Reputation: 332Reputation: 332Reputation: 332
Let's go back to your first post and see what we can do. Don't get discouraged. I'm not being critical. I'm just trying to summarize what has been said so far.
Quote:
Originally Posted by simba_cubs
I need to find all files between 08:00 - 12:00 yesterday morning.
I understand this to mean that you want to list all of the files that arrived between 08:00 and noon on October 4, 2006.
Quote:
Originally Posted by simba_cubs
The files are emails and the directories are named in date format: each day a new directory is created, named 20061004 for example.
I understand this to mean that a new directory is created every day. The name of the directory is the date of that day. The name of the directory for October 4, 2006 is 20061004.
Quote:
Originally Posted by simba_cubs
In the 20061004 directory there are 47,000 files. I need to extract all files that contain a user's name from those 47,000.
I understand this to mean that you want to LIST all of the files that contain the user's name.
Quote:
Originally Posted by simba_cubs
I've tried the following...
grep "tom.thumb" *
This returns a "bash: /bin/grep: Argument list too long" error.
Wim Sturkenboom explained in post #4 that the dot is a special character and you need to put a backslash in front of it when you want to match a literal dot in a regular expression. But, that isn't the reason that you got the error message.

olaola explained in post #5 that using the wildcard character * resulted in too many file names being passed to the grep command. That is the reason that you got the error message.
Quote:
Originally Posted by simba_cubs
I then tried
find * -newer 8amoct4_06 ! 11_59amoct4_06 -print
That returned the following "bash: /usr/bin/find: Argument list too long" error.
In post #8 I explained that the first argument in the find command has to be a directory to search. Putting a * there was a mistake. Then I showed how the find command would accept the name of the directory that you want to search. In this case the directory name is 20061004. So I started the find command as "find 20061004".

Quote:
Originally Posted by simba_cubs
Could someone tell me where I'm going wrong please.
Many thanks
At this point your request is satisfied. You have been told where you have gone wrong.
=====
Your last post introduced new information.
Quote:
Originally Posted by simba_cubs
The filenames appear as "1GUy9T-0006gz-Ne-H"....
Okay. You could have adapted what you have already been told to accommodate this file name format.
Quote:
Originally Posted by simba_cubs
Sorry I may not have been clear. I thought I would be able to search based on the time stamp on the file
ls -la of the directory...
-rw-rw---- 1 Debian-exim Debian-exim 2979 2006-10-04 05:14 1GUy9T-0006gz-Ne-H
I was under the impression I would be able to search against the 2006-10-04 05:14 ?
You can search based on the last access time or the last modification time of a file but not in the form that you see when you list files. The system keeps the dates and times of files in a different format. You cannot search on the date in the form of a text string. Well, not directly.
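
To illustrate the difference between the stored time and the text that ls shows, here is a short sketch. It assumes the stat command from GNU coreutils (the -c option and the %Y and %y formats are GNU-specific), and it works on an empty stand-in file created in a scratch directory, not on a real message.

```shell
demo=$(mktemp -d)
# Empty stand-in file with the same name and time stamp as in the listing.
touch -t 200610040514 "$demo/1GUy9T-0006gz-Ne-H"

# %Y is the modification time as seconds since the epoch (the stored form);
# %y is the same instant rendered as human-readable text.
epoch=$(stat -c '%Y' "$demo/1GUy9T-0006gz-Ne-H")
human=$(stat -c '%y' "$demo/1GUy9T-0006gz-Ne-H")

echo "stored: $epoch"
echo "shown:  $human"
rm -rf "$demo"
```

The exact number printed for the stored form depends on the time zone of the machine.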
Quote:
Originally Posted by simba_cubs
Sorry if I was or am unclear I am fairly new to linux
I don't think that you were unclear. I think that once you got the answer to your question your concept of the question changed.
Quote:
Originally Posted by simba_cubs
Thanks for your help so far
Everybody here is very happy to help, especially new Linux users and admins. We all want your experience with Linux to be positive and enjoyable.
=====
One of the problems with the find command is that it doesn't have an argument that just says "after 08:00 and before 12:00". Nevertheless, you need the find command in order to pass file names to the grep command one at a time. If you just try to use the grep command and pass all of the file names to it in one command you will pass too many file names at one time, as you already know. So let's look at how to build a find command that will do the job.

First we know that we need to use the grep command to search the contents of the email files for the user name tom.thumb. The -H parameter of the grep command tells grep to list the name of the file that contains the search string.
In the following examples I will use question marks to indicate something that we don't know yet. Also, I put the regular expression in single quotes so that the shell passes the backslash through to grep. Without the quotes the shell removes the backslash and grep sees a plain dot, which matches any character.
Code:
grep -H 'tom\.thumb' ?????
Second, we know that we need to use the find command to pass file names one at a time to the grep command.
Code:
find ?????????? -exec grep -H 'tom\.thumb' {} \;
The first parameter to the find command is the directory to search. In this case it is the 20061004 directory.
Code:
find 20061004 ?????????? -exec grep -H 'tom\.thumb' {} \;
We could take out the question marks and run the find command as it is.
Code:
find 20061004 -exec grep -H 'tom\.thumb' {} \;
That would do what you originally said that you wanted to do using just the grep command. However the output is a bit messy because it will include both the file name and the line that the search string is found in. We can make the output easier to read using the cut command.
Code:
find 20061004 -exec grep -H 'tom\.thumb' {} \; | cut -d ":" -f 1
Now that's a sweet looking output. If you want those file names in a text file you can redirect the output to a file as follows.
Code:
find 20061004 -exec grep -H 'tom\.thumb' {} \; | cut -d ":" -f 1 > tom-thumb-emails.txt
Once you want to select the files that arrived between 08:00 and 12:00 noon we start to find the deficiencies of the find command. The find command does not have very many parameters that test the date and time of files. We have to do a bit of work and find logical conditions that satisfy the time requirement while using the poor selection of parameters available in the find command. The man page of the find command tells us that none of the available parameters tests the creation time of the file. Unfortunately traditional Linux and Unix file systems don't keep track of the creation time of files; just the last time that they were accessed, the last time that they were modified, and the last time that their status (inode) changed. If these files had the time that they arrived as a string inside the email then we could use grep to search for that. If the arrival time of the emails was included in the email file name we could use that to select the proper files. Unfortunately neither of these conditions is true. The emails will have the time that they were sent inside the email, but not the time that they arrived. The file names, as you have shown, do not include the arrival time in any format.

Try the last example of the find command with the pretty output and see if that does what you need it to do. Write back and append more posts to this thread if you want more help. I will be watching this thread for a few days. The other posters might also be watching this thread.
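
If the modification time of each file is close enough to its arrival time (an assumption; it holds only if the mail software writes the file once and never touches it again), the time window and the search could be combined roughly like this. The sketch relies on the -newermt test, which is a GNU find extension and may not exist in other versions of find. The directory and the three sample messages are made up for the demonstration.

```shell
demo=$(mktemp -d)
mkdir "$demo/20061004"

# Hypothetical sample messages: two mention the user, one does not.
printf 'To: tom.thumb@example.com\n' > "$demo/20061004/in-window"
printf 'To: tom.thumb@example.com\n' > "$demo/20061004/too-early"
printf 'To: other@example.com\n'     > "$demo/20061004/in-window-other"

# Give the samples modification times inside and outside the window.
touch -t 200610040930 "$demo/20061004/in-window" "$demo/20061004/in-window-other"
touch -t 200610040514 "$demo/20061004/too-early"

# -newermt compares the modification time with a date string (GNU find only).
# grep -l prints each matching file name once, so no cut is needed.
hits=$(find "$demo/20061004" -type f \
         -newermt '2006-10-04 08:00' ! -newermt '2006-10-04 12:00' \
         -exec grep -l 'tom\.thumb' {} +)

echo "$hits"
rm -rf "$demo"
```

On the real directory you would replace "$demo/20061004" with 20061004 and could redirect the output to a file as before.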

Last edited by stress_junkie; 10-06-2006 at 01:30 PM.
 
  


All times are GMT -5. The time now is 02:58 AM.
