Help! Need to find certain files from 47,000
Hi,
I need to find all files from between 08:00 and 12:00 yesterday morning. The files are emails, and the directories are named by date, so each day a new directory is created and named 20061004, for example. The 20061004 directory contains 47,000 files, and I need to extract all of the files that contain a user's name. I've tried the following:
Code:
grep "tom.thumb" *
This returns a "bash: /bin/grep: Argument list too long" error. I then tried:
Code:
find * -newer 8amoct4_06 ! 11_59amoct4_06 -print
That returned a "bash: /usr/bin/find: Argument list too long" error. Could someone tell me where I'm going wrong, please? Many thanks |
I might be wrong, but I think the find command just finds files by name. If you are looking for a certain string (a person's name) inside the files, find won't help.
Check out the bottom of this page: http://www.computerhope.com/unix/ugrep.htm It looks like the following command might work:
grep -ir tom.thumb .
Let me know if that helps.... Best of luck, David |
Have you tried something like this
$ grep -n "tom.thumb" .
to get all the file names? |
The problem with grep is that the dot is interpreted as a special character.
Code:
grep "tom\.thumb" * |
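To illustrate the point about the dot, here is a minimal sketch. The sample file and the example.com addresses are made up for the demonstration; only the tom.thumb pattern comes from this thread.

```shell
# Hypothetical sample data: one real match and one near miss.
sample=$(mktemp)
printf '%s\n' 'From: tom.thumb@example.com' 'From: tomXthumb@example.com' > "$sample"

# Unescaped dot: "." matches any single character, so both lines count.
loose=$(grep -c 'tom.thumb' "$sample")

# Escaped dot inside single quotes: only the literal "tom.thumb" matches.
strict=$(grep -c 'tom\.thumb' "$sample")

# -F treats the whole pattern as a fixed string, so no escaping is needed.
fixed=$(grep -cF 'tom.thumb' "$sample")

echo "loose=$loose strict=$strict fixed=$fixed"
rm -f "$sample"
```

In practice the unescaped pattern rarely causes trouble with a name like this one, but the escaped or -F form avoids accidental matches.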
The problem is the number of files you are exploring (Argument list too long).
When you use "*", the shell expands it and passes a list of files to the command (grep, find or anything else). When this list is too long you get an error. Try to restrict the list using something like "A*"... |
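A small sketch of what happens with "*", assuming a throwaway directory with made-up file names (msg1 to msg5) in place of the real 20061004 directory. It also shows one possible way around the limit, letting find hand the names to grep in batches, which is a suggestion rather than something tested on the real mail spool.

```shell
# Hypothetical sample directory standing in for 20061004.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do
    printf 'To: someone.else@example.com\n' > "$dir/msg$i"
done
printf 'To: tom.thumb@example.com\n' > "$dir/msg3"

# The shell expands the glob first; a command would receive this many names.
set -- "$dir"/*
nargs=$#

# Upper limit, in bytes, on the arguments plus environment of one command.
limit=$(getconf ARG_MAX)

# find walks the directory itself and passes names to grep in batches
# ("{} +"), so no single command line grows past the limit.
matches=$(find "$dir" -type f -exec grep -l 'tom\.thumb' {} +)

echo "glob expands to $nargs names; ARG_MAX is $limit bytes"
echo "$matches"
rm -rf "$dir"
```

With 47,000 long file names the expanded list can exceed that limit, which is when bash reports "Argument list too long".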
Many thanks to you all for your quick response - much appreciated.
|
The * at the end just tells grep to conduct the regex search at the current directory. It's not a globbing or regex wildcard in that context. I left the . in the tom.thumb regex because I thought it was supposed to be in there, to help find all the instances of tom?thumb. Please correct me if I got anything wrong.
Thanks much.... David |
The find command can be used to select the files based on time. This list can then be fed to the grep command to find the files that contain the character string. The problem with the find command as it is written in the initial post is that there is a * following the command. The first term following the find command is the directory to search. The following example expects 8amoct4_06 to be a file that exists, not just a file specification. Try this.
Code:
find 20061004 -newer 20061004/8amoct4_06 -a ! -newer 20061004/11_59amoct4_06 -exec grep -H tom.thumb {} \;
Code:
find 20061004 -newer 20061004/8amoct4_06 -a ! -newer 20061004/11_59amoct4_06 -exec grep -H tom.thumb {} \; > result.txt |
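Since -newer compares against a file that has to exist, one way to get such files is to create them with touch. The following is only a sketch: the directories and file names (early, morning, afternoon, start, end) are invented for the demonstration, and the reference files are kept outside the searched directory so they do not appear in the results.

```shell
# Hypothetical layout: mail files in $dir, reference files in $refs.
dir=$(mktemp -d)
refs=$(mktemp -d)

# Sample mail files with modification times set by hand (CCYYMMDDhhmm).
touch -t 200610040514 "$dir/early"
touch -t 200610040930 "$dir/morning"
touch -t 200610041300 "$dir/afternoon"

# Reference files marking the two ends of the window.
touch -t 200610040800 "$refs/start"
touch -t 200610041200 "$refs/end"

# Modified after "start" AND not after "end".
found=$(find "$dir" -type f -newer "$refs/start" ! -newer "$refs/end")

echo "$found"
rm -rf "$dir" "$refs"
```

Only the file stamped 09:30 falls inside the window. The same two touch commands could create the reference files for the real directory.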
The filenames appear as "1GUy9T-0006gz-Ne-H"....
Sorry, I may not have been clear. I thought I would be able to search based on the time stamp of the file. ls -la of the directory shows...
-rw-rw---- 1 Debian-exim Debian-exim 2979 2006-10-04 05:14 1GUy9T-0006gz-Ne-H
I was under the impression I would be able to search against the 2006-10-04 05:14? Sorry if I was or am unclear, I am fairly new to Linux. Thanks for your help so far |
Let's go back to your first post and see what we can do. Don't get discouraged. I'm not being critical. I'm just trying to summarize what has been said so far.
olaola explained in post #5 that using the wildcard character * resulted in too many file names being passed to the grep command. That is the reason that you got the error message.
===== Your last post introduced new information.
===== One of the problems with the find command is that it doesn't have an argument that just says "after 08:00 and before 12:00". Nevertheless, you need the find command in order to pass file names to the grep command one at a time. If you just try to use the grep command and pass all of the file names to it in one command you will pass too many file names at one time, as you already know.
So let's look at how to build a find command that will do the job. First we know that we need to use the grep command to search the contents of the email files for the user name tom.thumb. The -H parameter of the grep command tells grep to list the name of the file that contains the search string. In the following examples I will use question marks to indicate something that we don't know yet. Also, I stopped using quotation marks in regular expressions when I found that the result can be unpredictable.
Code:
grep -H tom\.thumb ?????
Code:
find ?????????? -exec grep -H tom\.thumb {} \;
Code:
find 20061004 ?????????? -exec grep -H tom\.thumb {} \;
Code:
find 20061004 -exec grep -H tom\.thumb {} \;
Code:
find 20061004 -exec grep -H tom\.thumb {} \; | cut -d ":" -f 1
Code:
find 20061004 -exec grep -H tom\.thumb {} \; | cut -d ":" -f 1 > tom-thumb-emails.txt
Try the last example of the find command with the pretty output and see if that does what you need it to do. Write back and append more posts to this thread if you want more help. I will be watching this thread for a few days. The other posters might also be watching this thread. |
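Two details of the last commands may be worth checking on a small test case. Without quotes the shell removes the backslash before grep runs, so grep receives tom.thumb with an unescaped dot. Also, grep -H prints one line per matching line, so a file with several hits appears several times after cut. The sketch below uses made-up files (two-hits, near-miss) and compares that pipeline with grep -l and a single-quoted pattern, which is one possible alternative rather than the only correct form.

```shell
# Hypothetical sample directory with two small "emails".
dir=$(mktemp -d)
printf '%s\n' 'From: tom.thumb@example.com' 'Cc: tom.thumb@example.com' > "$dir/two-hits"
printf '%s\n' 'From: tomXthumb@example.com' > "$dir/near-miss"

# Unquoted pattern plus cut: the near miss matches too, and the file
# with two matching lines is listed twice.
via_cut=$(find "$dir" -type f -exec grep -H tom\.thumb {} \; | cut -d ":" -f 1 | sort)

# Single-quoted pattern plus -l: each matching file is printed once.
via_l=$(find "$dir" -type f -exec grep -l 'tom\.thumb' {} \; | sort)

echo "with cut:"
echo "$via_cut"
echo "with -l:"
echo "$via_l"
rm -rf "$dir"
```

The -type f test keeps find from handing the directory itself to grep.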