LinuxQuestions.org

TheNewGuy2936 04-23-2011 08:09 PM

Extracting text from a file.
 
Hello everyone:

There are some log files that I wish to get some information from (Apache Access Log) but it is HUGE! All I need as of right now is any information from date and time A to date and time B. What commands can I use to extract this information from the access_log and put it into another file with just that information? I created a file called "access_info" by doing
Code:

touch access_info
but I was not sure where to go from there. Thank you everyone in advance for your help! :)

frankbell 04-23-2011 08:29 PM

grep should do the job.

grep [date] /path-to/filename

The date has to be in the same format used in the file.

See man grep for more.

TheNewGuy2936 04-23-2011 08:48 PM

So just to be sure before I do so, can I do something like this:

Code:

grep 2011-03-18 access_log > access_info

frankbell 04-23-2011 08:59 PM

I think so (I'm still learning grep).

You can test with

grep 2011-03-18 access_log

to see whether the desired output appears on the screen.

Depending on your current directory, you may have to include the full path to access_log.
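
A note on the date format: with Apache's default common/combined log format, timestamps normally look like [18/Mar/2011:14:21:10 -0500] rather than 2011-03-18, so the pattern has to follow that layout. Below is a minimal sketch, assuming that default format. The file name sample_access_log and the log lines in it are made up for illustration; check a few lines of your own access_log first.

```shell
# Build a tiny sample log (hypothetical lines in Apache's common log format).
cat > sample_access_log <<'EOF'
10.0.0.1 - - [17/Mar/2011:23:59:58 -0500] "GET /a HTTP/1.1" 200 512
10.0.0.2 - - [18/Mar/2011:00:00:03 -0500] "GET /b HTTP/1.1" 200 128
10.0.0.3 - - [18/Mar/2011:14:21:10 -0500] "GET /c HTTP/1.1" 404 64
10.0.0.4 - - [19/Mar/2011:08:00:00 -0500] "GET /d HTTP/1.1" 200 256
EOF

# -F treats the pattern as a fixed string, so nothing in it acts as a regex operator.
# The redirection creates access_info if it does not exist, so no touch is needed.
grep -F '18/Mar/2011' sample_access_log > access_info

# Show what was extracted.
cat access_info
```

Only the two lines stamped 18/Mar/2011 end up in access_info.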

Telengard 04-23-2011 10:22 PM

You may want to adopt the practice of enclosing regular expressions in single quotes when invoking grep within bash. Bash uses some of the same meta-characters as used in regular expressions (even though they mean different things).

Code:

grep '2011-03-18' access_log
In this case it shouldn't matter, so frankbell's example should work fine. If your regular expression included [, *, or | then Bash could interpret them itself (as filename globs or a pipe) before grep ever saw them.
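
To see the difference quoting makes, here is a small sketch. It uses a throwaway directory and the made-up file names log1 and log2, and prints the same word once unquoted and once in single quotes.

```shell
# Work in a scratch directory so the glob only sees our two hypothetical files.
demo_dir=$(mktemp -d)
touch "$demo_dir/log1" "$demo_dir/log2"

# Unquoted: the shell expands log* into matching file names before the command runs.
unquoted=$(cd "$demo_dir" && printf '%s ' log*)

# Single-quoted: the command receives the literal string log*.
quoted=$(cd "$demo_dir" && printf '%s ' 'log*')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

rm -r "$demo_dir"
```

The same thing happens to a grep pattern: unquoted, it may be replaced by file names if anything in the current directory happens to match.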

frankbell 04-23-2011 10:30 PM

Telengard, becoming proficient at regular expressions is next on my list.

Thanks.

Telengard 04-23-2011 10:55 PM

Quote:

Originally Posted by frankbell (Post 4334080)
Telengard, becoming proficient at regular expressions is next on my list.

I began experimenting with Linux in 2005 to help with my Unix and programming classes. I've been using Linux on my own machines since 2006. In April 2009 I switched to Linux full time on my personal desktop. I still don't consider myself proficient with regular expressions. Take your time. ;D

A few tips you may find helpful:
  • man 7 regex provides a fairly easy-to-read overview of regular expressions.
  • The GNU Grep 2.7 manual is more in depth.
  • Regular-Expressions.info is a good introductory course for practical applications.
  • Almost every program using regular expressions has its own special syntax conventions. Regular expressions which work in one program don't necessarily work in all of them.

HTH

kurumi 04-23-2011 11:27 PM

There are other tools besides grep that can do pattern matching, for example awk, Perl, Python, etc. I prefer Ruby.
Code:

$ ruby -ne 'print if /your date pattern/../next date pattern/' file

Telengard 04-23-2011 11:36 PM

Quote:

Originally Posted by kurumi (Post 4334101)
There are other tools besides grep that can do pattern matching, for example awk, Perl, Python, etc. I prefer Ruby.
Code:

$ ruby -ne 'print if /your date pattern/../next date pattern/' file

I see your ruby and raise you an AWK :)

Code:

awk '/2011-03-18/' access_log
There are many, many ways:

Code:

while read -r ; do [[ "$REPLY" =~ 2011-03-18 ]] && echo "$REPLY" ; done < access_log
;)

kurumi 04-25-2011 02:14 AM

Quote:

Originally Posted by Telengard (Post 4334105)
I see your ruby and raise you an AWK :)

Code:

awk '/2011-03-18/' access_log

Lol, that's not equivalent to my Ruby example. Same with the shell one.

Telengard 04-25-2011 04:57 PM

Quote:

Originally Posted by kurumi (Post 4335208)
Lol, that's not equivalent to my Ruby example. Same with the shell one.

I don't know Ruby :p

kurumi 04-25-2011 07:00 PM

Quote:

Originally Posted by Telengard (Post 4336075)
I don't know Ruby :p

awk has a similar syntax
Code:

awk '/pattern1/,/pattern2/' file

Telengard 04-25-2011 09:56 PM

Quote:

Originally Posted by kurumi (Post 4336163)
Code:

awk '/pattern1/,/pattern2/' file

The Gawk manual calls this a range pattern. I guess that means your Ruby program begins printing at the first line that matches your date pattern and stops printing after the first line that matches next date pattern.
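
For the original "date and time A to date and time B" question, a range pattern could be sketched like this. It assumes Apache's default timestamp layout ([18/Mar/2011:10:00:01 -0500]); the file names range_sample_log and range_info and the log lines are made up for illustration.

```shell
# Hypothetical sample log, one request per line, in chronological order.
cat > range_sample_log <<'EOF'
10.0.0.1 - - [18/Mar/2011:09:59:59 -0500] "GET /a HTTP/1.1" 200 10
10.0.0.2 - - [18/Mar/2011:10:00:01 -0500] "GET /b HTTP/1.1" 200 20
10.0.0.3 - - [18/Mar/2011:11:30:00 -0500] "GET /c HTTP/1.1" 200 30
10.0.0.4 - - [18/Mar/2011:12:00:05 -0500] "GET /d HTTP/1.1" 200 40
10.0.0.5 - - [18/Mar/2011:12:45:00 -0500] "GET /e HTTP/1.1" 200 50
10.0.0.6 - - [18/Mar/2011:13:00:00 -0500] "GET /f HTTP/1.1" 200 60
EOF

# Print from the first line stamped in the 10 o'clock hour through the
# first line stamped in the 12 o'clock hour (inclusive).
# Slashes inside an awk regex have to be escaped as \/.
awk '/18\/Mar\/2011:10:/,/18\/Mar\/2011:12:/' range_sample_log > range_info

cat range_info
```

Two things to keep in mind: the range ends at the first line matching the second pattern, so the 12:45:00 line above is not included, and if no line matches the first pattern, nothing is printed at all.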

TheNewGuy2936 04-26-2011 10:16 AM

The "awk" worked out great! Thanks again for everyone's help! This really helped me with this big access_log file, which was 7.2 GB.


All times are GMT -5.