LinuxQuestions.org
Old 12-15-2009, 09:02 PM   #1
lothario
Member
 
Registered: Apr 2004
Posts: 340

Rep: Reputation: 30
extracting lines from very large data files


I have to extract chunks of data from very large data files.

Each field is enclosed in double quotes.
Consecutive fields are separated by a comma.

The first field is a date-time field in this format:
"YYYY MM DD hh mm ss"
Where the hours (hh) are from 00 to 23.

Here is a section from a data file:

Quote:
"2009 11 28 15 00 22","status36","green","true","0"
"2009 11 28 15 03 31","status48","blue","0","false"
"2009 11 28 15 18 04","status48","black","0","0"
"2009 11 28 15 59 48","status48","pink","0","false"
"2009 11 28 16 01 18","status36","orange","0","0"
"2009 11 28 16 01 43","status36","white","true","false"
"2009 11 28 16 25 22","status36","7381","0","0"
"2009 11 28 16 56 40","status36","1657","true","0"
"2009 11 28 17 06 19","status36","blue","0","false"
"2009 11 28 18 17 41","status36","2130","0","0"
"2009 11 28 18 23 52","status36","red","true","0"
"2009 11 28 18 29 45","status48","green","0","false"
"2009 11 28 19 00 50","status36","pink","0","0"
"2009 11 28 19 01 05","status48","white","0","false"
"2009 11 28 21 00 55","status48","7381","0","0"
"2009 11 28 21 01 18","status48","orange","true","false"
"2009 11 28 21 29 21","status36","7381","0","0"
"2009 11 28 21 54 43","status48","blue","true","0"
"2009 11 28 22 09 05","status48","1657","0","false"
"2009 11 28 22 34 14","status48","2130","0","0"
"2009 11 28 22 34 30","status48","red","0","false"
"2009 11 28 23 04 09","status36","2130","true","0"
"2009 11 28 23 04 22","status36","red","0","0"
"2009 11 29 01 01 26","status48","7381","true","false"

I need to extract all the lines from this file whose date-time is:
  • Equal to or greater than a start date time.
  • Equal to or less than an end date time.

So if I specify the FROM and TO range as:
2009 11 28 20 00 00
2009 11 29 02 00 00

Then the following lines would be extracted:
Quote:
"2009 11 28 21 00 55","status48","7381","0","0"
"2009 11 28 21 01 18","status48","orange","true","false"
"2009 11 28 21 29 21","status36","7381","0","0"
"2009 11 28 21 54 43","status48","blue","true","0"
"2009 11 28 22 09 05","status48","1657","0","false"
"2009 11 28 22 34 14","status48","2130","0","0"
"2009 11 28 22 34 30","status48","red","0","false"
"2009 11 28 23 04 09","status36","2130","true","0"
"2009 11 28 23 04 22","status36","red","0","0"
"2009 11 29 01 01 26","status48","7381","true","false"
How do I do this with a bash script?
 
Old 12-15-2009, 09:22 PM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Use gawk for large files; the mktime() function used below is a gawk extension.
Code:
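awk 'BEGIN{
    OFS=FS=","
    # Read the range bounds from standard input ("-").
    printf "Enter from range: YYYY MM DD hh mm ss: "
    getline fromdate < "-"
    printf "Enter to range: YYYY MM DD hh mm ss: "
    getline todate < "-"
    # Split each bound into its six components.
    split(fromdate, FROM," ")
    split(todate, TO," ")
    # mktime() turns "YYYY MM DD hh mm ss" into seconds since the epoch.
    mktimefrom = mktime(FROM[1]" "FROM[2]" "FROM[3]" "FROM[4]" "FROM[5]" "FROM[6])
    mktimeto = mktime(TO[1]" "TO[2]" "TO[3]" "TO[4]" "TO[5]" "TO[6])
}
{
 o=$0                  # save the line; changing $1 below rebuilds $0
 gsub("\042","",$1)    # strip the double quotes (octal 042) from field 1
 split($1, C, " ")
 mktime1st = mktime(C[1]" "C[2]" "C[3]" "C[4]" "C[5]" "C[6])
 # Print the untouched line when its timestamp is inside the inclusive range.
 if ( mktime1st >= mktimefrom && mktime1st <= mktimeto ){
    print o
 }
} ' file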
Output:
Code:
$ ./shell.sh
Enter from range: YYYY MM DD hh mm ss: 2009 11 28 15 03 31
Enter to range: YYYY MM DD hh mm ss: 2009 11 28 16 25 22
"2009 11 28 15 03 31","status48","blue","0","false"
"2009 11 28 15 18 04","status48","black","0","0"
"2009 11 28 15 59 48","status48","pink","0","false"
"2009 11 28 16 01 18","status36","orange","0","0"
"2009 11 28 16 01 43","status36","white","true","false"
"2009 11 28 16 25 22","status36","7381","0","0"
$
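
If the timestamps are always zero-padded and laid out exactly as "YYYY MM DD hh mm ss", comparing them as plain strings gives the same order as comparing them as dates, so the mktime() call on every line could probably be skipped. A minimal sketch under that assumption; `extract_range` is a made-up helper name, and the range is passed as arguments instead of being read interactively:

```shell
# extract_range FROM TO FILE   (hypothetical helper, not from the script above)
# Prints the lines whose leading timestamp lies in the inclusive range
# [FROM, TO]. Assumes every line starts with a double quote followed by a
# zero-padded, 19-character "YYYY MM DD hh mm ss" timestamp.
extract_range() {
    LC_ALL=C awk -v from="$1" -v to="$2" '
        {
            ts = substr($0, 2, 19)   # the 19 characters after the opening quote
            if (ts >= from && ts <= to) print
        }' "$3"
}

# Demo on three lines taken from the sample data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"2009 11 28 19 01 05","status48","white","0","false"
"2009 11 28 21 00 55","status48","7381","0","0"
"2009 11 29 01 01 26","status48","7381","true","false"
EOF
extract_range "2009 11 28 20 00 00" "2009 11 29 02 00 00" "$sample"
rm -f "$sample"
```

Since it does no date arithmetic, this should work with any POSIX awk, but it only makes sense for lines that start with a timestamp in exactly this layout; a header line or a differently padded date would be compared as an arbitrary string.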
 
  


All times are GMT -5.
