LinuxQuestions.org
Old 03-20-2006, 06:58 PM   #1
thekillerbean
Member
 
Data extraction from a really, really huge file.


I have a really huge file and need to extract data from it. My dilemma is that I'm not yet very good with grep/awk/sed, which I believe will be my best bet; using Perl is not an option, as it is not installed and will not be installed. I'm pretty conversant with regexes, so I'm sorta 25% ready for the job at hand.

The file has several markers, such as:
[marker-1]
...
data
...
[marker-2]
....
data
....
[marker-n]

I want to implement two options for extracting this data:

1:
I want to use a counter to keep track of which section the extracted data came from. For instance, with a count of 2, I'd know that the data was extracted between markers marker-1 and marker-2.

2:
I want to be able to give the script the starting and ending markers (myscript marker-1 marker-n), and have it extract the data between those two markers, irrespective of any other markers in between.

I'm not looking for a free handout, but rather the technique for using grep/awk/sed to determine that I'm past the first marker but haven't yet reached the second one!

I hope I've explained my problem clearly enough for someone to help me.

Cheers,
kb.
 
Old 03-20-2006, 08:13 PM   #2
unSpawn
Moderator
 
If you're aiming purely at data extraction, I would let go of the counter thing.

line_nr=($(grep -n marker file | cut -d ":" -f 1))
sed -n "$((${line_nr[0]} + 1)),$((${line_nr[1]} - 1))p" file

to rip the first chunk to stdout. If you want to select which chunks to rip, you could use Bash's built-in "select".
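[Editor's note: both of the options asked for in the first post can also be sketched in awk alone, assuming the markers sit on lines of their own as shown there. The file name and marker names below are stand-ins for the real ones.]

```shell
# Tiny stand-in for the real huge file.
printf '%s\n' '[marker-1]' alpha beta '[marker-2]' gamma '[marker-3]' > hugefile

# Option 1: bump a counter at every marker line and tag each data line
# with the number of the section it falls in.
awk '/^\[marker-/ { n++; next } n { print n ": " $0 }' hugefile

# Option 2: print everything between two caller-supplied markers,
# passing intermediate markers straight through. In a script, s and e
# would come from "$1" and "$2".
awk -v s='[marker-1]' -v e='[marker-3]' '
    $0 == e { p = 0 }   # end marker: stop printing (checked first, so it is excluded)
    p       { print }   # inside the range: print the line
    $0 == s { p = 1 }   # start marker: start printing from the NEXT line
' hugefile
```

Because the start-marker test comes last and the end-marker test first, both marker lines themselves are excluded from the output.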
 
Old 03-23-2006, 10:22 PM   #3
thekillerbean
Member
 
Original Poster
Thanks. These details and a little MANning have helped me get past my earlier stumbling block. However, I now have another issue:

Say that in this huge file there are several variables with the same value, or with values higher than a set threshold. How can I use awk to return not only the value I'm looking for, but also the number of times it appears?

For instance, say my data looks like this:
a=5
b=6
c=5
d=25
e=7

From the above I want to extract not only the unique values, but also how often each appears. I'd like to have output that looks like this:
5 - 2
6 - 1
7 - 1
25 - 1

I also would like to extend this to reveal:

2 entries < 6
1 entry > 10

What I'm getting right now is just the unique values, by piping the awk output to sort and then uniq. I've been unable to work out how to keep a count of occurrences!
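[Editor's note: the missing piece in the pipeline above is uniq's -c flag, which prefixes each line with its repeat count. A minimal sketch, assuming the a=5 sample data shown in this post:]

```shell
# Sample data in the x=value layout above.
printf '%s\n' a=5 b=6 c=5 d=25 e=7 > data

# sort -n groups duplicate values onto adjacent lines (uniq only spots
# adjacent duplicates); uniq -c prefixes each surviving line with its
# count; the final awk reorders the columns into "value - count".
awk -F= '{ print $2 }' data | sort -n | uniq -c | awk '{ print $2 " - " $1 }'
```

This prints exactly the "5 - 2 / 6 - 1 / 7 - 1 / 25 - 1" layout asked for above.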

Cheers,
kb.
 
Old 03-24-2006, 08:20 AM   #4
unSpawn
Moderator
 
Quote: "From the above I want not only to extract the uniq values, but also how often they appear."
Dunno how with awk, but with grep -c you can count the number of lines in hugefile that match a string.


Quote: "I also would like to extend this"
That's just a count. It should be the easiest part.


Post the script?
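[Editor's note: awk can in fact do the counting in one pass with an associative array. A hedged sketch, assuming the a=5 sample data from the previous post; the thresholds 6 and 10 match its example output, and the "entries"/"entry" wording is hard-coded to match it too.]

```shell
# Same sample data as the previous post.
printf '%s\n' a=5 b=6 c=5 d=25 e=7 > data

# One pass: seen[] counts each distinct value; lo/hi tally the two
# threshold buckets (< 6 and > 10). Note that awk's for (v in seen)
# traversal order is unspecified, so value lines may appear in any order.
awk -F= '
    BEGIN { lo = hi = 0 }
    { seen[$2]++
      if ($2 + 0 < 6)  lo++
      if ($2 + 0 > 10) hi++ }
    END {
        for (v in seen) print v " - " seen[v]
        print lo " entries < 6"
        print hi " entry > 10"
    }
' data
```

The $2 + 0 forces a numeric comparison, so "25" is compared as a number rather than as a string.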
 
Old 04-09-2006, 04:18 AM   #5
thekillerbean
Member
 
Original Poster
Sorry for the late response; I've been busy with other things.

Unfortunately (well, fortunately for me), this task was taken over by someone with much more experience than I have with the aforementioned tools, and it took him only a few hours to get the required data.

Cheers,
tkb.
 
  

