I have a really huge file and need to extract data from it. My dilemma is that I'm not yet very good with grep/awk/sed, which I believe will be my best bet. I do not have the option to use Perl, as it is not installed and will not be installed. I'm pretty conversant with regexes, so I'm sorta 25% ready for the job at hand.
The file has several markers, such as:
[marker-1]
...
data
...
[marker-2]
....
data
....
[marker-n]
I want to implement 2 options for extracting this data:
1:
I want to use a counter to keep track of which section the extracted data came from. For instance, with a count of 2, I'd know that the data was extracted between markers marker-1 and marker-2.
2:
I want to be able to give the script the starting and ending markers (myscript marker-1 marker-n) and have the script extract the data between these 2 markers, irrespective of whether there are other markers between the 2 provided.
I'm not looking for a free handout, but rather the technique on how to use grep/awk/sed to determine that I'm past the first marker, but I'm yet to get to the second one!
I hope I've been clear enough to get my problem across for someone to help me.
If you're aiming purely at data extraction I would let go of the counter thing and use:
line_nr=($(grep -n marker file | cut -d ":" -f 1)); sed -n "$((line_nr[0]+1)),$((line_nr[1]-1))p" file
to rip the first chunk to stdout. If you want to select which chunks to rip you could use Bash's built-in "select".
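For the "am I past the first marker but not yet at the second" question, the usual awk trick is a state flag. What follows is only a rough sketch, not a tested solution for the real file. The function names between_markers and numbered_sections are made up for illustration, and it assumes each marker sits alone on its line in the literal form [marker-N], with the script being given the name without the brackets. The counter here is 1 for lines after the first marker, 2 after the second, and so on; shift it if you number sections differently.

```shell
# Option 2: print the lines strictly between two markers.
# $1 = start marker name, $2 = end marker name, $3 = file (hypothetical interface).
# Markers that fall in between are printed like any other line.
between_markers() {
    awk -v first="[$1]" -v last="[$2]" '
        inside && $0 == last { exit }        # end marker reached: stop reading
        inside               { print }       # past the start, not yet at the end
        $0 == first          { inside = 1 }  # start printing from the next line
    ' "$3"
}

# Option 1: prefix every data line with the number of markers seen so far.
# $1 = file. Assumes markers look like [marker-...] on a line of their own.
numbered_sections() {
    awk '
        /^\[marker-.*\]$/ { section++; next }        # marker line: bump the counter
        section           { print section ": " $0 }  # data line inside a section
    ' "$1"
}
```

Called as between_markers marker-1 marker-3 hugefile, the first function stops reading at the end marker, which should help on a really huge file.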
Thanks. These details and a little MANning has helped me get past my earlier stumbling block. However, I now have another issue:
Say in this huge file, there are several variables that have the same value, or are higher than a set value, how can I use awk to not only return the value I'm looking for, but also the number of times it has appeared?
For instance, say my data looks like this:
a=5
b=6
c=5
d=25
e=7
From the above I want not only to extract the uniq values, but also how often they appear. I'd like to have output that looks like this:
5 - 2
6 - 1
7 - 1
25 - 1
I also would like to extend this to reveal:
2 entries < 6
1 entry > 10
What I'm getting right now is the uniq values by piping 'awk' output to 'sort' then 'uniq'. I've been unable to determine how to keep count of occurrences!
From the above I want not only to extract the uniq values, but also how often they appear.
Dunno how with awk, but with grep -c you can count how many lines in hugefile match a string.
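On the awk side, an associative array indexed by the value should do the counting. A minimal sketch, assuming the data really is one name=value pair per line as in the sample (count_values is just a name picked for illustration):

```shell
# Count how often each value occurs in a file of name=value lines.
# $1 = file. Lines that do not split into exactly two fields on "=" are skipped.
count_values() {
    awk -F= '
        NF == 2 { count[$2]++ }                          # tally each value
        END     { for (v in count) print v " - " count[v] }
    ' "$1" | sort -n                                     # awk array order is unspecified
}
```

If you are already piping through sort and uniq, "uniq -c" gives the same tallies, just with the count printed in front of the value.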
I also would like to extend this
That's just a count. Should be the easiest thing to do.
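Something along these lines might work for the threshold counts. It is a sketch under the same name=value assumption, with count_thresholds and its two threshold arguments invented for the example:

```shell
# Count values below one threshold and above another.
# $1 = file of name=value lines, $2 = lower threshold, $3 = upper threshold.
count_thresholds() {
    awk -F= -v lo="$2" -v hi="$3" '
        function noun(n) { return n == 1 ? "entry" : "entries" }
        NF == 2 {
            if ($2 + 0 < lo + 0) below++    # "+ 0" forces a numeric comparison
            if ($2 + 0 > hi + 0) above++
        }
        END {
            printf "%d %s < %d\n", below, noun(below), lo
            printf "%d %s > %d\n", above, noun(above), hi
        }
    ' "$1"
}
```

With the sample data, count_thresholds file 6 10 should print "2 entries < 6" and "1 entry > 10".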
Sorry for late response. Been busy with other things.
Unfortunately (well, fortunately for me), this task was taken over by someone with much more experience than I have with the aforementioned tools, and it took him just a few hours to get the required data.