03-20-2006, 06:58 PM | #1
Member | Registered: Jan 2002 | Location: Melbourne, Australia | Distribution: Ubuntu 22.04 (Jammy) | Posts: 92
Data extraction from a really, really huge file.
I have a really huge file and need to extract data from it. My dilemma is that I'm not yet much good with grep/awk/sed, which I believe will be my best bet. I do not have the option to use Perl, as it is not installed and will not be installed. I'm pretty conversant with regexes, so I'm sorta 25% ready for the job at hand.
The file has several markers, such as:
[marker-1]
...
data
...
[marker-2]
...
data
...
[marker-n]
I want to implement two options for extracting this data:
1:
I want to use a counter to keep track of which section the extracted data came from. For instance, with a count of 2, I'd know that the data was extracted between markers marker-1 and marker-2.
2:
I want to be able to give the script the starting and ending markers (myscript marker-1 marker-n), and the script will extract the data between these two markers, irrespective of whether there are other markers between the two provided.
I'm not looking for a free handout, but rather the technique for using grep/awk/sed to determine that I'm past the first marker but have yet to reach the second one!
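To show the shape of what I'm imagining for option 2, here's an untested awk sketch of that "past the first, not yet at the second" test (hugefile and myscript are placeholder names, and I'm assuming each marker sits alone on its own line):

#!/bin/sh
# Hypothetical usage: myscript marker-1 marker-n
start="[$1]" end="[$2]"
awk -v s="$start" -v e="$end" '
    $0 == e { inside = 0 }    # hit the ending marker: stop printing
    inside  { print }         # we are between the markers: this is data
    $0 == s { inside = 1 }    # passed the starting marker: start printing
' hugefile

And for option 1, I imagine a counter that simply increments on every marker line (section++ whenever a /^\[marker-/ line goes by), so each data line can be tagged with the section it came from.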
I hope I've been clear enough to get my problem across for someone to help me.
Cheers,
kb.
03-20-2006, 08:13 PM | #2
Moderator | Registered: May 2001 | Posts: 29,417
If you're aiming purely at data extraction I would let go of the counter thing.

line_nr=( $(grep -n 'marker' file | cut -d ':' -f 1) )            # line numbers of the marker lines
sed -n "$(( ${line_nr[0]} + 1 )),$(( ${line_nr[1]} - 1 ))p" file

to rip the first chunk to stdout. If you want to select which chunks to rip you could use Bash's built-in "select".
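A rough sketch of that "select" idea, untested (hugefile is a placeholder name, and I'm assuming the markers sit alone on their lines):

#!/bin/bash
# Offer the markers as a menu, then rip the chunk that follows the chosen one.
markers=( $(grep -n '^\[marker-' hugefile) )    # entries look like "12:[marker-2]"
select m in "${markers[@]}"; do
    start=${m%%:*}                              # line number of the chosen marker
    next=$(awk -v s="$start" 'NR > s && /^\[marker-/ { print NR; exit }' hugefile)
    [ -z "$next" ] && next=$(( $(wc -l < hugefile) + 1 ))    # last chunk runs to EOF
    sed -n "$(( start + 1 )),$(( next - 1 ))p" hugefile
    break
done

For the non-interactive case (your option 2) you don't need line numbers at all: sed -n '/^\[marker-1\]$/,/^\[marker-n\]$/p' hugefile prints the whole range, markers included, and a grep -v '^\[' on the end strips the marker lines back out.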
03-23-2006, 10:22 PM | #3
Member (Original Poster) | Registered: Jan 2002 | Location: Melbourne, Australia | Distribution: Ubuntu 22.04 (Jammy) | Posts: 92
Thanks. These details and a little time with the man pages helped me get past my earlier stumbling block. However, I now have another issue:
Say this huge file contains several variables that share the same value, or exceed a set value. How can I use awk to return not only the value I'm looking for, but also the number of times it appears?
For instance, say my data looks like this:
a=5
b=6
c=5
d=25
e=7
From the above I want to extract not only the unique values but also how often each appears. I'd like output that looks like this:
5 - 2
6 - 1
7 - 1
25 - 1
I also would like to extend this to reveal:
2 entries < 6
1 entry > 10
What I'm getting right now is just the unique values, by piping awk output to sort and then uniq. I've been unable to work out how to keep a count of occurrences!
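For reference, this is the direction I've been fumbling in, as an untested sketch (hugefile is a placeholder; I'm assuming every line has the name=value form shown above):

awk -F '=' '
    { count[$2]++ }       # tally each value as it appears
    $2 < 6  { low++ }     # running totals for the threshold summary
    $2 > 10 { high++ }
    END {
        for (v in count) print v " - " count[v]
        print low + 0 " entries < 6"
        print high + 0 " entries > 10"
    }
' hugefile

I gather the for-in loop comes out in arbitrary order, so the value lines may still need a sort -n pass; it's whether count[$2]++ is the right way to keep the tally that I can't tell.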
Cheers,
kb.
03-24-2006, 08:20 AM | #4
Moderator | Registered: May 2001 | Posts: 29,417
Quote: "From the above I want to extract not only the unique values but also how often each appears."
Dunno how with awk, but with grep you can count the number of times a string matches in the huge file.
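Something like this, maybe (untested, assuming the name=value lines from your example, with hugefile as a placeholder):

grep -c '=5$' hugefile    # number of lines whose value is exactly 5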
Quote: "I also would like to extend this..."
That's just a count. Should be the easiest thing to do.
Post the script?
04-09-2006, 04:18 AM | #5
Member (Original Poster) | Registered: Jan 2002 | Location: Melbourne, Australia | Distribution: Ubuntu 22.04 (Jammy) | Posts: 92
Sorry for the late response; I've been busy with other things.
Unfortunately (well, fortunately for me), this task was taken over by someone with much more experience than I at using the aforementioned tools, and it took him only a few hours to get the required data.
Cheers,
tkb.