LinuxQuestions.org
Old 05-22-2005, 06:06 PM   #1
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Rep: Reputation: 0
Help working on a script to search for specific data.


Hello and thanks ahead of time for any help.

I'm working on a project for work, to take a text file generated by a Spectrum Analyzer, and read every 28th line into a file. The data looks like this:

05/02/05,17:40:40,1,853.26250,-120,853.83750,-85,854.51250 ...

The lines are actually much longer, but what matters is that it grabs data 28 times per second and I only need the data once per second, so every 28th line is what I need. The data is just date, time, duration, data. The duration in the line above is "1"; that number counts up on each new line, like

05/02/05,17:40:40,1,853.26250,-120,853.83750,-85,854.51250, ...
05/02/05,17:40:40,2,853.26250,-120,853.83750,-85,854.51250, ...
05/02/05,17:40:40,3,853.26250,-120,853.83750,-85,854.51250, ...
05/02/05,17:40:40,4,853.26250,-120,853.83750,-85,854.51250, ...

So I thought I would write a script to grab every 28th line and output it to a new file. This is what I came up with:

clear
line_number_count=0
echo "Enter the input file name "
echo "Make sure to include the file's location path "
read file_name_input
echo "Enter the output file name and path"
read output_file_name
echo "You entered the following information "
echo "input file: $file_name_input "
echo "output file: $output_file_name "
echo "Is this correct? Enter (yes/no) "
read yes_no
if [ "$yes_no" = "yes" ]
then
while [ "$line_number_count" -lt 65000 ]
do
echo "processing"
# advance the counter first, then grab the line whose duration
# field matches it (every 28th reading)
line_number_count=$((line_number_count + 28))
grep ",${line_number_count}," "$file_name_input" >> "$output_file_name"
done
else
echo "You indicated that the path and/or file name entered"
echo "was incorrect; the script has now exited. Please"
echo "rerun this script and enter the correct information."
echo "Have a nice day."
fi




Suffice it to say it doesn't work, and I'm not a programmer by trade, so any help would be appreciated.
 
Old 05-22-2005, 06:54 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Replace the loop and grep with:
Code:
sed -n '1~28p' "$inputfile" > "$outputfile"
And no, I didn't use your variable names,
too long ;)
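One aside: the 1~28 step address is a GNU sed extension. On systems without GNU sed, an equivalent awk one-liner (a sketch reusing the same variable names) would be:

```shell
# Print line 1 and every 28th line after it, matching GNU sed '1~28p'.
# NR is awk's running count of input lines.
awk 'NR % 28 == 1' "$inputfile" > "$outputfile"
```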


Cheers,
Tink

Last edited by Tinkster; 05-22-2005 at 06:57 PM.
 
Old 05-22-2005, 10:09 PM   #3
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
Thanks, I'll try it when I get back to work in the morning. I'll let you know how it works.
 
Old 05-23-2005, 08:35 PM   #4
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
And, did it? :}


Cheers,
Tink
 
Old 05-24-2005, 08:10 PM   #5
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
Yes, it worked, but it turned out I needed something that would handle different frequencies scanned at different intervals, anywhere from 10 times a second all the way up to 1000. Unfortunately I didn't know that until today, when all the actual field data came in; the test data was just too uniform. But I did get it to work by using the uniq command and comparing only the first 17 characters, which in my data are the date and time. With the uniq command the data gets stripped down so that only one line per unique time is left.
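The uniq trick described above can be sketched like this (GNU uniq's -w option compares only the first N characters; the 17-character prefix and the file names are assumptions based on the sample data):

```shell
# Keep only the first line for each unique 17-character prefix,
# which in this data is the "MM/DD/YY,HH:MM:SS" date/time stamp.
uniq -w 17 scan_data.txt > once_per_second.txt
```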

Thanks for the help though, you sent me in the right direction.

I'll probably have another question at some point, but I have to process 67 GB of data first.
 
Old 05-24-2005, 08:17 PM   #6
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Heh - that sounds like the amount of data we gathered from
echo sounders - just at a much higher res ;}


Cheers,
Tink
 
Old 05-25-2005, 06:49 PM   #7
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
Fortunately we're only scanning 20 specific channels for this project. I wrote a script in MATLAB to do all the work, and it did work, but it took hours to compute a 250 MB file. The uniq command does the same thing in seconds.

My next task ("one of many") is to take the data and pull out each frequency and power combination along with their time stamps. For instance:

05/02/05,17:40:40,1,853.26250,-120
05/02/05,17:40:40,2,853.26250,-120

and

05/02/05,17:40:40,1,854.26250,-96
05/02/05,17:40:40,2,854.26250,-96

Where the first part is the DATE, TIME and Duration

05/02/05,17:40:40,1
05/02/05,17:40:40,2

and the second part is the frequency and power

853.26250,-120

The problem of course being that the program records all the frequencies on one line, like

date, time, duration, freq1, power1, freq2, power2, freq3, power3,... and so on

So separating them out will be a trick. I have done it in MATLAB, but it takes a while to process, and shaving time off by doing it in a shell script means that when the real data comes in I'll be able to process it within a day or two instead of a week or two.

I'm thinking of doing a grep for the frequencies since they don't change but that grabs the whole line and not just the chunk I need. If you've got any suggestions I'd be glad to take them.
 
Old 05-25-2005, 07:01 PM   #8
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
In that scenario of separating those long lines out: do you
need to prepend every frequency/power-level pair with the
date, time, duration tuple?

Either way, awk seems like a good tool ;}


awk -F, '{for(i=4; i < NF; i+=2){printf "(formatting here)\n", $1, $2,$3,$i,$(i+1)}}' file


You get the idea :}

[edit]
actual data:
data.txt
Code:
05/02/05,17:40:40,1,853.26250,-120,53.26250,-10,83.250,-20,853.2,-70
awk-bit:
Code:
awk -F, '{for(i=4; i < NF; i+=2){printf "%9s %8s %2d %-8e %-8e\n", $1,$2,$3,$i,$(i+1)}}' data.txt
And the output:
Code:
 05/02/05 17:40:40  1 8.532625e+02 -1.200000e+02
 05/02/05 17:40:40  1 5.326250e+01 -1.000000e+01
 05/02/05 17:40:40  1 8.325000e+01 -2.000000e+01
 05/02/05 17:40:40  1 8.532000e+02 -7.000000e+01
[/edit]



Cheers,
Tink

Last edited by Tinkster; 05-25-2005 at 07:11 PM.
 
Old 05-25-2005, 09:31 PM   #9
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
It does work, but is there a way to output each of the same frequencies from each line to a file so that I start with

date, time, duration, freq1, power1, freq2, power2, freq3, power3,... freq 20, power20

and I end up with 20 files each with

File 1:
date, time, duration, freq1, power1

File 2:
date, time, duration, freq2, power2

File 3:
date, time, duration, freq3, power3

and so on?

PS: I fully intend to "donate" (i.e. $) for your time; you've been a big help so far.
 
Old 05-25-2005, 09:36 PM   #10
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
From your description, you may want to use awk to extract the information.
Awk is better at handling and processing files consisting of records.
It also has a BEGIN block for preprocessing and an END block for post processing.

But mostly, because a particular field can be selected easily, such as
{ print $3 log($4) $5 log($4) }

Using sed, you would need to use the pattern of the line to be able to output selected fields to a file.
Code:
1~28s/\(<pattern matching common part>\)\(<freq1 data1 pattern>\)\(<freq2 data2 pattern>\)\(<freq3 data3 pattern>\)/\1 \2/w freq1_file
This would write the substitution to a file "freq1_file"
Because subsequent sed commands would operate on the changed line, you need to first save the line read, and then before each additional substitution, read the original line from the hold space.
Code:
1~28{
            # save the original line
            h
            # output the freq1 data file info
            s/\(<pattern matching common part>\)\(<freq1 data1 pattern>\)\(<freq2 data2 pattern>\)\(<freq3 data3 pattern>\)/\1 \2/w freq1_file
            # retrieve the original line
            g
            # output the freq2 data file info
            s/\(<pattern matching common part>\)\(<freq1 data1 pattern>\)\(<freq2 data2 pattern>\)\(<freq3 data3 pattern>\)/\1 \3/w freq2_file
            # retrieve the original line
            g
            # output the freq3 data file info
            s/\(<pattern matching common part>\)\(<freq1 data1 pattern>\)\(<freq2 data2 pattern>\)\(<freq3 data3 pattern>\)/\1 \4/w freq3_file
        }

Last edited by jschiwal; 05-25-2005 at 10:05 PM.
 
Old 05-25-2005, 09:42 PM   #11
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
That change is next to trivial :)

Code:
awk -F, '{for(i=4; i < NF; i+=2){name+=1; printf "%9s %8s %2d %-8e %-8e\n", $1,$2,$3,$i,$(i+1) > name}}' data.txt
This will give you numerical filenames in increments of 1


Cheers,
Tink

Last edited by Tinkster; 05-25-2005 at 09:43 PM.
 
Old 05-25-2005, 10:14 PM   #12
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Tinkster's reply was submitted while I was writing mine. As you may see, my sed example was not trivial; that is why awk may be better.
The difference is due to being able to access fields of each line using $<field_number>, whereas with sed I used grouping \(, \) and backreferences \2, \3 to store and replace parts of the line.

Another difference is that with sed, what is written is the result of the replacement, so saving the original line in the hold space is necessary to be able to perform a different substitution for the next frequency/data fields.

However, since you are working on a very large dataset, you might try both. One approach may be faster than the other.

A totally different approach would be to read each line into a bash array variable. That could be even faster because you are not executing an external program. However, since the per-line loop is already contained inside the sed or awk program, it wouldn't save that much time.
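The bash-array variant mentioned above might look something like this (a sketch; the file names and the freq/power layout are assumptions taken from the sample data earlier in the thread):

```shell
# Read each CSV line into array f: f[0]=date, f[1]=time, f[2]=duration,
# then f[3],f[4], f[5],f[6], ... are the freq/power pairs.
# Each pair is appended to its own numbered file. Requires bash
# (read -a and arrays are not POSIX sh).
while IFS=, read -ra f; do
    for ((i = 3; i + 1 < ${#f[@]}; i += 2)); do
        pair=$(( (i - 1) / 2 ))   # 1 for the first freq/power pair
        echo "${f[0]},${f[1]},${f[2]},${f[i]},${f[i+1]}" >> "freq${pair}.csv"
    done
done < data.txt
```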

One feature of awk hasn't been mentioned yet: awk handles floating-point variables. Together with the built-in arithmetic functions (including trig and log functions), you could, for example, read in a configuration file in the BEGIN portion containing scaling information for each sensor, then use those values to convert/normalize the raw data from each sensor. This might speed up the post-processing phase.

Imagine writing your awk program as a filter to split and process your data in real-time. Sounds neat!
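A minimal sketch of that BEGIN-block idea (the "slot,factor" scale.conf format and the file names are assumptions for illustration, not anything from the thread):

```shell
# scale.conf holds "slot,factor" pairs, e.g. "1,0.5" scales the first
# freq/power pair's power by 0.5.  BEGIN loads the factors into an
# array; the main block multiplies each power reading by its factor.
awk -F, '
BEGIN {
    while ((getline line < "scale.conf") > 0) {
        split(line, a, ",")
        scale[a[1]] = a[2]
    }
}
{
    for (i = 4; i < NF; i += 2) {
        slot = (i - 2) / 2          # 1 for the first freq/power pair
        printf "%s,%s,%s,%.5f,%.2f\n", $1, $2, $3, $i, $(i+1) * scale[slot]
    }
}' data.txt
```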

Last edited by jschiwal; 05-25-2005 at 10:38 PM.
 
Old 05-25-2005, 10:21 PM   #13
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
Tink,

OK, that was my fault; I didn't explain it clearly enough.

I should have said that I wanted there to be 20 files in the end, each containing a list of all the scans of one frequency over a period of time, like

File 1:
date, time, duration, freq1, power1
date, time2, duration2, freq1,power1
data, time3, duration3, freq1, power1

File 2:
date, time, duration, freq2, power2
date, time2, duration2, freq2,power2
data, time3, duration3, freq2, power2

File 3:
date, time, duration, freq3, power3
date, time2, duration2, freq3,power3
data, time3, duration3, freq3, power3

and so on?

The script you posted works, except it creates a new file for every single line item, which isn't going to work for my final calculations.

Here's a real line of data from the SA:

05/02/05,17:40:41,22,853.26250,-120,853.83750,-84,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-120,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:53,345,853.26250,-120,853.83750,-83,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:54,372,853.26250,-120,853.83750,-82,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:55,399,853.26250,-120,853.83750,-83,854.51250,-120,855.21250,-119,855.71250,-120,868.60000,-121,868.91250,-119,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:56,427,853.26250,-120,853.83750,-84,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121

So there it's made up of
date, time, duration, freq1, power1, freq2, power2, and so on. This particular example has 10 frequencies (ie: 853 and so on) and 10 matching powers for those frequencies (ie: -120).

Every line in the files I'm getting is like this. What I need to do at this point is take the corresponding frequencies in each line, like
853.26250, and put each into its own file along with its time and date. For this example I would end up with

File 1:
05/02/05,17:40:41,853.26250,-120
05/02/05,17:40:53,853.26250,-120
05/02/05,17:40:54,853.26250,-120
05/02/05,17:40:55,853.26250,-120

File 2:
05/02/05,17:40:41,853.83750,-84
05/02/05,17:40:53,853.83750,-83
05/02/05,17:40:54,853.83750,-82
05/02/05,17:40:55,853.83750,-83

and so on. I know it's a lot to ask but you seem like a total guru.
 
Old 05-25-2005, 11:01 PM   #14
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Code:
awk -F, '{for(i=4; i < NF; i+=2){printf "%9s %8s %2d %-8f %-8f\n", $1, $2,$3,$i,$(i+1) >> (i/2-1)  }} ' file
Hope I got you right this time..
And I don't know about the guru, but thanks ;}


Cheers,
Tink
 
Old 05-25-2005, 11:07 PM   #15
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
Thanks, I'm going to try it later; I need some sleep. I have to be back at work in 6.5 hrs. It's been a long day of meetings and then programming and test-set design. I'll let you know in the morning when I'm logged back into everything.

Thanks again for your help.

- Oracle111122
 
  

