Help working on a script to search for specific data.
Hello and thanks ahead of time for any help.
I'm working on a project for work: taking a text file generated by a spectrum analyzer and reading every 28th line into a new file. The data looks like this:

05/02/05,17:40:40,1,853.26250,-120,853.83750,-85,854.51250 ...

The lines are actually much longer, but what matters is that it grabs data 28 times per second and I only need the data once per second, so every 28th line is what I need. The format is just date,time,duration,data. The duration in the line above is "1"; that number counts up on each new line:

05/02/05,17:40:40,1,853.26250,-120,853.83750,-85,854.51250, ...
05/02/05,17:40:40,2,853.26250,-120,853.83750,-85,854.51250, ...
05/02/05,17:40:40,3,853.26250,-120,853.83750,-85,854.51250, ...
05/02/05,17:40:40,4,853.26250,-120,853.83750,-85,854.51250, ...

So I thought I would write a script to grab every 28th line and output it to a new file. This is what I came up with:
Code:
clear
line_number_count=0
echo "Enter the Input File Name "
echo "Make sure to include the file's location path "
read file_name_input
echo "Enter the Output File Name and path"
read output_file_name
echo "You entered the following information "
echo "input file: $file_name_input "
echo "output file: $output_file_name "
echo "Is this correct? Enter (yes,no) "
read yes_no
if [ "$yes_no" = "yes" ]
then
    while [ "$line_number_count" != 65000 ]
    do
        echo "processing"
        grep ',[$line_number_count+28],' > "$output_file_name"
    done
else
    echo "You indicated that the path and/or file name entered"
    echo "was incorrect, so the script has now exited. Please"
    echo "rerun this script and enter the correct information."
    echo "Have a nice day"
fi
Suffice it to say it doesn't work, and I'm not a programmer by trade, so any help would be appreciated. |
Replace the loop and grep with:
Code:
sed -n '1~28p' "$inputfile" > "$outputfile"
Your script was a tad too long for the job ;) Cheers, Tink |
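A side note on portability: the `1~28p` (first~step) address is a GNU sed extension. On a system without GNU sed, an equivalent awk filter does the same thing. A minimal sketch (the sample file is made up for illustration):

```shell
# Sample input: 60 numbered lines standing in for analyzer output
seq 60 > input.txt

# Keep line 1 and every 28th line after it (lines 1, 29, 57, ...)
awk 'NR % 28 == 1' input.txt > output.txt
cat output.txt
```

`NR` is awk's built-in line counter, so this works in any POSIX awk.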
Thanks, I'll try it when I get back to work in the morning. I'll let you know how it works.
|
And, did it? :}
Cheers, Tink |
Yes, it worked, but it turned out I needed something that would work at different frequencies which scanned at different intervals, anywhere from 10 times a second all the way up to 1000. Unfortunately I didn't know that until today, when all the actual field data came in; the test data was just too uniform. But I did get it to work by using the uniq command and comparing only the first 17 characters, which in my data are the date and time. With the uniq command the data gets stripped so that only one line for each unique time is left.
Thanks for the help though, you sent me in the right direction. I'll probably have another question at some point, but I have to process 67 GB of data first. |
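A minimal sketch of that uniq trick, assuming GNU uniq (the `-w`/`--check-chars` option) and a sample file made up to match the format described above — the first 17 characters are exactly `date,time` here:

```shell
# Duration counts up within the same second; the first 17 characters
# (8 date chars + comma + 8 time chars) identify each unique second
cat > scans.txt <<'EOF'
05/02/05,17:40:40,1,853.26250,-120
05/02/05,17:40:40,2,853.26250,-120
05/02/05,17:40:41,1,853.26250,-119
EOF

# -w 17 compares only the first 17 characters of each line (GNU uniq),
# keeping the first line of each run of equal prefixes
uniq -w 17 scans.txt
```

This keeps the first scan of each second and drops the rest, regardless of whether the analyzer recorded 10 or 1000 lines in that second.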
Heh - that sounds like the amount of data we gathered from
echo sounders - just at a much higher res ;} Cheers, Tink |
Fortunately we're only scanning 20 specific channels for this project. I wrote a script in MATLAB to do all the work, and it did work, but it took hours to compute a 250 MB file. The uniq command does the same thing in seconds.
My next task (one of many) is to take the data and pull out each frequency and power combination along with their time stamps. For instance:

05/02/05,17:40:40,1,853.26250,-120
05/02/05,17:40:40,2,853.26250,-120

and

05/02/05,17:40:40,1,854.26250,-96
05/02/05,17:40:40,2,854.26250,-96

where the first part is the date, time, and duration (05/02/05,17:40:40,1) and the second part is the frequency and power (853.26250,-120). The problem, of course, is that the program records all the frequencies on one line, like:

date, time, duration, freq1, power1, freq2, power2, freq3, power3, ...

so separating them out will be a trick. I have done it in MATLAB, but it takes a while to process, and shaving time off by doing it in a shell script means that when the real data comes in I'll be able to process it within a day or two instead of a week or two. I'm thinking of doing a grep for the frequencies, since they don't change, but that grabs the whole line and not just the chunk I need. If you've got any suggestions I'd be glad to take them. |
In that scenario of separating those long lines out: do you need to prepend every frequency/power pair with the date, time, duration tuple? Either way, awk seems like a good tool ;}

awk -F, '{for(i=4; i < NF; i+=2){printf "(formatting here)\n", $1,$2,$3,$i,$(i+1)}}' file

You get the idea :}

[edit] actual data: data.txt
Code:
05/02/05,17:40:40,1,853.26250,-120,53.26250,-10,83.250,-20,853.2,-70 Code:
awk -F, '{for(i=4; i < NF; i+=2){printf "%9s %8s %2d %-8e %-8e\n", $1,$2,$3,$i,$(i+1)}}' data.txt Code:
05/02/05 17:40:40 1 8.532625e+02 -1.200000e+02 Cheers, Tink |
It does work, but is there a way to output each of the same frequencies from each line to its own file? So that I start with

date, time, duration, freq1, power1, freq2, power2, freq3, power3, ... freq20, power20

and I end up with 20 files, each like:

File 1: date, time, duration, freq1, power1
File 2: date, time, duration, freq2, power2
File 3: date, time, duration, freq3, power3

and so on?

PS: I fully intend to "donate" (i.e. $) for your time; you've been a big help so far. |
From your description, you may want to use awk to extract the information.
Awk is better at handling and processing files consisting of records. It also has a BEGIN block for preprocessing and an END block for post-processing. But mostly, it's because a particular field can be selected easily, such as

{ print $3 log($4) $5 log($4) }

Using sed, you would need to use the pattern of the line to be able to output selected fields to a file.
Code:
1~28s/\(<pattern matching common part>\)\(<freq1 data1 pattern>\)\(<freq2 data2 pattern>\)\(<freq3 data3 pattern>\)/\1 \2/w freq1_file
Because subsequent sed commands would operate on the changed line, you need to first save the line read, and then before each additional substitution, restore the original line from the hold space.
Code:
1~28{ |
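A fuller sketch of that hold-space approach, assuming just two frequency/power pairs per line; the field patterns and the output file names `freq1_file`/`freq2_file` are illustrative, not from the thread:

```shell
# Two sample lines in the analyzer's format; with the 1~28 address
# only the first line is processed here
printf '%s\n' \
  '05/02/05,17:40:40,1,853.26250,-120,853.83750,-85' \
  '05/02/05,17:40:40,2,853.26250,-120,853.83750,-85' > data.txt

# h saves the original line to the hold space; g restores it before the
# next substitution, so each s///w carves a different freq/power pair
# out of the same unmodified line (GNU sed for the 1~28 address)
sed -n '1~28{
h
s/^\([^,]*,[^,]*,[^,]*\),\([^,]*,[^,]*\).*/\1,\2/w freq1_file
g
s/^\([^,]*,[^,]*,[^,]*\),[^,]*,[^,]*,\([^,]*,[^,]*\).*/\1,\2/w freq2_file
}' data.txt
```

Each `s///w file` writes the substituted line to its file only when the substitution matched, which is what routes each pair to its own output.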
That change is next to trivial :)
Code:
awk -F, '{for(i=4; i < NF; i+=2){name+=1; printf "%9s %8s %2d %-8e %-8e\n", $1,$2,$3,$i,$(i+1) > name}}' data.txt Cheers, Tink |
Tinkster's reply was submitted while I was writing mine. As you can see, my sed example was not trivial; that is why awk may be better.

The difference is due to being able to access fields of each line using $<field_number>, whereas with sed I used the grouping operators \(, \) and backreferences \1, \2, \3 to store and replace parts. Another difference is that with sed, what is written is the result of the replacement, so saving the original line in the hold space is necessary to be able to perform a different substitution for the next frequency/data fields. However, since you are working on a very large dataset, you might try both; one approach may be faster than the other.

A totally different approach would be to read each line into a bash array variable. This might be even faster because you are not executing an external program. However, since the per-line loop is contained inside the sed or awk program anyway, it wouldn't save that much time.

One feature of awk hasn't been mentioned yet: awk handles floating-point variables. Together with the built-in arithmetic functions (including trig and log functions), you could for example read in a configuration file in the BEGIN portion containing scaling information for each sensor, then use those values to convert/normalize the raw data from each sensor. This might speed up the post-processing phase. Imagine writing your awk program as a filter to split and process your data in real time. Sounds neat! |
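To make that BEGIN-block idea concrete, here is a hedged sketch. The file names `scale.conf` and `data.txt` and the "frequency → calibration offset" format are invented for the example; the real scaling scheme would depend on the sensors:

```shell
# Illustrative config: frequency -> calibration offset in dB (made up)
cat > scale.conf <<'EOF'
853.26250 3.0
853.83750 -1.5
EOF

cat > data.txt <<'EOF'
05/02/05,17:40:40,1,853.26250,-120,853.83750,-85
EOF

# Read the offsets once in BEGIN, then apply them to every power
# reading while splitting each line into date,time,freq,power tuples
awk -F, '
BEGIN {
    while ((getline line < "scale.conf") > 0) {
        split(line, f, " ")
        off[f[1]] = f[2]
    }
}
{
    for (i = 4; i < NF; i += 2)
        printf "%s,%s,%s,%.1f\n", $1, $2, $i, $(i+1) + off[$i]
}' data.txt
```

The lookup `off[$i]` works because the frequency field text matches the first column of the config file exactly; the arithmetic happens in awk's floating point, as described above.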
Tink,

OK, that was my fault; I didn't explain it clearly enough. I should have said that I want there to be 20 files in the end, each one containing a list of all the scans of one frequency over a period of time, like:

File 1:
date, time, duration, freq1, power1
date, time2, duration2, freq1, power1
date, time3, duration3, freq1, power1

File 2:
date, time, duration, freq2, power2
date, time2, duration2, freq2, power2
date, time3, duration3, freq2, power2

File 3:
date, time, duration, freq3, power3
date, time2, duration2, freq3, power3
date, time3, duration3, freq3, power3

and so on. The script you posted works except that it creates a new file for every single line item, which isn't going to work for my final calculations.

Here are some real lines of data from the SA:

05/02/05,17:40:41,22,853.26250,-120,853.83750,-84,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-120,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:53,345,853.26250,-120,853.83750,-83,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:54,372,853.26250,-120,853.83750,-82,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:55,399,853.26250,-120,853.83750,-83,854.51250,-120,855.21250,-119,855.71250,-120,868.60000,-121,868.91250,-119,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:56,427,853.26250,-120,853.83750,-84,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121

So each line is made up of date, time, duration, freq1, power1, freq2, power2, and so on. This particular example has 10 frequencies (e.g. 853.26250) and 10 matching powers for those frequencies (e.g. -120); every line in the files I'm getting is like this.

What I need to do at this point is take the corresponding frequencies in each line, like 853.26250, and put each into a file along with its time and date. For this example I would end up with:

File 1:
05/02/05,17:40:41,853.26250,-120
05/02/05,17:40:53,853.26250,-120
05/02/05,17:40:54,853.26250,-120
05/02/05,17:40:55,853.26250,-120

File 2:
05/02/05,17:40:41,853.83750,-84
05/02/05,17:40:53,853.83750,-83
05/02/05,17:40:54,853.83750,-82
05/02/05,17:40:55,853.83750,-83

and so on. I know it's a lot to ask, but you seem like a total guru. |
Code:
awk -F, '{for(i=4; i < NF; i+=2){printf "%9s %8s %2d %-8f %-8f\n", $1, $2,$3,$i,$(i+1) >> (i/2-1) }} ' file And I don't know about the guru, but thanks ;} Cheers, Tink |
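If the bare numeric file names (1, 2, ...) are awkward for the later processing, a small variant produces the comma-separated layout asked for above. The `freqN.txt` naming and the sample data file are my own choices, not from the thread:

```shell
# Fresh start, since >> appends across runs
rm -f freq1.txt freq2.txt

# Two real-format lines, trimmed to two freq/power pairs each
cat > data.txt <<'EOF'
05/02/05,17:40:41,22,853.26250,-120,853.83750,-84
05/02/05,17:40:53,345,853.26250,-120,853.83750,-83
EOF

# Pair at fields (4,5) goes to freq1.txt, (6,7) to freq2.txt, etc.;
# awk keeps each output file open, so >> appends line after line
awk -F, '{ for (i = 4; i < NF; i += 2)
               print $1 "," $2 "," $i "," $(i+1) >> ("freq" (i/2 - 1) ".txt")
         }' data.txt
```

Each output file then accumulates one `date,time,freq,power` line per input line, which matches the File 1 / File 2 layout requested earlier in the thread.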
Thanks, I'm going to try it later; I need some sleep, since I have to be back at work in 6.5 hrs. It's been a long day of meetings, then programming and test-set design. I'll let you know in the morning when I'm logged back into everything.
Thanks again for your help. :) - Oracle111122 |