Help working on a script to search for specific data.
Hello and thanks ahead of time for any help.
I'm working on a project for work: taking a text file generated by a spectrum analyzer and reading every 28th line into a new file. The data looks like this:
The lines are actually much longer, but what matters is that the analyzer records data 28 times per second and I only need it once per second, so every 28th line is what I want. Each line is just date,time,duration,data. The duration in the line above is "1"; that number counts up on each new line.
So I thought I would write a script to grab every 28th line and output it to a new file. This is what I came up with:
clear
echo "Enter the Input File Name "
echo "Make sure to include the file's location path "
read file_name_input
echo "Enter the Output File Name and path"
read output_file_name
echo "You Entered the Following Information "
echo "input file: $file_name_input "
echo "output file: $output_file_name "
echo "Is This Correct? Enter (yes/no) "
read yes_no
if [ "$yes_no" = "yes" ]
then
    echo "processing"
    # keep lines 1, 29, 57, ... of the input
    awk 'NR % 28 == 1' "$file_name_input" > "$output_file_name"
else
    echo "You indicated that the path and or file name entered"
    echo "was incorrect, the script has now exited. Please"
    echo "rerun this script and enter the correct information."
    echo "Have a Nice Day"
fi
Suffice it to say it doesn't work, and I'm not a programmer by trade, so any help would be appreciated.
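For the record, extracting every Nth line doesn't need a loop at all. A minimal sketch using this thread's 28-line stride (the sample input and file names are invented for illustration; GNU sed users can also write `sed -n '1~28p' input.txt`):

```shell
# Generate 60 numbered sample lines, then keep every 28th one.
seq 1 60 > input.txt
awk 'NR % 28 == 1' input.txt > output.txt
cat output.txt    # prints lines 1, 29 and 57
```

awk reads the input itself, so there is no per-line shell loop and no counter variable to maintain.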
Yes, it worked, but it turned out I needed something that would work at different frequencies which scanned at different intervals, anywhere from 10 times a second all the way up to 1000. Unfortunately I didn't know that until today, when all the actual field data came in; the test data was just too uniform. But I did get it to work by using the uniq command on the first 17 characters, which in my data are the date and time. With the uniq command the data gets stripped down so that only one line per unique time is left.
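For anyone following along, that uniq trick can be sketched like this (assumes GNU uniq, whose -w option compares only the first N characters; the 17-character prefix is the "MM/DD/YY,HH:MM:SS" date+time from this thread's data, and the sample lines are made up to match):

```shell
# Three readings; the first two share the timestamp 05/02/05,17:40:40.
printf '%s\n' \
  '05/02/05,17:40:40,1,853.26250,-120' \
  '05/02/05,17:40:40,2,853.26250,-119' \
  '05/02/05,17:40:41,1,853.26250,-121' > sample.txt
# Compare only the first 17 chars, keeping one line per unique timestamp.
uniq -w 17 sample.txt
```

This keeps the first and third lines only, one per second of data.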
Thanks for the help though, you sent me in the right direction.
I'll probably have another question at some point, but I have to process 67 GB of data first.
Fortunately we're only scanning 20 specific channels for this project. I wrote a script in MATLAB to do all the work, and it did work, but it took hours to compute a 250 MB file. The uniq command does the same thing in seconds.
My next task (one of many) is to take the data and pull out each frequency and power combination along with their time stamps. For instance:
Where the first part is the DATE, TIME and Duration
05/02/05,17:40:40,1
05/02/05,17:40:40,2
and the second part is the frequency and power
853.26250,-120
The problem of course being that the program records all the frequencies on one line, like
date, time, duration, freq1, power1, freq2, power2, freq3, power3,... and so on
So separating them out will be a trick. I have done it in MATLAB, but it takes a while to process, and shaving time off by doing it in a shell script means that when the real data comes in I'll be able to process it within a day or two instead of a week or two.
I'm thinking of doing a grep for the frequencies, since they don't change, but that grabs the whole line and not just the chunk I need. If you've got any suggestions I'd be glad to take them.
From your description, you may want to use awk to extract the information.
Awk is better at handling and processing files consisting of records.
It also has a BEGIN block for preprocessing and an END block for post processing.
But mostly, because a particular field can be selected easily, such as
{ print $3 log($4) $5 log($4) }
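To make that concrete for this thread's data, here is a hedged sketch that splits each freq/power pair into its own file. The field positions are assumed from the date,time,duration,freq1,power1,freq2,power2,... layout described above, and the sample line and output file names are invented:

```shell
# One sample line: date, time, duration, then two freq/power pairs.
printf '%s\n' '05/02/05,17:40:40,1,853.26250,-120,854.26250,-118' > data.csv
awk -F, '{
    n = 1
    for (i = 4; i < NF; i += 2) {              # walk the freq/power pairs
        print $1 "," $2 "," $i "," $(i+1) > ("freq" n "_file")
        n++
    }
}' data.csv
cat freq1_file    # 05/02/05,17:40:40,853.26250,-120
cat freq2_file    # 05/02/05,17:40:40,854.26250,-118
```

Because awk can redirect print to a filename built at runtime, one pass over the input produces every per-frequency file at once.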
Using sed, you would need to use the pattern of the line to be able to output selected fields to a file.
This would write the substitution to a file "freq1_file"
Because subsequent sed commands would operate on the changed line, you need to first save the line read, and then before each additional substitution, read the original line from the hold space.
Code:
1~28{
# save the original line
h
# output the freq1 data file info
s/\(<pattern matching common part>\)\(<freq1 data1 pattern>\)\(<freq2 data2 pattern>\)\(<freq3 data3 pattern>\)/\1 \2/w freq1_file
# retrieve the original line
g
# output the freq2 data file info
s/\(<pattern matching common part>\)\(<freq1 data1 pattern>\)\(<freq2 data2 pattern>\)\(<freq3 data3 pattern>\)/\1 \3/w freq2_file
# retrieve the original line
g
# output the freq3 data file info
s/\(<pattern matching common part>\)\(<freq1 data1 pattern>\)\(<freq2 data2 pattern>\)\(<freq3 data3 pattern>\)/\1 \4/w freq3_file
}
Tinkster's reply was submitted while I was writing mine. As you can see, my sed example was not trivial; that is why awk may be better.
The difference is due to being able to access fields of each line using $<field_number>, whereas with sed I used \( and \) groupings together with the backreferences \1, \2, \3 to store and replace parts of the line.
Another difference is that with sed, what is written is the result of the replacement. So saving the original line in the hold register is necessary to be able to perform a different substitution for the next frequency/data fields.
However, since you are working on a very large dataset, you might try both. One approach may be faster than the other.
A totally different approach would be to read each line into a bash array variable. This might be faster still because you are not executing an external program. However, since the per-line loop is already contained inside the sed or awk program, it probably wouldn't save that much time.
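A rough sketch of the bash-array idea (bash-specific syntax; the sample line is taken from this thread's data format):

```shell
#!/bin/bash
# Split one CSV line into an array in-process; no external program runs.
line='05/02/05,17:40:40,1,853.26250,-120'
IFS=, read -r -a fields <<< "$line"
echo "${fields[0]} ${fields[1]}"    # date and time: 05/02/05 17:40:40
echo "${fields[3]} ${fields[4]}"    # first freq/power pair: 853.26250 -120
```

Wrapped in a `while IFS=, read -r -a fields` loop over the file, this gives field access without forking, at the cost of bash's slower per-line interpretation.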
One feature of awk hasn't been mentioned yet. Awk handles floating point variables. Together with the built in arithmetic functions ( including trig and log functions ), you could for example read in a configuration file in the BEGIN portion containing scaling information for each sensor, then use these values to convert/normalize the raw data from each sensor. This might speed up the post processing phase.
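As an illustration only (the calibration file, its format, and the offset values below are invented for this sketch): a BEGIN block could load per-frequency offsets and apply them while splitting, using awk's floating-point arithmetic:

```shell
# Hypothetical calibration file: "frequency offset" per line.
printf '%s\n' '853.26250 3.5' '854.26250 -1.0' > cal.txt
printf '%s\n' '05/02/05,17:40:40,1,853.26250,-120,854.26250,-118' > data.csv
awk -F, '
BEGIN {
    # preprocessing: read one offset per frequency from cal.txt
    while ((getline cal_line < "cal.txt") > 0) {
        split(cal_line, c, " ")
        offset[c[1]] = c[2]
    }
}
{
    for (i = 4; i < NF; i += 2)
        print $i, $(i+1) + offset[$i]    # frequency, corrected power
}' data.csv
```

Here -120 + 3.5 and -118 + (-1.0) are computed in floating point, so the corrected powers come out as -116.5 and -119 in a single pass.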
Imagine writing your awk program as a filter to split and process your data in real-time. Sounds neat!
So there it is, made up of:
date, time, duration, freq1, power1, freq2, power2 and so on. This particular example has 10 frequencies (e.g. 853 and so on) and 10 matching powers for those frequencies (e.g. -120).
Every line in the files I'm getting is like this. What I need to do at this point is take the corresponding frequencies in each line, like
853.26350, and put each into a file along with its time and date. For this example I would end up with
Thanks, I'm going to try it later; I need some sleep, as I have to be back at work in 6.5 hours. It's been a long day of meetings and then programming and test-set design. I'll let you know in the morning when I'm logged back into everything.