Old 05-26-2005, 12:14 AM   #16
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682

I used his sample data set as a sed exercise. I used variables in the sample sed one-liner, because the patterns would be way too long to post otherwise.
If he were to use sed, he would have more "\($fp\),\($pp\)" pairs in the input pattern to cover each frequency,power pair. The substitution patterns would be like "#\1,\2,\4,\5#w freq1" and "#\1,\2,\6,\7#w freq2".

Code:
export dayp='[[:digit:]][[:digit:]]/[[:digit:]][[:digit:]]/[[:digit:]][[:digit:]]'
export tp='[[:digit:]][[:digit:]]:[[:digit:]][[:digit:]]:[[:digit:]][[:digit:]]'
export durp='[[:digit:]][[:digit:]]*'
export fp='[[:digit:]][[:digit:]]*\.[[:digit:]][[:digit:]]*'
export pp='-[[:digit:]][[:digit:]]*'

sed -n 's#^\('$dayp'\),\('$tp'\),\('$durp'\),\('$fp'\),\('$pp'\),.*$#date \1\ttime \2\tduration \3\tfreq \4\tpower \5#p' sensordata
date 05/02/05   time 17:40:41   duration 22     freq 853.26250  power -120
date 05/02/05   time 17:40:53   duration 345    freq 853.26250  power -120
date 05/02/05   time 17:40:54   duration 372    freq 853.26250  power -120
date 05/02/05   time 17:40:55   duration 399    freq 853.26250  power -120
date 05/02/05   time 17:40:56   duration 427    freq 853.26250  power -120
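A hedged sketch of that extension, reusing the exported patterns above and covering just the first two pairs (the output file names freq1 and freq2 are placeholders): hold the original line, substitute-and-write the first pair, then get the line back before matching the second pair.
Code:
sed -n -e 'h' \
       -e 's#^\('$dayp'\),\('$tp'\),\('$durp'\),\('$fp'\),\('$pp'\),.*$#\1,\2,\4,\5#w freq1' \
       -e 'g' \
       -e 's#^\('$dayp'\),\('$tp'\),\('$durp'\),\('$fp'\),\('$pp'\),\('$fp'\),\('$pp'\),.*$#\1,\2,\6,\7#w freq2' sensordata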
I imagine that if there are many frequency/power pairs in the data set, an awk script may run faster, because the sed version needs a hold and a get command for each additional output file (freq/power pair).
Just in case it isn't clear:
dayp = day pattern
tp = time pattern
durp = duration pattern
fp = frequency pattern
pp = power pattern
 
Old 05-26-2005, 07:24 AM   #17
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
Tink,

That worked wonderfully. Actually, both jschiwal's and yours worked, but the awk command ran faster on the larger chunks of data. Since speed is an issue when I get the final data, I went with awk ("and it's much shorter, cleaner code"...). My second-to-last step is to sort the data from each of the files that were generated and copy out only the lines where the power of the frequencies is greater than -100. As in:


05/02/05,17:40:53,853.26250,-120
05/02/05,17:40:53,853.26250,-70
05/02/05,17:40:53,853.26250,-70
05/02/05,17:40:53,853.26250,-120
05/02/05,17:40:53,853.26250,-120
...

So again we're looking at

date, time, frequency, power

If I copy out only the powers greater than -100, the new output file will contain

05/02/05,17:40:53,853.26250,-70
05/02/05,17:40:53,853.26250,-70
....

In case you're wondering, the power is measured in dBm; not that that is important, but that's why it's measured in negatives.
 
Old 05-26-2005, 01:31 PM   #18
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
The chunks of data you mention there don't really match the
stuff the first awk-run would have created, nor the original
data ... is this a whole new approach to extract data from the
original data-set, like the first approach of mine where I
misunderstood your intentions?



Cheers,
Tink
 
Old 05-26-2005, 03:17 PM   #19
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
OK, so let's start from the beginning.

So far my original data looked like this:

Code:
05/02/05,17:40:46,156,853.26250,-120,853.83750,-80,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:46,157,853.26250,-120,853.83750,-80,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:46,158,853.26250,-120,853.83750,-80,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:53,345,853.26250,-70,853.83750,-83,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:53,346,853.26250,-70,853.83750,-83,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:53,347,853.26250,-70,853.83750,-83,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
Where it's in the format of:

Code:
date,time,repetition,frequency1,power1,frequency2,power2,frequency3,power3,frequency4,power4,frequency5,power5,frequency6,power6,freq7,pwr7,freq8,pwr8,freq9,pwr9,freq10,pwr10
You can see from the data that it scans multiple times per second... so I used the uniq command like this:

Code:
uniq --check-chars=17 file_name_input.txt > phase1_output_file_name.txt
This gives me the following output in phase1_output_file_name.txt

Code:
05/02/05,17:40:46,156,853.26250,-120,853.83750,-80,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
05/02/05,17:40:53,347,853.26250,-70,853.83750,-83,854.51250,-120,855.21250,-120,855.71250,-120,868.60000,-121,868.91250,-121,869.00000,-121,867.00000,-121,868.00000,-121
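In case the option choice isn't obvious: --check-chars=17 works here because the date and time fields together are exactly 17 characters, so uniq collapses all the scans from the same second:
Code:
# 8 (date) + 1 (comma) + 8 (time) = 17 characters
echo -n "05/02/05,17:40:46" | wc -c    # prints 17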
Next I used your script of:

Code:
awk -F, '{for(i=4; i < NF; i+=2){printf "%9s %8s %2d %-8f %-8f\n", $1, $2, $3, $i, $(i+1) >> (i/2-1)}}' phase1_output_file_name.txt
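A quick gloss on the redirection target (my reading of it): >> (i/2-1) computes the output file name from the field index, so each frequency/power pair lands in its own numbered file:
Code:
# field pair -> output file (the file name is just the number)
# i=4  -> 4/2-1  = 1   (frequency1,power1)
# i=6  -> 6/2-1  = 2   (frequency2,power2)
# ...
# i=22 -> 22/2-1 = 10  (frequency10,power10)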
This gives me 10 output files, 1 for each of the frequencies. They each look like:

Code:
05/02/05  17:40:46  156  853.26250000  -120.00000
05/02/05  17:40:53  347  853.26250000  -70.00000
Which is exactly what I want to see for that phase of my processing. Now, of course, each of the 10 files has upwards of 60,000 seconds' worth of records, and I really only need the data that is greater than -100 in the power field. So I need something that will process the above text and leave me only the lines where the power field is greater than -100. So the output would simply be:

Code:
05/02/05  17:40:53  347  853.26250000  -70.00000
Where the line containing -120.00000 as a power was dropped because it is less than -100.

Hopefully that will clear it up.

Last edited by oracle11112; 05-26-2005 at 03:19 PM.
 
Old 05-26-2005, 03:23 PM   #20
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Code:
awk -F, '{for(i=4; i < NF; i+=2){if( $(i+1) > -100 ){ printf "%9s %8s %2d %8f %8f\n", $1, $2,$3,$i,$(i+1) >> (i/2-1) } }} ' file
should give you only the entries greater than -100 in the various files.
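Worth noting: the filter relies on awk's numeric comparison. Input fields that look like numbers compare numerically, so $(i+1) > -100 does the right thing; a plain string comparison would get negative values wrong:
Code:
awk 'BEGIN{ print (-120  > -100) }'    # 0 -- numeric: -120 is excluded, as intended
awk 'BEGIN{ print ("-120" > "-100") }' # 1 -- string: "-120" would wrongly pass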


Cheers,
Tink


P.S.: I find awk awksome ;}
 
Old 05-26-2005, 09:05 PM   #21
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
You're awesome, I'm learning so much, and awk is the best.

Ok so everything went great, and from:

Code:
awk -F, '{for(i=4; i < NF; i+=2){if( $(i+1) > -100 ){ printf "%9s %8s %2d %8f %8f\n", $1, $2,$3,$i,$(i+1) >> (i/2-1) } }} ' file
I got the following output:

Code:
05/02/05  17:40:53  347  853.26250000  -70.00000
05/02/05  17:40:54  348  853.26250000  -70.00000
05/02/05  17:40:55  349  853.26250000  -70.00000
05/02/05  17:41:01  355  853.26250000  -76.00000
05/02/05  17:41:02  356  853.26250000  -76.00000
05/02/05  17:41:03  359  853.26250000  -76.00000
05/02/05  17:41:04  360  853.26250000  -76.00000
05/02/05  17:41:05  361  853.26250000  -76.00000
05/02/05  17:41:06  362  853.26250000  -76.00000
05/02/05  17:41:07  363  853.26250000  -76.00000
Which is perfect. It tells me that someone pushed a button on a "push-to-talk" radio and sent a message at a frequency of 853.26250000. For the time frame 17:40:53-55 the transmission had a power at the receiver of -70 dBm, and for the time frame 17:41:01-07 it had a power at the receiver of -76 dBm.

So far we've generated 20 files named 1-20, and each contains data like the above, each holding its own frequency log.

So for the final task, I need a way to calculate the span of each talk period. But as you can see from the output of the above sample, there is a gap in time when the system was off, between 17:40:55 and 17:41:01. In MATLAB I was able to generate the following by subtracting each time from the time before it to come up with a 1 or a 0: a 1 for periods where time - previous time = 1, and a 0 where time - previous time > 1. Then I added up all the 1's until I hit a 0, and started over on the next line.

My output looked like this:

Code:
3
7
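(Spelling the sample out: 17:40:53-55 gives second-to-second differences of 1,1, a run of 3 seconds; 17:41:01-07 gives differences of 1,1,1,1,1,1, a run of 7.)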
So what fancy awk or sed command do you have for me now that can do this? Tink, if you can do this, I'm donating at least $50.
 
Old 05-26-2005, 11:00 PM   #22
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Is the discriminating feature the time, or could one
safely assume that the difference in the power is an
indicator for the change as well? Just looking for an
optimum approach to the problem ;)


Cheers,
Tink
 
Old 05-27-2005, 05:39 AM   #23
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Ooooh kay :)

On the last bit of data the following awk script (a bit
more complex than the plain splitting) gives the desired
result...

Code:
#!/usr/bin/awk -f
# Convert an "HH:MM:SS" string to seconds since midnight.
function tim_secs(string){
  split( string, secs, ":")
  return ( secs[1]*3600 + secs[2]*60 + secs[3] )
}
BEGIN{
  first=0   # have we seen a line yet?
  new=1     # index of the current run of consecutive seconds
}
{
  one=tim_secs( $2 )   # time of the current line, in seconds
  if(first!=0){
    if((one - two)==1){
      a[new]+=1        # still consecutive: extend the current run
    }else{
      new+=1           # gap: start a new run
    }
  }
  first=1
  two=one              # remember this time for the next line
}
END{
  # each run was counted from its second line onwards, hence the +1
  for (i in a) print a[i]+1
}
Just save it to some file, chmod u+x it, and run it like so:

./awkfile data.txt
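For instance, with the ten sample lines from post #21 saved as data.txt, it should print
Code:
3
7
matching the MATLAB result.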

For a set of e.g. 20 files, an invocation like
Code:
shopt -s extglob; for i in +([0-9]); do echo $i:; ./awkfile $i; echo; done; shopt -u extglob
should output the seconds for each file...
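Against the sample data, the block for file 1 (frequency1 = 853.26250) would look like
Code:
1:
3
7
with one such block per numbered file.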

Cheers,
Tink
 
Old 05-27-2005, 12:27 PM   #24
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
As usual, Tinkster, you are brilliant. That worked better than I could have possibly imagined. It generates output that I redirect (>) to separate files for each of the 20 frequencies. Each file contains a list of the call durations. I know I said that that was the final task, but I just found out that the lead engineer needs another file with the total seconds from each file.

I would assume that I could just add up all the lines in the file, but I don't know how. The files contain only numbers, like

10
16
1
5
18
44

and so on...

Once I have that information, I'll know the total usage time for the system. I'm sure I'll have to calculate something else as well, but for now that's what I've been told.
 
Old 05-27-2005, 04:06 PM   #25
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Thinking too complicated here :}

Since every line in the per-frequency files represents one second, you just need to add a wc -l
to the loop ;}

Code:
rm -f totals; shopt -s extglob; for i in +([0-9]); do wc -l $i >> totals; echo $i:; ./awkfile $i; echo; done; shopt -u extglob
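And if you'd rather total up the duration files you already generated, summing a column of numbers is an awk one-liner, too (durations.txt is a stand-in for one of your files):
Code:
# sum the first field of every line and print the total
awk '{ sum += $1 } END{ print sum }' durations.txt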

Cheers,
Tink
 
Old 05-28-2005, 11:20 AM   #26
oracle11112
LQ Newbie
 
Registered: May 2005
Posts: 12

Original Poster
Rep: Reputation: 0
Yes, I was thinking way too complicated. I got it to work with an expr loop, but your way works way better and is way shorter. Thanks.
 
Old 05-28-2005, 07:00 PM   #27
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
One is glad to be of service :D


Cheers,
Tink
 
  

