Huge Data Set Analysis, Shell Script to copy specific HEX Pairs into a separate file

telecom_is_me · 06-29-2008, 12:44 AM

I've got a huge data set that I'm working with. The data is dumped from another script into a file in real time and consists of 10 rows, each holding 16 groups of 2 hex characters; then the sequence repeats with different values. This happens over and over again, to the tune of multiple GB a day.

Here's an example of what I'm looking at.

Code:

5f 30 59 ed ea e1 8c 61 57 d7 36 5b e6 40 90 8c
a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8
de 05 df b2 b3 71 00 00 00 00 55 aa d1 9c 3c 04
5f 30 59 ed ea e1 8c 61 57 d7 36 5b e6 40 90 8c
a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8
de 05 df b2 b3 71 aa d1 9c 3c 04 00 00 00 00 55
5f 30 59 ed ea e1 8c 61 57 d7 36 5b e6 40 90 8c
a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8
de 05 df b2 b3 71 aa d1 9c 3c 04 5f 30 59 ed ea
00 00 00 00 55 e1 8c 61 57 d7 36 5b e6 40 90 8c

a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8
de 05 df b2 b3 71 aa d1 9c 3c 04 5f 30 59 ed ea
e1 8c 61 57 d7 00 00 00 00 55 36 5b e6 40 90 8c
a1 16 a2 78 7b fe 00 be 25 65 15 a3 7d ae 9a 00
00 00 00 00 00 00 48 9d c6 c8 de 05 df b2 b3 71
aa d1 9c 3c 04 5f 30 59 ed ea e1 8c 61 57 d7 36
5b e6 40 90 00 00 00 00 55 8c a1 16 a2 78 7b fe
48 be 25 65 15 a3 7d ae c6 c8 de 05 df b2 b3 71
aa d1 9c 3c 04 5f 30 59 ed ea e1 8c 61 57 d7 36
5b e6 40 90 8c a1 16 a2 78 00 00 00 00 55 7b fe

What I'm looking for in this case is to copy the hex value that occurs at the 7th column of the 5th row of each repetition into a new file.

So if I break it down I know the following:

- Each grouping is 10 rows long
- Each grouping is 16 columns of HEX pairs wide + spaces
- There is one blank line between every 10 rows
- The specific HEX Pair I need to output is row 5 x column 7
- I need to copy the HEX pair out to a new file, so redirect the output to a file (>)
- I need the output file to be historic, so I want to append to it rather than overwrite it (>>)
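
The difference between overwriting (>) and appending (>>) in the last two points can be sketched with Perl's three-argument open, where mode '>>' appends. This is only an illustration: the helper name append_pair and the throwaway temporary file are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: append one hex pair per line to $path.
# Mode '>>' appends (creating the file if needed); mode '>' would truncate first.
sub append_pair {
    my ($path, $pair) = @_;
    open(my $out, '>>', $path) or die "Cannot open $path: $!";
    print {$out} "$pair\n";
    close($out) or die "Cannot close $path: $!";
}

# Demonstrate on a temporary file (an assumption made for this example).
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close $tmp_fh;

append_pair($tmp_path, '48');
append_pair($tmp_path, '9d');

open(my $in, '<', $tmp_path) or die "Cannot open $tmp_path: $!";
print while <$in>;    # both pairs are still there: nothing was overwritten
close $in;
```

From the shell, the same effect comes from redirecting a script's output with >> instead of >.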

So what I can infer:

- I need to use something like the cut command to get the right column
- I need another command to grab the correct line the first time
- I need some sort of a loop that can increment the line grab by 11 each time it loops.
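
The "increment by 11" idea above can be sketched without cut, using plain line arithmetic. This is a rough sketch in Perl, assuming the layout really is 10 rows plus exactly one blank line; the name pick_pairs and the synthetic sample data are invented for the example.

```perl
use strict;
use warnings;

# Assumed layout, taken from the list above: 10 data rows plus 1 blank line,
# so the wanted row recurs every 11 lines.
my $PERIOD = 11;
my $ROW    = 5;    # 1-based row within a repetition
my $COL    = 7;    # 1-based column (hex pair) within a row

# Return the pair at ($ROW, $COL) from every repetition in a list of lines.
sub pick_pairs {
    my @lines = @_;
    my @found;
    for my $i (0 .. $#lines) {
        # Zero-based indexes 4, 15, 26, ... are row 5 of successive repetitions.
        next unless $i % $PERIOD == $ROW - 1;
        my @pairs = split ' ', $lines[$i];
        push @found, $pairs[$COL - 1] if @pairs >= $COL;
    }
    return @found;
}

# Synthetic sample (not real data): in repetition $rep, the pair at row $r,
# column $c is the hex digit of $r followed by the hex digit of ($c + $rep) % 16.
my @sample;
for my $rep (1 .. 2) {
    for my $r (1 .. 10) {
        push @sample, join ' ', map { sprintf '%x%x', $r, ($_ + $rep) % 16 } 1 .. 16;
    }
    push @sample, '';    # blank separator line
}

print "$_\n" for pick_pairs(@sample);    # prints 58, then 59
```

One caveat with pure line counting: a single missing or extra line shifts every later result, so resynchronising on the blank line is the safer choice for a file that is written in real time.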

---

Any suggestions?

Mr. C. · 06-29-2008, 01:13 AM

Code:

$ cat hex.pl
#!/usr/bin/perl

$row=1;
while (<>) {
    chomp;
    if (/^$/) {
        $row = 1;
        next;
    }
    print "$1\n" if $row == 5 and /^(?:[[:xdigit:]]{2} ){6}([[:xdigit:]]{2})/;
    $row++;
}

$ hex.pl hex
48
48

Pipe your output to hex.pl, redirect as you see fit.

telecom_is_me · 06-29-2008, 09:07 AM

Quote:

Originally Posted by Mr. C.

Code:

$ cat hex.pl
#!/usr/bin/perl

$row=1;
while (<>) {
    chomp;
    if (/^$/) {
        $row = 1;
        next;
    }
    print "$1\n" if $row == 5 and /^(?:[[:xdigit:]]{2} ){6}([[:xdigit:]]{2})/;
    $row++;
}

$ hex.pl hex
48
48

Pipe your output to hex.pl, redirect as you see fit.

If I'm reading this correctly (and I'm not familiar with Perl, so bear with me), the script does the following:

Sets a starting point of row 1
counts down to row 5
counts over to the correct grouping
cuts that grouping
then starts the entire process over again based on the next blank line

Does that sound about right?

pixellany · 06-29-2008, 09:32 AM

If you look at the data, there seem to be repeating patterns that are not in sync with the 10-line formatting. Are you sure which data you need to extract?

Mr. C. · 06-29-2008, 11:33 AM

Quote:

Originally Posted by telecom_is_me

If I'm reading this correctly (and I'm not familiar with Perl, so bear with me), the script does the following:

Sets a starting point of row 1
counts down to row 5
counts over to the correct grouping
cuts that grouping
then starts the entire process over again based on the next blank line

Does that sound about right?

Yes, it counts to row 5, then matches six space-separated hex pairs followed by a seventh, which the pattern captures as $1. It then does nothing until the next blank line is seen, which resets the row counter.
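
To see the pattern on its own, here is a small sketch that applies the same regex to a single line. The wrapper name seventh_pair is made up for the illustration; the regex is the one from hex.pl, and the sample line is row 5 of the first repetition in the example data.

```perl
use strict;
use warnings;

# Skip six "pair + space" groups at the start of the line, then capture one pair.
sub seventh_pair {
    my ($line) = @_;
    return $line =~ /^(?:[[:xdigit:]]{2} ){6}([[:xdigit:]]{2})/ ? $1 : undef;
}

# Row 5 of the first repetition in the sample data.
my $row5 = 'a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8';
print seventh_pair($row5), "\n";    # prints 48

# A line that does not start with seven well-formed pairs gives no match.
my $bad = seventh_pair('a1 16 zz 78 7b fe 48');
print defined($bad) ? "match\n" : "no match\n";    # prints "no match"
```

Changing the {6} to another count selects a different column, which is the part meant to be easy to adjust.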

nx5000 · 06-29-2008, 12:37 PM

Or by using paragraph read:

Code:

#!/usr/bin/perl
$/ = '';
while (<>) {
	chomp;
	print substr($_ , 210,2)."\n" ;
}

Quote:

$ ./aa.pl aie
48
48
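
For anyone wondering where the 210 comes from, the offset can be derived from the row and column. The sketch below assumes the fixed layout from the first post (16 pairs per row, single spaces, one newline per row); the name pair_offset and the synthetic paragraph are only for illustration.

```perl
use strict;
use warnings;

# Assumed fixed layout: 16 pairs of 2 characters, 15 separating spaces, 1 newline.
my $PAIRS_PER_ROW = 16;
my $ROW_LEN = $PAIRS_PER_ROW * 2 + ($PAIRS_PER_ROW - 1) + 1;    # 32 + 15 + 1 = 48

# Zero-based character offset of the pair at 1-based ($row, $col).
sub pair_offset {
    my ($row, $col) = @_;
    return ($row - 1) * $ROW_LEN + ($col - 1) * 3;    # each column is "xx " = 3 chars
}

print pair_offset(5, 7), "\n";    # prints 210 (4 * 48 + 6 * 3)

# Synthetic paragraph (not real data): the pair at row r, column c is the
# hex digit of r followed by the hex digit of c % 16.
my $para = join "\n", map {
    my $r = $_;
    join ' ', map { sprintf '%x%x', $r, $_ % 16 } 1 .. 16;
} 1 .. 10;

print substr($para, pair_offset(5, 7), 2), "\n";    # prints 57
```

The arithmetic only holds while every row has exactly the same width, so a short or padded row throws off the offset for the rest of that paragraph.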

telecom_is_me · 06-29-2008, 09:05 PM

Quote:

Originally Posted by nx5000

Or by using paragraph read:

Code:

#!/usr/bin/perl
$/ = '';
while (<>) {
	chomp;
	print substr($_ , 210,2)."\n" ;
}

Mr. C., thank you again for your help. However, in this case I think I'm going to go with nx5000's script: it's elegant in its counting method and much easier for me to wrap my head around.

Both methods do work though so thank you both for a quick solution.

Mr. C. · 06-29-2008, 09:28 PM

It is indeed simple, and there are many ways to perform a task.

I chose what I thought might be more self-explanatory to you, where you could change the row and column easily. I personally also think some input validation is worthwhile, as well as being able to ignore garbage that might follow the 16th column. But to each his own.
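
As a rough sketch of what that could look like (not the script posted earlier), the following takes the row and column as parameters, checks that the row starts with 16 well-formed pairs, and ignores anything after them. The helper name extract_pair and the "trailing junk" sample are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the pair at 1-based ($row, $col) from the rows
# of one repetition, or undef if that row is missing or malformed.
sub extract_pair {
    my ($row, $col, @rows) = @_;
    return undef if $col < 1 || $col > 16;
    my $line = $rows[$row - 1];
    return undef unless defined $line;

    # Validate: 16 hex pairs at the start of the line, not directly followed
    # by another non-space character. Anything after that is ignored.
    return undef
        unless $line =~ /^((?:[[:xdigit:]]{2} ){15}[[:xdigit:]]{2})(?!\S)/;

    my @pairs = split ' ', $1;
    return $pairs[$col - 1];
}

my @rows = (
    '5f 30 59 ed ea e1 8c 61 57 d7 36 5b e6 40 90 8c',
    'a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8 trailing junk',
    'de 05 df b2 b3 71 00 00',    # too short, should be rejected
);

print extract_pair(2, 7, @rows), "\n";    # prints 48
my $short = extract_pair(3, 7, @rows);
print defined($short) ? "accepted\n" : "rejected\n";    # prints "rejected"
```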

Cheers.

syg00 · 06-29-2008, 09:55 PM

Well-formed data can pardon a multitude of sins.
What if the "blank line" actually contains whitespace?

Mr. C. · 06-29-2008, 10:00 PM

A trivial fix. Change:

if (/^$/) {

to:

if (/^\s*$/) {
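
A quick way to see the difference between the two tests is sketched below. It also checks the paragraph-read approach, since paragraph mode ($/ = '') splits only on truly empty lines, so a separator holding spaces would not end a record there. The helper names and the tiny sample strings are made up for the illustration.

```perl
use strict;
use warnings;

# The original test: true only for a completely empty line (after chomp).
sub is_blank_strict  { return $_[0] =~ /^$/     ? 1 : 0 }
# The relaxed test: also true for lines holding only spaces or tabs.
sub is_blank_lenient { return $_[0] =~ /^\s*$/ ? 1 : 0 }

# Count the records Perl sees in paragraph mode ($/ = '') for a string.
sub count_paragraphs {
    my ($text) = @_;
    local $/ = '';
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my $n = 0;
    $n++ while <$fh>;
    close $fh;
    return $n;
}

my $spaces = "  \t";    # a "blank" line that actually holds whitespace
printf "strict: %d, lenient: %d\n",
    is_blank_strict($spaces), is_blank_lenient($spaces);
# prints "strict: 0, lenient: 1"

print count_paragraphs("aa bb\n\ncc dd\n"), "\n";      # prints 2: truly empty separator
print count_paragraphs("aa bb\n  \ncc dd\n"), "\n";    # prints 1: separator holds spaces
```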

sundialsvcs · 06-29-2008, 10:08 PM

The bottom line of this thread so far is that:

Bash scripting is not the "power tool" that you need to do this job. You're wasting your time using a tool that is too weak for the purpose you intend.
Perl is a "power tool" that is very well suited to this purpose. (It's not the only such tool in the Linux world, to be sure, but it happens to be a damn good one.)
Ergo, faced with this problem, invest in a short interlude to become cursorily familiar with Perl ... and do it just as quickly as you can.

One of the hallmarks of Linux/Unix environments ... "what all the fuss is really about," if you will ... is that there is a cornucopia of power-tools available here, and you can "write a script" in any one (or many!) of them.

In such an environment, therefore, it is very easy to make the rueful discovery that ... you're making things much harder on yourself than you actually needed to, a-n-d, you didn't even know it.

Hey... "no harm, no foul!" If, say, "you cut your teeth on Microsoft Windows," where tools other than Visual Basic (heh...) are essentially non-existent, then you might be entirely unaccustomed to find yourself in the "embarrassment of riches" that is Unix/Linux. No problem... I am not making fun at your expense. Welcome aboard! Now you know "what all the fuss is about!"

syg00 · 06-29-2008, 10:48 PM

Quote:

Originally Posted by Mr. C.

A trivial fix:

Indeed - I always prefer this test (if applicable). Been caught by "corner cases" too often.