Huge Data Set Analysis, Shell Script to copy specific HEX Pairs into a separate file
I've got a huge data set that I'm working with. The data is dumped from another script into a file in real time and consists of blocks of 10 rows, each row holding 16 two-character hex pairs; then the sequence repeats with different values. This happens over and over again, adding up to multiple GB a day.
Here's an example of what I'm looking at.
Code:
5f 30 59 ed ea e1 8c 61 57 d7 36 5b e6 40 90 8c
a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8
de 05 df b2 b3 71 00 00 00 00 55 aa d1 9c 3c 04
5f 30 59 ed ea e1 8c 61 57 d7 36 5b e6 40 90 8c
a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8
de 05 df b2 b3 71 aa d1 9c 3c 04 00 00 00 00 55
5f 30 59 ed ea e1 8c 61 57 d7 36 5b e6 40 90 8c
a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8
de 05 df b2 b3 71 aa d1 9c 3c 04 5f 30 59 ed ea
00 00 00 00 55 e1 8c 61 57 d7 36 5b e6 40 90 8c

a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8
de 05 df b2 b3 71 aa d1 9c 3c 04 5f 30 59 ed ea
e1 8c 61 57 d7 00 00 00 00 55 36 5b e6 40 90 8c
a1 16 a2 78 7b fe 00 be 25 65 15 a3 7d ae 9a 00
00 00 00 00 00 00 48 9d c6 c8 de 05 df b2 b3 71
aa d1 9c 3c 04 5f 30 59 ed ea e1 8c 61 57 d7 36
5b e6 40 90 00 00 00 00 55 8c a1 16 a2 78 7b fe
48 be 25 65 15 a3 7d ae c6 c8 de 05 df b2 b3 71
aa d1 9c 3c 04 5f 30 59 ed ea e1 8c 61 57 d7 36
5b e6 40 90 8c a1 16 a2 78 00 00 00 00 55 7b fe
What I'm looking for in this case is to copy the hex value at column 7, row 5 of each repetition into a new file.
So, if I break it down, I know the following:
- Each grouping is 10 rows long
- Each grouping is 16 space-separated hex pairs wide
- There is one blank line between every 10 rows
- The specific hex pair I need to output is at row 5, column 7
- I need to write that hex pair out to a new file (redirecting with >)
- The output file needs to be historic, so I don't want to overwrite it; I need to append instead (>>)
From that, I can infer:
- I need to use something like the cut command to get the right column
- I need another command to grab the correct line the first time
- I need some sort of loop that increments the line grab by 11 each time it loops (10 rows plus the blank line)
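The plan above (take line 5, then every 11th line after it, and cut out the 7th pair) can be sketched without an explicit loop, because awk's built-in line counter NR already does the incrementing. This is a minimal sketch assuming every block is exactly 10 data rows plus one blank line and never drifts out of step; extract_fixed, dump.txt and extracted.txt are hypothetical names, not from the thread.

```shell
#!/bin/sh
# Sketch of the "line 5, then every 11 lines, column 7" plan.
# Assumes each block is exactly 10 data rows + 1 blank line and
# that blocks never drift out of step.

# extract_fixed (hypothetical name): read the dump on stdin and
# print the 7th whitespace-separated field of lines 5, 16, 27, ...
extract_fixed() {
    awk 'NR % 11 == 5 { print $7 }'
}

# Usage, appending (>>) so earlier results are kept:
#   extract_fixed < dump.txt >> extracted.txt
#
# With GNU sed (the first~step address is a GNU extension), the
# sed-plus-cut version of the same idea would be:
#   sed -n '5~11p' dump.txt | cut -d ' ' -f 7 >> extracted.txt
```

The modulo test is what replaces the "increment by 11" loop: line numbers 5, 16, 27 and so on all leave a remainder of 5 when divided by 11.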
Original Poster
Quote:
Originally Posted by Mr. C.
Code:
$ cat hex.pl
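#!/usr/bin/perl
use strict;
use warnings;

my $row = 1;    # row number within the current 10-row group
while (<>) {
    chomp;
    if (/^$/) {    # a blank line starts the next group
        $row = 1;
        next;
    }
    # On row 5, skip six "xx " pairs and capture the 7th pair as $1.
    print "$1\n" if $row == 5 and /^(?:[[:xdigit:]]{2} ){6}([[:xdigit:]]{2})/;
    $row++;
}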
$ hex.pl hex
48
48
Pipe your output into hex.pl and redirect as you see fit.
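For comparison, since the original question asked for a shell approach, here is a sketch of hex.pl's counting logic ported to awk: reset a row counter on every blank line and print the 7th field of row 5. Because it resynchronizes on blank lines instead of assuming a fixed 11-line stride, a block that is longer or shorter than 10 rows does not throw later blocks off. Unlike hex.pl, it does not check that the fields are hex digits; extract_row5 and some_dump_script are hypothetical names.

```shell
#!/bin/sh
# Sketch: awk port of hex.pl's row counting.
# extract_row5 (hypothetical name) reads the dump on stdin.
extract_row5() {
    awk '
        /^$/     { row = 0; next }   # blank line: the next block starts
                 { row++ }           # count data rows in this block
        row == 5 { print $7 }        # 7th field of the 5th row
    '
}

# Usage (some_dump_script is a placeholder for the real producer):
#   some_dump_script | extract_row5 >> extracted.txt
```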
If I'm reading this correctly (and I'm not familiar with Perl, so bear with me), the script:
- sets a starting point of row 1
- counts down to row 5
- counts over to the correct grouping
- cuts that grouping
- then starts the entire process over again at the next blank line
Does that sound about right?
If you look at the data, there seem to be repeating patterns that are not in sync with the 10-line formatting. Are you sure which data you need to extract?
Yes. It counts to row 5, then matches six space-separated two-digit hex pairs followed by a seventh, which the pattern captures as $1. After that it does nothing until a blank line resets the counter to row 1.
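As an illustration of what that pattern matches, the same regular expression can be run through sed in extended (-E) mode on row 5 of the first block of the sample. This only demonstrates the regex; it is not part of hex.pl.

```shell
#!/bin/sh
# Same pattern as in hex.pl, written as a POSIX ERE: skip six
# "xx " pairs, capture the 7th pair as \2, drop the rest of the line.
row5='a1 16 a2 78 7b fe 48 be 25 65 15 a3 7d ae c6 c8'
pair=$(printf '%s\n' "$row5" |
    sed -E 's/^([[:xdigit:]]{2} ){6}([[:xdigit:]]{2}).*/\2/')
echo "$pair"    # prints 48
```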
Mr. C., thank you again for your help. However, in this case I think I'm going to go with NX5000's script; its counting method is elegant and much easier for me to wrap my head around.
Both methods do work, though, so thank you both for the quick solutions.
It is indeed simple, and there are many ways to perform a task.
I chose what I thought might be more self-explanatory to you, one in which the row and column are easy to change. I personally also think some input validation is worthwhile, as is being able to ignore any garbage that follows the 16th column. But to each his own.
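A minimal sketch of the stricter variant described above, written in awk rather than Perl: on row 5 it accepts the line only if its first 16 fields are two-digit hex pairs, ignores anything after the 16th field, and reports malformed rows on stderr with their line number. To keep it short, only row 5 is checked; extract_checked is a hypothetical name, and writing to /dev/stderr assumes a Linux-style /dev (gawk also recognizes that name itself).

```shell
#!/bin/sh
# Sketch: row counting as in hex.pl, plus validation of row 5.
# Fields after the 16th are ignored; malformed rows go to stderr.
# extract_checked is a hypothetical name.
extract_checked() {
    awk '
        /^$/ { row = 0; next }       # blank line: the next block starts
        { row++ }
        row == 5 {
            ok = (NF >= 16)          # need at least 16 fields
            for (i = 1; ok && i <= 16; i++)
                if ($i !~ /^[0-9A-Fa-f][0-9A-Fa-f]$/) ok = 0
            if (ok) print $7
            else printf "line %d: malformed row 5: %s\n", NR, $0 > "/dev/stderr"
        }
    '
}

# Usage: extract_checked < dump.txt >> extracted.txt 2>> errors.log
```

The hex check uses explicit character ranges instead of [[:xdigit:]]{2} because some older awk implementations (for example, older mawk releases) lack bracket classes and interval expressions.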
Bash scripting is not the "power tool" that you need to do this job. You're wasting your time using a tool that is too weak for what you intend.
Perl is a "power tool" that is very well suited to this purpose. (It's not the only such tool in the Linux world, to be sure, but it happens to be a damn good one.)
Ergo, faced with this problem, invest a short interlude in becoming cursorily familiar with Perl ... and do it just as quickly as you can.
One of the hallmarks of Linux/Unix environments ... "what all the fuss is really about," if you will ... is that there is a cornucopia of power-tools available here, and you can "write a script" in any one (or many!) of them.
In such an environment, therefore, it is very easy to make the rueful discovery that ... you're making things much harder on yourself than you actually need to, a-n-d, you didn't even know it.
Hey... "no harm, no foul!" If, say, "you cut your teeth on Microsoft Windows," where tools other than Visual Basic (heh...) are essentially non-existent, then you might be entirely unaccustomed to finding yourself in the "embarrassment of riches" that is Unix/Linux. No problem... I am not making fun at your expense. Welcome aboard! Now you know "what all the fuss is about!"