LinuxQuestions.org

amateurscripter 01-19-2017 03:52 PM

sed tool to extract certain tags/fields from FIX messages
 
Hello everyone,

I have another question. Maybe some of you are familiar with the FIX protocol and FIX messages. Here are some sample messages from which I want to extract only tags 32, 55, and 31 together with their values. These tags are not always in the same order or in the same position.

When I ran this command, I got pretty much all of the lines back unchanged:

cat testlog.txt | sed 's/\(32=[^|]*\)|\(55=[^|]*\)|\(31=[^|]*\)|/\1 \2 \3/g'


Here's what the data set looks like:
OUT: 8=FIX.4.2|9=427|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.674|34=87|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=1|39=1|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1133|6=22.44|198=6801199340212833364|17=011620320001227|151=567|30=XASX|75=20170117|63=6|64=20170119|32=778|31=22.44|55=TBJ|11=clordid001007|10=014|
OUT: 8=FIX.4.2|9=427|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.683|34=88|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=1|39=1|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|55=TBJ|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1269|6=22.44|198=6801199340212833364|17=011620320001228|151=431|30=XASX|75=20170117|63=6|64=20170119|32=136|31=22.44|11=clordid001007|10=004|
OUT: 8=FIX.4.2|9=427|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.684|34=89|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=1|39=1|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1491|6=22.44|198=6801199340212833364|17=011620320001229|151=209|30=XASX|75=20170117|63=6|64=20170119|31=22.44|55=TBJ|32=222|11=clordid001007|10=003|
OUT: 8=FIX.4.2|9=425|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.684|34=90|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=2|39=2|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|55=TBJ|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1700|6=22.44|198=6801199340212833364|17=011620320001230|151=0|30=XASX|75=20170117|63=6|64=20170119|32=209|31=22.44|11=clordid001007|10=134|


I'm expecting to see something like this from the first line:

32=778 55=TBJ 31=22.44

You'll notice the tags are in a different order on the other lines.

Is this doable with sed or egrep or awk?

I tried egrep too using the above logic but just for tags 32 and 55:

$ cat testlog.txt |egrep -o '(32=[^|]*| 55=[^|]*)' |paste -d" |" - /dev/null -
32=778 |32=136
32=222 |32=209
$

As you can see, I got two lines where there should have been four, and tag 55 is missing.

Thanks

grail 01-19-2017 08:20 PM

Code:

awk  '/(31|32|55)=/{x++;ORS = x%3?" | ":"\n";print}' RS="|" testlog.txt

amateurscripter 01-20-2017 11:16 AM

Thx Grail, appreciate your suggestion. Is this not possible with "sed"?

Honestly, I don't know what that awk statement says. Can you break it down pls?

grail 01-20-2017 12:20 PM

Possible with sed ... anything is possible :) But using the right tool for the job, or the one you are best at, is often better than trying hard to make a different tool do it.

Here is the awk manual page. Bookmark it as it is the source of all awk truth :)

Breakdown:

Code:

RS="|" :- Set the Record Separator (RS) to be a pipe

/(31|32|55)=/{} :- Only act on records containing one of these numbers followed by an equals sign.  If one matches, execute all instructions inside the braces

x++ :- Increase counter variable 'x' by 1 (an unset variable starts at zero)

ORS = x%3?" | ":"\n" :- Set the Output Record Separator (ORS) based on the ternary operation.  The ternary operator (?:) is shorthand for an if/then/else construct.  Everything before the ? is the expression to be evaluated.
                        In this case, x modulo 3.  The expression yields either zero (false) or 1 or 2 (true).  When true, set ORS to the string " | "; when false, set it to a newline (\n)
                        So essentially it says: when printing, end every third matching record with "\n" and all the others with " | "

print :- Print the current record, followed by the ORS chosen above.
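The x%3 counter assumes every message contains exactly three matching records, and it prints them in the order they appear. If you want the fixed order from the first post (32 55 31) with one line per message, a minimal sketch is to split each line on the pipe, store the tag/value pairs in an array, and then print the wanted tags in a fixed order. The function name extract_tags is hypothetical, and the sketch assumes '|'-separated messages like the sample data.

```shell
# Hypothetical helper: for each '|'-separated FIX message on stdin, print
# tags 32, 55 and 31 in that fixed order, wherever they sit in the message.
extract_tags() {
    awk -F'|' '{
        split("", val)                  # empty the array (portable idiom)
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")         # tag is everything before the first "="
            if (eq)
                val[substr($i, 1, eq - 1)] = substr($i, eq + 1)
        }
        n = split("32 55 31", want, " ")
        out = ""; sep = ""
        for (j = 1; j <= n; j++) {
            if (want[j] in val) {       # skip tags this message does not have
                out = out sep want[j] "=" val[want[j]]
                sep = " "
            }
        }
        print out
    }'
}
```

Run it as `extract_tags < testlog.txt`; for the first sample message it should print 32=778 55=TBJ 31=22.44. Because there is no running counter, a message that lacks one of the tags just yields a shorter line instead of shifting every later line. And because the tag is split off at the first "=", a field such as 131= cannot be mistaken for 31=.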


amateurscripter 01-20-2017 12:25 PM

Thx very much Grail. I already bookmarked it.

grail 01-20-2017 02:00 PM

No probs ;) If you have a solution that works and no more questions, please remember to mark the thread as SOLVED :D

amateurscripter 01-23-2017 10:31 AM

Thx again Grail. However, I'm still wondering whether the same task can be accomplished using "sed". Can you or anybody else do what Grail did above with the sed tool? If not, I'm going to mark it solved.

Thanks all.

pan64 01-23-2017 10:36 AM

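In your egrep (first post) there is a space before 55, so that alternative only matches " 55=", which never occurs in your data (55 always follows a pipe).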

grail 01-23-2017 11:07 AM

I could be wrong, but I do not think it can be done with sed (easily), as you do not have a constant order. If you could get it to work, you would need to list all possible orderings, which is currently 6 for 3 numbers but jumps to 24 for 4 numbers, so the solution becomes more unwieldy the more tags you add. The awk version only requires you to add the new number to the pattern and increase the count (the 3 in x%3) accordingly.
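One way to sidestep the permutation problem, offered only as a sketch that assumes GNU sed (for -E, and for \n being recognised both in the replacement and inside bracket expressions), is not to match the tags in any particular order at all: mark each wanted field, then delete every field that is not marked. The function name extract_tags_sed is hypothetical.

```shell
# Hypothetical helper; assumes GNU sed (-E, and \n recognised in the
# replacement and inside bracket expressions).
extract_tags_sed() {
    sed -E '
        # Put a pipe in front of the first field so every field starts with "|".
        s/^/|/
        # Mark the wanted fields by swapping their leading "|" for a newline.
        s/\|((31|32|55)=)/\n\1/g
        # Drop every field that still starts with "|" (the unwanted ones).
        s/\|[^|\n]*//g
        # Remove the first marker and turn the rest into spaces.
        s/^\n//
        s/\n/ /g
    '
}
```

`extract_tags_sed < testlog.txt` should print the wanted fields in the order they appear in each message, e.g. 32=778 31=22.44 55=TBJ for the first line. Adding a fourth tag means adding one more alternative to the marking expression rather than listing 24 orderings; putting the output into a fixed order is still easier in awk.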

amateurscripter 01-23-2017 11:57 AM

Noted, I'll need to get more familiar with awk. It probably depends on the situation, but so far awk looks more useful to me.

pan64, thank you as well. I didn't realize that made a difference. I also just found out that if I add another item to the search list, I need to add another "/dev/null -" at the end; otherwise the output puts only two items on each line, and items from one message roll over onto the next line, like this:

$ cat testlog.txt |egrep -o '(32=[^|]*|55=[^|]*|31=[^|]*)' |paste -d" |" - /dev/null -
32=778 |31=22.44
55=TBJ |55=TBJ
32=136 |31=22.44
31=22.44 |55=TBJ
32=222 |55=TBJ
32=209 |31=22.44
$

It should really look like this:

$ cat testlog.txt |egrep -o '(32=[^|]*|55=[^|]*|31=[^|]*)' |paste -d" |" - /dev/null - /dev/null -
32=778 |31=22.44 |55=TBJ
55=TBJ |32=136 |31=22.44
31=22.44 |55=TBJ |32=222
55=TBJ |32=209 |31=22.44
$
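paste takes its delimiters one character at a time, so the /dev/null operands above are what produce the two-character " |" separator: a space before each empty /dev/null column and a pipe after it. If a single space between values is enough, a minimal sketch that groups grep's output three lines at a time without /dev/null looks like this (group_tags is a hypothetical name):

```shell
# Hypothetical helper: read FIX messages on stdin, keep only tags 32, 55, 31,
# and join grep's one-match-per-line output three at a time.
group_tags() {
    # Each "-" operand makes paste read one more line from stdin,
    # and -d' ' joins those lines with a single space.
    grep -o -E '(32|55|31)=[^|]*' | paste -d' ' - - -
}
```

`group_tags < testlog.txt` should print 32=778 31=22.44 55=TBJ on the first line. Like every approach that regroups grep -o output by line count, it assumes each message contains exactly three matches; if one is missing, every later line is misaligned.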

grail 01-23-2017 09:32 PM

Please use [code][/code] tags when displaying code / data :)

allend 01-24-2017 06:38 AM

Quote:

$ cat testlog.txt |egrep -o '(32=[^|]*|55=[^|]*|31=[^|]*)' |paste -d" |" - /dev/null - /dev/null -
I do not like the use of 'paste' here (or the UUOC, the useless use of cat).
Code:

egrep -o '(31=[^|]*|32=[^|]*|55=[^|]*)' testlog.txt | sed 'N;N;s/\n/|/g'
For readability, I would also prefer
Code:

grep -o -e '31=[^|]*' -e '32=[^|]*' -e '55=[^|]*' testlog.txt | sed 'N;N;s/\n/|/g'
Of course, the 'awk' solution is still the most elegant.
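allend's sed 'N;N;s/\n/|/g' works because N appends the next input line to the pattern space, so after two N commands it holds three matches, and the substitution turns the two embedded newlines into pipes. One caveat: when the total number of lines is not a multiple of three, the last N has nothing to read, and the GNU sed manual notes that most non-GNU seds then exit without printing the leftover lines. A minimal sketch that guards each N with $! (join_three is a hypothetical name):

```shell
# Hypothetical helper: join every three input lines with "|", like
# sed 'N;N;s/\n/|/g', but skip N on the last line so a final group of
# one or two lines is still printed by any POSIX sed.
join_three() {
    sed '$!N;$!N;s/\n/|/g'
}
```

For example, `grep -o -e '31=[^|]*' -e '32=[^|]*' -e '55=[^|]*' testlog.txt | join_three` gives the same result as allend's version on the sample data, since it contains exactly twelve matches.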


All times are GMT -5.