LinuxQuestions.org
Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

01-19-2017, 03:52 PM #1
amateurscripter
Member
Registered: Nov 2011
Posts: 30
sed tool to extract certain tags/fields from FIX messages


Hello everyone,

I have another question. Some of you may be familiar with the FIX protocol and FIX messages. Below are some sample messages from which I want to extract only certain tags and their values, namely tags 32, 55 and 31. These tags are not always in the same order or at the same position.
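To see what the tools have to work with, it helps to split a message into one tag=value pair per line. A minimal sketch, assuming the log uses `|` as the field separator (raw FIX uses the SOH control character, `\001`, which logs often display as `|`); the function name `fix_pairs` and the shortened sample line are illustrative, not from the original post:

```shell
#!/bin/sh
# Hypothetical helper: print each tag=value pair of every FIX line on its own line.
# Assumes '|' is the delimiter; for raw FIX with SOH, use tr '\001' '\n' instead.
fix_pairs() {
    # grep '=' drops the empty piece left over by the trailing '|'
    tr '|' '\n' | grep '='
}

# Shortened sample line (not the full message from the post)
sample='OUT: 8=FIX.4.2|35=8|32=778|31=22.44|55=TBJ|10=014|'
printf '%s\n' "$sample" | fix_pairs
```

Once every tag sits on its own line, any line-oriented tool can pick out `32=`, `55=` and `31=` regardless of their position in the message.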

When I ran this command, I got back essentially all of the lines unchanged:

cat testlog.txt | sed 's/\(32=[^|]*\)|\(55=[^|]*\)|\(31=[^|]*\)|/\1 \2 \3/g'


Here's what the data set looks like:
OUT: 8=FIX.4.2|9=427|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.674|34=87|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=1|39=1|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1133|6=22.44|198=6801199340212833364|17=011620320001227|151=567|30=XASX|75=20170117|63=6|64=20170119|32=778|31=22.44|55=TBJ|11=clordid001007|10=014|
OUT: 8=FIX.4.2|9=427|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.683|34=88|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=1|39=1|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|55=TBJ|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1269|6=22.44|198=6801199340212833364|17=011620320001228|151=431|30=XASX|75=20170117|63=6|64=20170119|32=136|31=22.44|11=clordid001007|10=004|
OUT: 8=FIX.4.2|9=427|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.684|34=89|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=1|39=1|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1491|6=22.44|198=6801199340212833364|17=011620320001229|151=209|30=XASX|75=20170117|63=6|64=20170119|31=22.44|55=TBJ|32=222|11=clordid001007|10=003|
OUT: 8=FIX.4.2|9=425|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.684|34=90|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=2|39=2|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|55=TBJ|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1700|6=22.44|198=6801199340212833364|17=011620320001230|151=0|30=XASX|75=20170117|63=6|64=20170119|32=209|31=22.44|11=clordid001007|10=134|


I'm expecting to see something like this from the first line:

32=778 55=TBJ 31=22.44

You'll notice that the tags appear in a different order on the other lines.

Is this doable with sed or egrep or awk?

I also tried egrep with the same logic, but just for tags 32 and 55:

$ cat testlog.txt |egrep -o '(32=[^|]*| 55=[^|]*)' |paste -d" |" - /dev/null -
32=778 |32=136
32=222 |32=209
$

As you can see, I got two lines where there should have been four, and tag 55 is missing.

Thanks
 
01-19-2017, 08:20 PM #2
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,719
Code:
awk  '/(31|32|55)=/{x++;ORS = x%3?" | ":"\n";print}' RS="|" testlog.txt
 
01-20-2017, 11:16 AM #3
amateurscripter
Thanks grail, I appreciate your suggestion. Is this not possible with sed?

Honestly, I don't understand what that awk statement does. Can you break it down, please?
 
01-20-2017, 12:20 PM #4
grail
Possible with sed... anything is possible. But the right tool for the job, or the one you are best at, is often better than trying hard to use a different tool.

Here is the awk manual page. Bookmark it, as it is the source of all awk truth.

Breakdown:

Code:
RS="|" :- Set the Record Separator (RS) to be a pipe

/(31|32|55)=/{} :- Only act on records containing one of these numbers followed by an equals sign.  If a record matches, execute all the instructions inside the braces

x++ :- Increase counter variable 'x' by 1 (defaults to zero)

ORS = x%3?" | ":"\n" :- Set the Output Record Separator (ORS) based on the ternary operation.  Ternary (?:) is shorthand for an if/then/else construct.  Everything before the ? is the expression to be evaluated.
                        In this case, that is x modulo 3.  The expression evaluates to either zero (false) or 1 or 2 (true).  When true, ORS is set to the string " | ", and when false it is set to a newline (\n).
                        So essentially it says: after every third matching record print "\n", and after all the others print " | ".

print :- Print the current record, followed by the ORS set in the previous step as the line ending.
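Putting the breakdown back together: below is the same one-liner spread over several lines with comments, wrapped in a shell function (the name `grail_extract` is only for illustration) and fed two shortened sample lines whose tags appear in different orders.

```shell
#!/bin/sh
# grail's one-liner from post #2, spread out with comments.
# The function name grail_extract is illustrative only.
grail_extract() {
    awk '
    # Only act on records (pipe-separated fields) containing 31=, 32= or 55=
    /(31|32|55)=/ {
        x++                          # count matching records seen so far
        ORS = x % 3 ? " | " : "\n"   # newline after every 3rd match, " | " otherwise
        print                        # print the record followed by the current ORS
    }' RS="|" "$@"                   # RS="|" makes each pipe-separated field a record
}

# Two shortened sample lines (tags in different orders)
printf '%s\n' \
    'OUT: 8=FIX.4.2|35=8|32=778|31=22.44|55=TBJ|10=014|' \
    'OUT: 8=FIX.4.2|35=8|55=TBJ|32=136|31=22.44|10=004|' | grail_extract
```

Because the regex is not anchored, a record such as `131=...` or `155=...` would also match and throw the count of three off, so it is worth checking the tag list against your real messages.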
 
01-20-2017, 12:25 PM #5
amateurscripter
Thanks very much, grail. I've already bookmarked it.
 
01-20-2017, 02:00 PM #6
grail
No probs. If you have a solution that works and no more questions, please remember to mark the thread as SOLVED.
 
01-23-2017, 10:31 AM #7
amateurscripter
Thanks again, grail. However, I'm still wondering whether the same task can be accomplished using sed. Can you or anybody else do what grail did above using the sed tool? If not, I'm going to mark it solved.

Thanks all.
 
01-23-2017, 10:36 AM #8
pan64
LQ Guru
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 13,070
In your egrep (first post) there is a space before 55.
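To illustrate pan64's point: inside the alternation, ` 55=[^|]*` needs a space immediately before `55=`, and in these messages `55=` is always preceded by a pipe, so that branch never matches. A sketch on one shortened sample line, using `grep -E` (equivalent to `egrep`):

```shell
#!/bin/sh
sample='OUT: 8=FIX.4.2|35=8|32=778|31=22.44|55=TBJ|10=014|'

# Original pattern: the 55 branch requires " 55=", which never occurs, so only 32 is found
printf '%s\n' "$sample" | grep -oE '(32=[^|]*| 55=[^|]*)'

# Corrected pattern: both tags are extracted, one match per output line
printf '%s\n' "$sample" | grep -oE '(32=[^|]*|55=[^|]*)'
```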
 
01-23-2017, 11:07 AM #9
grail
I could be wrong, but I do not think it can be done (easily) with sed, as you do not have a constant order. If you could get it to work, you would need to list all possible orderings: currently 6 (3!) for 3 numbers, but that jumps to 24 (4!) for 4 numbers, so the solution would become more unwieldy the larger the set of numbers grows. The awk solution, by contrast, only requires you to add the new number to the regex and increase the count (the 3 in x%3) accordingly.
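For what it's worth, one way a sed script might avoid listing every permutation is to handle each tag in its own substitution: copy that tag's field to the end of the line, then discard everything except the collected fields. This is a sketch, not something posted in the thread. It assumes GNU sed (for `-E` and `\t`), that the input contains no tab characters, and that every wanted tag sits between two `|` characters (true for the sample data, since each line ends with `|`); the function name `sed_extract` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print tags 32, 55 and 31 (in that order) from each FIX line.
# Assumes GNU sed, '|' delimiters and no tab characters in the input.
sed_extract() {
    sed -E '
        # Append a tab: fields collected so far will live after it
        s/$/\t/
        # For each tag: find "|TAG=value|" and append " TAG=value" after the tab
        s/\|(32=[^|]*)\|(.*)\t(.*)$/|\1|\2\t\3 \1/
        s/\|(55=[^|]*)\|(.*)\t(.*)$/|\1|\2\t\3 \1/
        s/\|(31=[^|]*)\|(.*)\t(.*)$/|\1|\2\t\3 \1/
        # Drop the original message, the tab and the leading space
        s/.*\t ?//
    ' "$@"
}
```

Usage would be `sed_extract testlog.txt`. The output order is fixed by the order of the `s` commands, not by the message, and adding a tag means adding one more `s` line. If a tag is absent from a line, its substitution simply does not fire and that tag is left out.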
 
01-23-2017, 11:57 AM #10
amateurscripter
Noted. I really need to get more familiar with awk. It probably depends on the situation, but so far awk seems more useful to me.

pan64, thank you as well. I didn't realize that made a difference. I also just found out that if I add another item to the search list, I need to add another "/dev/null -" at the end; otherwise the output only puts two items on each line, and items from one message roll over onto the next line, like this:

$ cat testlog.txt |egrep -o '(32=[^|]*|55=[^|]*|31=[^|]*)' |paste -d" |" - /dev/null -
32=778 |31=22.44
55=TBJ |55=TBJ
32=136 |31=22.44
31=22.44 |55=TBJ
32=222 |55=TBJ
32=209 |31=22.44
$

It should really look like this:

$ cat testlog.txt |egrep -o '(32=[^|]*|55=[^|]*|31=[^|]*)' |paste -d" |" - /dev/null - /dev/null -
32=778 |31=22.44 |55=TBJ
55=TBJ |32=136 |31=22.44
31=22.44 |55=TBJ |32=222
55=TBJ |32=209 |31=22.44
$
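Each `-` operand makes paste read the next line of standard input, so if a plain space between the fields is acceptable, the `/dev/null` operands (which only contribute empty strings, producing the ` |` separators) can be dropped and `paste -d' ' - - -` used instead. Like the version above, this assumes every message yields exactly three matches; otherwise the grouping drifts. A sketch on two shortened sample lines:

```shell
#!/bin/sh
# Shortened sample messages (illustrative, not the full lines from the thread)
sample='OUT: 8=FIX.4.2|35=8|32=778|31=22.44|55=TBJ|10=014|
OUT: 8=FIX.4.2|35=8|55=TBJ|32=136|31=22.44|10=004|'

# grep -o prints each match on its own line; each '-' operand makes paste
# read the next line of stdin, so '- - -' puts three matches on one line
printf '%s\n' "$sample" | grep -oE '(32=[^|]*|55=[^|]*|31=[^|]*)' | paste -d' ' - - -
```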
 
01-23-2017, 09:32 PM #11
grail
Please use [code][/code] tags when displaying code/data.
 
01-24-2017, 06:38 AM #12
allend
LQ 5k Club
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 5,272
Quote:
$ cat testlog.txt |egrep -o '(32=[^|]*|55=[^|]*|31=[^|]*)' |paste -d" |" - /dev/null - /dev/null -
I do not like the use of 'paste' here (or the UUOC, the Useless Use Of Cat).
Code:
egrep -o '(31=[^|]*|32=[^|]*|55=[^|]*)' testlog.txt | sed 'N;N;s/\n/|/g'
For readability, I would also prefer:
Code:
grep -o -e '31=[^|]*' -e '32=[^|]*' -e '55=[^|]*' testlog.txt | sed 'N;N;s/\n/|/g'
Of course, the 'awk' solution is still the most elegant.

Last edited by allend; 01-24-2017 at 06:39 AM.
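Both the awk one-liner and the grep/sed pipeline depend on every message containing exactly three of the wanted tags. A variation worth considering, sketched here rather than taken from the thread, splits each line on `|` and prints the wanted tags in a fixed order, whatever order they appear in. The function name `awk_extract` and the `want` variable are illustrative; if a tag occurs more than once in a line, the last occurrence wins.

```shell
#!/bin/sh
# Hypothetical helper: print the wanted tags of each FIX line in a fixed order.
# Assumes '|' delimiters; edit the want="..." list to change the tags.
awk_extract() {
    awk -F'|' -v want="32 55 31" '
    BEGIN { n = split(want, tags, " ") }    # tags[1..n] = wanted tag numbers
    {
        split("", seen)                     # portable way to empty the array
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")
            if (eq > 0)                     # key = text before the first "="
                seen[substr($i, 1, eq - 1)] = $i
        }
        out = ""
        for (j = 1; j <= n; j++)
            if (tags[j] in seen)
                out = out (out == "" ? "" : " ") seen[tags[j]]
        print out
    }' "$@"
}
```

Usage would be `awk_extract testlog.txt`; changing the tag list only means editing the `want` string, and a tag missing from a line is simply skipped.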
 
All times are GMT -5.
