01-19-2017, 03:52 PM
|
#1
|
Member
Registered: Nov 2011
Posts: 41
Rep:
|
sed tool to extract certain tags/fields from FIX messages
Hello everyone,
I have another question. Maybe some of you are familiar with the FIX protocol and FIX messages. Below are some sample messages from which I want to extract only tags 32, 55, and 31 together with their values. These tags are not always in the same order or in the same position.
When I ran this command, I got pretty much all of the lines back unchanged:
cat testlog.txt | sed 's/\(32=[^|]*\)|\(55=[^|]*\)|\(31=[^|]*\)|/\1 \2 \3/g'
Here's what the data set looks like:
OUT: 8=FIX.4.2|9=427|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.674|34=87|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=1|39=1|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1133|6=22.44|198=6801199340212833364|17=011620320001227|151=567|30=XASX|75=20170117|63=6|64=20170119|32=778|31=22.44|55=TBJ|11=clordid001007|10=014|
OUT: 8=FIX.4.2|9=427|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.683|34=88|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=1|39=1|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|55=TBJ|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1269|6=22.44|198=6801199340212833364|17=011620320001228|151=431|30=XASX|75=20170117|63=6|64=20170119|32=136|31=22.44|11=clordid001007|10=004|
OUT: 8=FIX.4.2|9=427|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.684|34=89|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=1|39=1|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1491|6=22.44|198=6801199340212833364|17=011620320001229|151=209|30=XASX|75=20170117|63=6|64=20170119|31=22.44|55=TBJ|32=222|11=clordid001007|10=003|
OUT: 8=FIX.4.2|9=425|35=8|49=GBLCONN1|56=SENFX2|52=20170117-21:09:03.684|34=90|50=TURBO|57=BOND07|128=BOND07|129=kbbo|37=8817013001796|150=2|39=2|20=0|60=20170117-21:09:01.029|40=1|54=2|38=1700|167=CS|55=TBJ|21=1|59=6|126=20170129-22:49:59.000|109=BOND07|22=4|48=JP000000GG8|15=JPC|14=1700|6=22.44|198=6801199340212833364|17=011620320001230|151=0|30=XASX|75=20170117|63=6|64=20170119|32=209|31=22.44|11=clordid001007|10=134|
I'm expecting to see something like this from the first line:
32=778 55=TBJ 31=22.44
You'll notice the tags are in different order on the other lines.
Is this doable with sed or egrep or awk?
I tried egrep too using the above logic but just for tags 32 and 55:
$ cat testlog.txt |egrep -o '(32=[^|]*| 55=[^|]*)' |paste -d" |" - /dev/null -
32=778 |32=136
32=222 |32=209
$
As you can see, I got two lines where there should have been four, and tag 55 is missing.
Thanks
|
|
|
01-19-2017, 08:20 PM
|
#2
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,021
|
Code:
awk '/(31|32|55)=/{x++;ORS = x%3?" | ":"\n";print}' RS="|" testlog.txt
|
|
|
01-20-2017, 11:16 AM
|
#3
|
Member
Registered: Nov 2011
Posts: 41
Original Poster
Rep:
|
Thx Grail, appreciate your suggestion. Is this not possible with "sed"?
Honestly, I don't know what that awk statement says. Can you break it down pls?
|
|
|
01-20-2017, 12:20 PM
|
#4
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,021
|
Possible with sed ... anything is possible. The right tool for the job, or the one you are best at, is often better than trying hard to use a different tool.
Here is the awk manual page. Bookmark it, as it is the source of all awk truth.
Breakdown:
Code:
RS="|" :- Set the Record Separator (RS) to a pipe, so every tag=value field becomes its own record
/(31|32|55)=/{} :- Only act on records containing one of these numbers followed by an equals sign. If a record matches, execute all instructions inside the braces
x++ :- Increase the counter variable 'x' by 1 (it starts at zero)
ORS = x%3?" | ":"\n" :- Set the Output Record Separator (ORS) based on the ternary operation. The ternary (?:) is shorthand for an if/then/else construct. Everything prior to the ? is the expression to be evaluated.
In this case, x modulo 3. The expression yields either zero (false) or a non-zero number, 1 or 2 (true). When true, ORS is set to the string " | "; when false, it is set to a newline (\n)
So essentially it says: for the first and second matching records in each group of three use " | ", and for the third use "\n" when printing records
print :- Print the current record, followed by the ORS just set as the line ending.
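The one-liner above prints the matching fields in the order they occur in each message, and its unanchored regex would also match a tag such as 132 or 155. If a fixed output order (32, 55, 31) and exact tag matching are wanted, one possible variant is sketched below. The sample messages and the function names `sample_messages` and `extract_tags` are made up for illustration, and the sketch assumes every field has the form tag=value.

```shell
# Made-up, shortened FIX-style messages; the wanted tags are in a different
# order on each line, and the second line has a decoy tag 132.
sample_messages() {
  printf '%s\n' \
    'OUT: 8=FIX.4.2|35=8|32=778|31=22.44|55=TBJ|10=014|' \
    'OUT: 8=FIX.4.2|35=8|55=TBJ|132=9|32=136|31=22.44|10=004|'
}

# Reads messages on standard input, one per line.
extract_tags() {
  awk -F'|' '
    {
      split("", val)                  # portable way to empty the array
      for (i = 1; i <= NF; i++) {
        eq = index($i, "=")           # position of the first "=" in the field
        if (eq) val[substr($i, 1, eq - 1)] = substr($i, eq + 1)
      }
      # Print only when all three tags are present, always in the same order.
      if (("32" in val) && ("55" in val) && ("31" in val))
        print "32=" val["32"], "55=" val["55"], "31=" val["31"]
    }'
}

sample_messages | extract_tags
```

Because the fields are indexed by the exact tag number, adding another tag only means adding it to the final `print`.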
|
|
|
01-20-2017, 12:25 PM
|
#5
|
Member
Registered: Nov 2011
Posts: 41
Original Poster
Rep:
|
Thx very much Grail. I already bookmarked it.
|
|
|
01-20-2017, 02:00 PM
|
#6
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,021
|
No probs. If you have a solution that works and no more questions, please remember to mark the thread as SOLVED.
|
|
|
01-23-2017, 10:31 AM
|
#7
|
Member
Registered: Nov 2011
Posts: 41
Original Poster
Rep:
|
Thx again Grail. However, I'm still wondering whether the same task can be accomplished using "sed". Can you or anybody else do the same thing Grail did above using sed? If not, I'm going to mark it solved.
Thanks all.
|
|
|
01-23-2017, 10:36 AM
|
#8
|
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,799
|
In your egrep command (first post) there is a space before 55, so that alternative never matches.
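A small sketch of the effect, using a made-up, shortened message rather than the real testlog.txt (`grep -E` is used here as the equivalent of `egrep`):

```shell
# Made-up, shortened message for illustration.
msg='OUT: 8=FIX.4.2|32=778|31=22.44|55=TBJ|10=014|'

# With the stray space the second alternative looks for " 55=...", which
# never occurs because the fields are separated by "|", not by spaces.
with_space=$(printf '%s\n' "$msg" | grep -Eo '(32=[^|]*| 55=[^|]*)')

# Without the space both tags are found, one match per output line.
without_space=$(printf '%s\n' "$msg" | grep -Eo '(32=[^|]*|55=[^|]*)')

printf 'with space:\n%s\n' "$with_space"
printf 'without space:\n%s\n' "$without_space"
```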
|
|
|
01-23-2017, 11:07 AM
|
#9
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,021
|
I could be wrong, but I do not think it can be done with sed (easily), as you do not have a constant order. So if you could get it to work you would need to list all possible orderings, which is currently 6 for 3 numbers but jumps to 24 for 4 numbers, so the solution becomes more unwieldy the larger the number space. The awk version, by contrast, only requires you to add the new number and increase the count accordingly.
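For what it is worth, one way to avoid listing every ordering is to handle one tag per substitution: append a marker to the line, copy each wanted field behind the marker, then delete everything up to the marker. The following is only a sketch under assumptions: it relies on GNU sed (`-E`, `\n` in the replacement, and `.` matching a newline in the pattern space), it assumes each wanted tag is preceded and followed by a pipe, and the sample messages and function names are made up for illustration.

```shell
# Made-up, shortened messages; the tags are in a different order on each line.
sample_messages() {
  printf '%s\n' \
    'OUT: 8=FIX.4.2|35=8|32=778|31=22.44|55=TBJ|10=014|' \
    'OUT: 8=FIX.4.2|35=8|55=TBJ|32=136|31=22.44|10=004|' \
    'OUT: 8=FIX.4.2|35=8|31=22.44|55=TBJ|32=222|10=003|'
}

# Reads messages on standard input. Assumes GNU sed.
extract_with_sed() {
  sed -nE '
    # Append a newline as a marker between the message and the results.
    s/$/\n/
    # One substitution per tag: copy the field to the end, after the marker.
    s/^(.*\|(32=[^|]*)\|.*\n.*)$/\1 \2/
    s/^(.*\|(55=[^|]*)\|.*\n.*)$/\1 \2/
    s/^(.*\|(31=[^|]*)\|.*\n.*)$/\1 \2/
    # Drop the original message, the marker and the leading space.
    s/^.*\n ?//
    # Print only if at least one tag was found.
    /./p
  '
}

sample_messages | extract_with_sed
```

The output order is fixed by the order of the substitutions, and a further tag costs one more substitution line rather than a new set of orderings.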
|
|
|
01-23-2017, 11:57 AM
|
#10
|
Member
Registered: Nov 2011
Posts: 41
Original Poster
Rep:
|
Noted, I really need to get more familiar with awk. It probably depends on the situation, but so far awk looks like the more useful tool for me.
pan64, thank you as well. I didn't realize the space made a difference. I also just found out that when I add another item to the search list, I need to add another "/dev/null -" at the end of the paste command. Otherwise the output only puts two items on a line and the third item of each message rolls over to the next line, like this:
$ cat testlog.txt |egrep -o '(32=[^|]*|55=[^|]*|31=[^|]*)' |paste -d" |" - /dev/null -
32=778 |31=22.44
55=TBJ |55=TBJ
32=136 |31=22.44
31=22.44 |55=TBJ
32=222 |55=TBJ
32=209 |31=22.44
$
It should really look like this:
$ cat testlog.txt |egrep -o '(32=[^|]*|55=[^|]*|31=[^|]*)' |paste -d" |" - /dev/null - /dev/null -
32=778 |31=22.44 |55=TBJ
55=TBJ |32=136 |31=22.44
31=22.44 |55=TBJ |32=222
55=TBJ |32=209 |31=22.44
$
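The behaviour described above follows from how paste treats its operands: each "-" takes one line from standard input per output row, and the delimiter list given to -d is used cyclically between columns. A minimal sketch with made-up input lines standing in for the egrep -o output (the function name `matches` is hypothetical):

```shell
# Six made-up lines standing in for the matches of two messages.
matches() { printf '%s\n' a1 a2 a3 b1 b2 b3; }

# Three "-" operands group standard input three lines at a time.
matches | paste -d'|' - - -

# Each /dev/null operand adds an empty column. The delimiter list " |" then
# cycles space, pipe, space, pipe, so the visible separator becomes " |".
matches | paste -d' |' - /dev/null - /dev/null -
```

With a single-character separator the /dev/null operands are not needed; they are only there to produce the two-character " |" separator.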
|
|
|
01-23-2017, 09:32 PM
|
#11
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,021
|
Please use [code][/code] tags when displaying code / data
|
|
|
01-24-2017, 06:38 AM
|
#12
|
LQ 5k Club
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,450
|
Quote:
$ cat testlog.txt |egrep -o '(32=[^|]*|55=[^|]*|31=[^|]*)' |paste -d" |" - /dev/null - /dev/null -
|
I do not like the use of 'paste' here (or the UUOC, the useless use of cat).
Code:
egrep -o '(31=[^|]*|32=[^|]*|55=[^|]*)' testlog.txt | sed 'N;N;s/\n/|/g'
For readability, I would also prefer
Code:
grep -o -e '31=[^|]*' -e '32=[^|]*' -e '55=[^|]*' testlog.txt | sed 'N;N;s/\n/|/g'
Of course, the 'awk' solution is still the most elegant.
Last edited by allend; 01-24-2017 at 06:39 AM.
|
|
|
All times are GMT -5. The time now is 04:57 PM.