LinuxQuestions.org
Old 10-27-2011, 10:52 AM   #1
swissmac
LQ Newbie
 
Registered: Oct 2011
Posts: 9

Rep: Reputation: Disabled
awk extract different parts per line


Hi

I'm trying to extract from these lines

Code:
2011-06-26 23:59:56.746#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#advId=103613446#lang=de
2011-06-26 23:59:56.888#11.203.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#a=default#advId=103659208
2011-06-26 23:59:57.202#11.203.11.174#gGwcTHrdwrxKPS56v7TLxwnN0HsKSpHmGvJc1Vw1t7NyBJJMBvFw#advId=103562066#lang=fr
2011-06-26 23:59:57.908#11.20.11.174#dw4TTHrdQ2M5Y8ypkSvPnFVjVQpKLhJfGQpVD7NyScJPsKqvtWR1#advId=103661409#lang=de
2011-06-26 23:59:57.950#11.203.11.174#WtDDTHrdmmP4SGB2c6d06qXlYf41cTXk0Q2p4VBL5nhDvzjT5NpK#a=default#advId=103613809
2011-06-26 23:59:56.745#111.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#advId=103613446#lang=de
2011-06-26 23:59:58.141#111.203.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#a=default#advId=103659208
2011-06-26 23:59:58.270#11.203.11.174#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#a=default#advId=103655548
2011-06-26 23:59:58.549#11.21.11.174#gmHnTHrpBNQLWBq94rT0QH5LGXtJ9hGqvrGb3yN0drFdP9vc0Qgj#a=default#advId=103613004
2011-06-26 23:59:59.251#125.3.11.174#NqvFTHrfnXFtdYvT3sMyBG3wjhHnyGHJp4rpNBSRjQwzXn65jVhH#advId=103660045#lang=de
2011-06-26 23:59:59.686#11.23.11.4#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#a=default#advId=103655548
the date, the IP address, the session ID, and the number from the "advId=" field, which can be anywhere in the line after the session ID.
The result should look like this:

Code:
2011-06-26 23:59:59.686#11.203.11.174#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548
Any help would be appreciated.
Kind regards.

Last edited by swissmac; 10-28-2011 at 02:54 AM.
 
Old 10-27-2011, 11:07 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
Well I would say the first column returned is an unusual looking date??

What field positions can you guarantee?
 
Old 10-28-2011, 01:21 AM   #3
swissmac
LQ Newbie
 
Registered: Oct 2011
Posts: 9

Original Poster
Rep: Reputation: Disabled
Thanks for reminding me that the sample record was wrong; I corrected the last line.

I can guarantee that the string up to the session ID, like this

Code:
2011-06-26 23:59:59.686#11.203.11.174#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#
2011-06-26 23:59:58.549#11.21.11.174#gmHnTHrpBNQLWBq94rT0QH5LGXtJ9hGqvrGb3yN0drFdP9vc0Qgj#
2011-06-26 23:59:59.251#125.3.11.174#NqvFTHrfnXFtdYvT3sMyBG3wjhHnyGHJp4rpNBSRjQwzXn65jVhH#
2011-06-26 23:59:59.686#11.23.11.4#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#
is always the same.

Last edited by swissmac; 10-28-2011 at 02:54 AM.
 
Old 10-28-2011, 01:55 AM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Could you please enclose the text in [code][/code] tags, to preserve formatting and to keep the screen from side-scrolling? Thanks.
 
1 members found this post helpful.
Old 10-28-2011, 02:37 AM   #5
swissmac
LQ Newbie
 
Registered: Oct 2011
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by David the H.
Could you please enclose the text in [code][/code] tags, to preserve formatting and to keep the screen from side-scrolling? Thanks.
Sorry, I've just changed it.
 
Old 10-28-2011, 03:02 AM   #6
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Code:
awk -F"#" '{adv=gensub(/.*#advId=([^#]+).*/,"\\1",1,$0); print $1"#"$2"#"$3"#"adv }' swissmac
2011-06-26 23:59:56.746#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#103613446
2011-06-26 23:59:56.888#11.203.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#103659208
2011-06-26 23:59:57.202#11.203.11.174#gGwcTHrdwrxKPS56v7TLxwnN0HsKSpHmGvJc1Vw1t7NyBJJMBvFw#103562066
2011-06-26 23:59:57.908#11.203.11.174#dw4TTHrdQ2M5Y8ypkSvPnFVjVQpKLhJfGQpVD7NyScJPsKqvtWR1#103661409
2011-06-26 23:59:57.950#11.203.11.174#WtDDTHrdmmP4SGB2c6d06qXlYf41cTXk0Q2p4VBL5nhDvzjT5NpK#103613809
2011-06-26 23:59:56.745#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#103613446
2011-06-26 23:59:58.141#11.203.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#103659208
2011-06-26 23:59:58.270#11.203.11.174#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548
2011-06-26 23:59:58.549#11.203.11.174#gmHnTHrpBNQLWBq94rT0QH5LGXtJ9hGqvrGb3yN0drFdP9vc0Qgj#103613004
2011-06-26 23:59:59.251#11.203.11.174#NqvFTHrfnXFtdYvT3sMyBG3wjhHnyGHJp4rpNBSRjQwzXn65jVhH#103660045
2011-06-26 23:59:59.686#11.203.11.174#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548


Cheers,
Tink
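
A note for anyone without GNU awk: `gensub()` is a gawk extension, so the one-liner above will not run under other awk implementations. Below is a minimal sketch of the same extraction using only `match()` and `substr()`, which POSIX awk provides. The function name `extract_adv` is made up for this example, and it leaves the last column empty when a record has no advId field, which is an assumption about the desired behaviour.

```shell
# Hypothetical helper (not from the thread): reads records on stdin and prints
# timestamp#ip#session-id#advId using POSIX awk only.
extract_adv() {
  awk -F'#' '
    {
      adv = ""
      # advId always comes after the session ID, so it is preceded by "#".
      if (match($0, /#advId=[^#]*/))
        adv = substr($0, RSTART + 7, RLENGTH - 7)   # skip the 7 chars of "#advId="
      print $1 "#" $2 "#" $3 "#" adv
    }'
}

# Two of the sample records from the first post.
printf '%s\n' \
  '2011-06-26 23:59:56.746#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#advId=103613446#lang=de' \
  '2011-06-26 23:59:56.888#11.203.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#a=default#advId=103659208' |
  extract_adv
```

To run it on a file instead, redirect the file into the function, e.g. `extract_adv < swissmac`.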
 
1 members found this post helpful.
Old 10-28-2011, 03:03 AM   #7
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Extracting fields with awk is a trivial task:
Code:
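BEGIN {
  FS = "#"                        # fields are separated by '#'
}

{
  adv = ""                        # stays empty if the record has no advId field
  for ( i = 4; i <= NF; i++ )     # advId can be anywhere after the session ID
    if ( $i ~ /^advId=/ ) {
      adv = substr($i, 7)         # drop the leading "advId="
      break
    }
  printf "%s#%s#%s#%s\n", $1, $2, $3, adv
}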
Edit: beaten by Tinkster with a more concise solution..

Last edited by colucix; 10-28-2011 at 03:04 AM.
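
The field loop above also generalizes to any `key=value` field, not just advId. Here is a rough sketch, assuming the key name is passed in with `-v`; the function name `get_key` and the choice to print an empty last column when the key is missing are assumptions for illustration, not something from the posts above.

```shell
# Hypothetical helper: print fields 1-3 plus the value of the "key=value"
# field named by the first argument, wherever it occurs from field 4 onwards.
get_key() {
  awk -F'#' -v key="$1" '
    {
      val = ""
      prefix = key "="
      for (i = 4; i <= NF; i++)
        if (index($i, prefix) == 1) {            # field starts with "key="
          val = substr($i, length(prefix) + 1)   # keep only the value part
          break
        }
      print $1 "#" $2 "#" $3 "#" val
    }'
}

# Two of the sample records from the first post.
sample='2011-06-26 23:59:56.746#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#advId=103613446#lang=de
2011-06-26 23:59:56.888#11.203.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#a=default#advId=103659208'

printf '%s\n' "$sample" | get_key advId
printf '%s\n' "$sample" | get_key lang
```

Using `index()` rather than a regex match means the key is compared literally, so a key containing regex metacharacters would not need escaping.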
 
1 members found this post helpful.
Old 10-28-2011, 03:18 AM   #8
swissmac
LQ Newbie
 
Registered: Oct 2011
Posts: 9

Original Poster
Rep: Reputation: Disabled
Many thanks to both of you! Great solutions.

One more thing: afterwards I will have to sort the records by these fields, in the following order:

adv-id, session-id, ip-address, timestamp

Code:
timestamp              #ip-address    #session-id                                          #adv-id
2011-06-26 23:59:56.746#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#103613446
2011-06-26 23:59:56.888#11.03.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#103659208
2011-06-26 23:59:57.202#11.203.11.174#gGwcTHrdwrxKPS56v7TLxwnN0HsKSpHmGvJc1Vw1t7NyBJJMBvFw#103562066
2011-06-26 23:59:57.908#11.203.11.174#dw4TTHrdQ2M5Y8ypkSvPnFVjVQpKLhJfGQpVD7NyScJPsKqvtWR1#103661409
2011-06-26 23:59:57.950#11.203.11.174#WtDDTHrdmmP4SGB2c6d06qXlYf41cTXk0Q2p4VBL5nhDvzjT5NpK#103613809
2011-06-26 23:59:56.745#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#103613446
2011-06-26 23:59:58.141#11.203.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#103659208
2011-06-26 23:59:58.270#11.203.11.174#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548
2011-06-26 23:59:58.549#11.203.11.4#gmHnTHrpBNQLWBq94rT0QH5LGXtJ9hGqvrGb3yN0drFdP9vc0Qgj#103613004
2011-06-26 23:59:59.251#11.263.11.26#NqvFTHrfnXFtdYvT3sMyBG3wjhHnyGHJp4rpNBSRjQwzXn65jVhH#103660045
2011-06-26 23:59:59.686#11.203.11.122#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548
Thanks for your help, it is much appreciated!

Last edited by swissmac; 10-28-2011 at 04:04 AM. Reason: changed the length of ip-address
 
Old 10-28-2011, 03:21 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
Well you could probably use awk with a field separator of # and then loop through the other fields till you find the one you need and get the necessary data.
 
1 members found this post helpful.
Old 10-28-2011, 12:53 PM   #10
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Quote:
Originally Posted by swissmac
Many thanks to both of you! Great solutions.

One more thing: afterwards I will have to sort the records by these fields, in the following order:

adv-id, session-id, ip-address, timestamp

Code:
timestamp              #ip-address    #session-id                                          #adv-id
2011-06-26 23:59:56.746#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#103613446
2011-06-26 23:59:56.888#11.03.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#103659208
2011-06-26 23:59:57.202#11.203.11.174#gGwcTHrdwrxKPS56v7TLxwnN0HsKSpHmGvJc1Vw1t7NyBJJMBvFw#103562066
2011-06-26 23:59:57.908#11.203.11.174#dw4TTHrdQ2M5Y8ypkSvPnFVjVQpKLhJfGQpVD7NyScJPsKqvtWR1#103661409
2011-06-26 23:59:57.950#11.203.11.174#WtDDTHrdmmP4SGB2c6d06qXlYf41cTXk0Q2p4VBL5nhDvzjT5NpK#103613809
2011-06-26 23:59:56.745#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#103613446
2011-06-26 23:59:58.141#11.203.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#103659208
2011-06-26 23:59:58.270#11.203.11.174#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548
2011-06-26 23:59:58.549#11.203.11.4#gmHnTHrpBNQLWBq94rT0QH5LGXtJ9hGqvrGb3yN0drFdP9vc0Qgj#103613004
2011-06-26 23:59:59.251#11.263.11.26#NqvFTHrfnXFtdYvT3sMyBG3wjhHnyGHJp4rpNBSRjQwzXn65jVhH#103660045
2011-06-26 23:59:59.686#11.203.11.122#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548
Thanks for your help, it is much appreciated!
For the current working set this would do:
Code:
awk -F"#" '{adv=gensub(/.*#advId=([^#]+).*/,"\\1",1,$0); print $1"#"$2"#"$3"#"adv }' swissmac  | sort -t# -k4,4n -k3,3 -k2,2 -k1,1
2011-06-26 23:59:57.202#11.203.11.174#gGwcTHrdwrxKPS56v7TLxwnN0HsKSpHmGvJc1Vw1t7NyBJJMBvFw#103562066
2011-06-26 23:59:58.549#11.203.11.174#gmHnTHrpBNQLWBq94rT0QH5LGXtJ9hGqvrGb3yN0drFdP9vc0Qgj#103613004
2011-06-26 23:59:56.745#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#103613446
2011-06-26 23:59:56.746#11.203.11.174#9CshTHrcK1jvNCjbpX2kx1SK2SW2phsCm041N2yr4hSLFPJWPdM9#103613446
2011-06-26 23:59:57.950#11.203.11.174#WtDDTHrdmmP4SGB2c6d06qXlYf41cTXk0Q2p4VBL5nhDvzjT5NpK#103613809
2011-06-26 23:59:58.270#11.203.11.174#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548
2011-06-26 23:59:59.686#11.203.11.174#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548
2011-06-26 23:59:56.888#11.203.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#103659208
2011-06-26 23:59:58.141#11.203.11.174#2QtTTHycfL1rcy1msP2S1NkbLsvrlpTthm6yKbmnswgLgLHjwbNp#103659208
2011-06-26 23:59:59.251#11.203.11.174#NqvFTHrfnXFtdYvT3sMyBG3wjhHnyGHJp4rpNBSRjQwzXn65jVhH#103660045
2011-06-26 23:59:57.908#11.203.11.174#dw4TTHrdQ2M5Y8ypkSvPnFVjVQpKLhJfGQpVD7NyScJPsKqvtWR1#103661409
Of course, the IP address will be a problem ;}
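
To spell out the IP address problem: `-k2,2` compares the addresses as text, so `11.203.11.174` sorts before `11.23.11.4`. One possible workaround, sketched below, is to prepend a zero-padded copy of the address as a temporary key, sort on that, and cut it off again. The function name `sort_records` is made up for this example, and `LC_ALL=C` is my own addition to make the session ID comparison byte-wise and reproducible.

```shell
# Hypothetical helper: sort extracted records (timestamp#ip#session-id#advId)
# by advId, session ID, IP address (octet by octet) and timestamp.
sort_records() {
  awk -F'#' '
    {
      split($2, o, ".")
      # Prepend a zero-padded copy of the IP as a temporary sort key.
      printf "%03d.%03d.%03d.%03d#%s\n", o[1], o[2], o[3], o[4], $0
    }' |
  LC_ALL=C sort -t'#' -k5,5n -k4,4 -k1,1 -k2,2 |
  cut -d'#' -f2-                  # drop the temporary key again
}

# Two extracted records that share advId and session ID but differ in IP.
printf '%s\n' \
  '2011-06-26 23:59:58.270#11.203.11.174#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548' \
  '2011-06-26 23:59:59.686#11.23.11.4#wt8LTHpTQRv6MTwVLSG9WpNT7hLhChj3Kf1DxTMHR2bmTN4Jp1tM#103655548' |
  sort_records
```

After the decoration step the fields shift by one, which is why the sort keys are 5 (advId), 4 (session ID), 1 (padded IP) and 2 (timestamp). With this input the `11.23.11.4` record comes out first, whereas a plain text sort on the IP column would put it second.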
 
1 members found this post helpful.
Old 11-11-2011, 06:56 AM   #11
swissmac
LQ Newbie
 
Registered: Oct 2011
Posts: 9

Original Poster
Rep: Reputation: Disabled
You guys are great. Thanks a lot, it worked perfectly for me.
 
  


All times are GMT -5. The time now is 06:12 PM.
