LinuxQuestions.org
04-08-2011, 02:58 AM   #1
ted_chou12
How would I use awk or sed to match this?


Hi, I am quite new and not sure how awk and sed can be used to match strings the way PHP's preg_match does. How would I go about matching a string like:
Code:
Content-Disposition: attachment; filename=
	"=?big5?B?W8Opw6m2faTfpHDC7V9ieV9qYWNreWtvMjAwMl13aGl0ZSBhbGJ1bSAxNC0yNi50?=
 =?big5?Q?orrent?="
Content-Disposition: attachment; filename="=?big5?B?W8Opw6m2faTfpHDC7V9ieV9qYWNreWtvMjAwMl13aGl0ZSBhbGJ1bSAxNC0yNi50?=
 =?big5?Q?orrent?="
I want to extract everything within the quotation marks that follows the "Content-Disposition: attachment; filename=" string.
Thanks,
Ted
 
04-08-2011, 03:28 AM   #2
grail
Code:
sed -n '/^Content/s/[^"]+"|"$//p' file
Something like that ... might need '-r' as well.
 
04-08-2011, 03:57 AM   #3
ted_chou12
Quote:
Originally Posted by grail
Code:
sed -n '/^Content/s/[^"]+"|"$//p' file
Something like that ... might need '-r' as well.
Thanks for your help, but I can't get that to work. I've attempted to modify it; does this look more reasonable?
Code:
echo $(cat "/var/mail/root/msg.9CT" | sed -r '/^Content-Disposition: attachment; filename=(*.?)/\1/p')

sh-3.1# /mnt/sda1/test.sh
sed: -e expression #1, char 50: Invalid preceding regular expression
Content-Disposition: attachment; filename= "=?big5?B?W8Opw6m2faTWl13i50?==?big5?Q?orrent?="
The "Content-Disposition: attachment; filename=" part is constant and the quoted part is variable. The whitespace between 'filename=' and the quotation mark '"' can be a single space, a newline, or several tabs.
Thanks
Ted

Last edited by ted_chou12; 04-08-2011 at 04:03 AM.
 
04-08-2011, 04:35 AM   #4
grail
Sorry, my bad. I thought the line endings had been messed up in the paste.
By the way, this is the correct sed if it is all on one line:
Code:
sed -r -n '/^Content/s/^[^"]*"|"$//gp' file
So now that I know it spans multiple lines, how do you want the data returned? That is, do you still want it over multiple lines, or joined into a single entry?
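As a quick check of the one-line version, here is a minimal sketch that runs grail's command on a single-line header. The filename is invented for illustration, and -r is GNU sed's switch for extended regular expressions, which the | alternation needs.

```shell
#!/bin/sh
# Invented one-line header (the filename is made up for illustration)
header='Content-Disposition: attachment; filename="holiday photos.zip"'

# ^[^"]*"  removes everything up to and including the opening quote
# "$       removes the closing quote; g lets both alternatives apply
result=$(printf '%s\n' "$header" | sed -r -n '/^Content/s/^[^"]*"|"$//gp')
printf '%s\n' "$result"    # holiday photos.zip
```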
 
04-08-2011, 04:36 AM   #5
jschiwal
Quote:
sed -r '/^Content-Disposition: attachment; filename=(*.?)/\1/p')
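The first part, '/^Content-Disposition: attachment; filename=(*.?)/',
only matches (addresses) a line; it is not the left-hand side of a substitution. Your `s' command is missing, so the \1 has no left-hand side to refer to.
The "*.?" pattern doesn't make sense either: a * right after "(" has nothing to repeat, which is what the "Invalid preceding regular expression" error is complaining about. If you meant a Perl-style non-greedy ".*?", sed's regular expressions have no non-greedy quantifier.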
To match characters inside quotes, you could use:
"\([^"]*\)" or ".*"
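Combining those points, a corrected version of the original attempt might look like the sketch below. It assumes the header and the whole quoted filename are on one line (the sample values are invented), and it also shows where "\([^"]*\)" and ".*" differ: on a line with more than one quoted string, .* runs on to the last quote.

```shell
#!/bin/sh
# Invented single-line header; a multi-line header would need N or H first.
line='Content-Disposition: attachment; filename="notes.txt"'

# The s command needs both sides: the LHS matches the whole header and
# captures the text between the quotes; the RHS \1 keeps only that capture.
result=$(printf '%s\n' "$line" |
    sed -n 's/^Content-Disposition: attachment; filename="\([^"]*\)".*/\1/p')
printf '%s\n' "$result"    # notes.txt

# [^"]* stops at the first closing quote, while .* is greedy and
# runs to the last quote on the line.
two='a "first" b "second"'
first=$(printf '%s\n' "$two" | sed -n 's/[^"]*"\([^"]*\)".*/\1/p')
greedy=$(printf '%s\n' "$two" | sed -n 's/[^"]*"\(.*\)".*/\1/p')
printf '%s\n' "$first" "$greedy"
```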

However, your sample had the contents inside the quotes spread across 2 or 3 lines. Was that a mistake in copying to this post, or does it represent a real sample? Sed is a line editor: the LHS of a substitution is matched against one line of input at a time. If the input spans 2 or 3 lines, you need to build up more lines using the "N" command (append the next line to the pattern space) or the "H" command (append to the hold space).

Code:
sed -n '/Content-Disposition/{ /".*"/!N}
        /Content-Disposition/{ /".*"/!N}
        /Content-Disposition/{ /".*"/s/\n//g;p}' test
Content-Disposition: attachment; filename=      "=?big5?B?W8Opw6m2faTfpHDC7V9ieV9qYWNreWtvMjAwMl13aGl0ZSBhbGJ1bSAxNC0yNi50?= =?big5?Q?orrent?="
Content-Disposition: attachment; filename="=?big5?B?W8Opw6m2faTfpHDC7V9ieV9qYWNreWtvMjAwMl13aGl0ZSBhbGJ1bSAxNC0yNi50?= =?big5?Q?orrent?="
Here I build up the pattern space to as many as 3 lines when it does not yet contain both quotes. Then I remove the line feeds, joining the lines.
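Repeating N a fixed number of times caps the join at three lines. If a header could be folded over more lines, a common alternative (a different technique from the one in this post) is to loop with a label and the b (branch) command until both quotes are present. This is a sketch for GNU sed, and the four-line sample header is invented.

```shell
#!/bin/sh
# Invented sample: one header whose quoted filename spans four lines.
sample=$(printf 'Content-Disposition: attachment; filename=\n\t"=?big5?B?AAA?=\n =?big5?Q?BBB?=\n =?big5?Q?CCC?="\n')

# :a is a label; while the pattern space does not yet hold both quotes,
# N appends the next line and "ba" branches back to the label.
result=$(printf '%s\n' "$sample" | sed -n '
/Content-Disposition/{
  :a
  /".*"/!{
    N
    ba
  }
  s/\n//g
  p
}')
printf '%s\n' "$result"
```

The loop stops as soon as the pattern space contains two quotes; if the closing quote never arrives, N on the last line ends the script, and with -n nothing is printed.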

If you just want the quoted part (quotes included), you could use:
s/\n//g;s/.*\(".*"\).*/\1/p
in its place.
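For illustration, here is that substitution on a header that has already been joined onto one line (the sample is made up). With exactly two quotes on the line, the first .* has to stop before the opening quote, so the captured group runs from the first quote to the last and keeps the quotes.

```shell
#!/bin/sh
# Invented header that has already been joined onto a single line.
joined='Content-Disposition: attachment; filename=  "=?big5?Q?abc?= =?big5?Q?def?="'

# s/\n//g is a no-op here; the second s keeps "..." via \1.
result=$(printf '%s\n' "$joined" | sed -n 's/\n//g;s/.*\(".*"\).*/\1/p')
printf '%s\n' "$result"
```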

This prints the contents without the quotes:
Code:
sed -n '/Content-Disposition/{ /".*"/!N}
        /Content-Disposition/{ /".*"/!N}
        /Content-Disposition/{ /".*"/s/\n//g;s/.*"\(.*\)"/\1/p}' test
=?big5?B?W8Opw6m2faTfpHDC7V9ieV9qYWNreWtvMjAwMl13aGl0ZSBhbGJ1bSAxNC0yNi50?= =?big5?Q?orrent?=
=?big5?B?W8Opw6m2faTfpHDC7V9ieV9qYWNreWtvMjAwMl13aGl0ZSBhbGJ1bSAxNC0yNi50?= =?big5?Q?orrent?=
Both of your samples have the same contents. Is that what you intended?

----

Knowing more about the structure of the input can help. For example, if your records are groups of lines separated by a blank line, things can be a lot easier, as in:
Code:
/sbin/lspci -v | sed -n '/Network/,/^$/p'
14:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)
        Subsystem: Foxconn International, Inc. Device e009
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at f2100000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: ath9k
This selects a range of lines, from a line matching /Network/ through the next blank line; commands placed inside braces { } after the range are applied only to those lines.
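As a sketch of operating inside the braces, the command below prints only the driver name from the Network record. The lspci-style sample text is invented, so the output does not depend on the hardware you run it on.

```shell
#!/bin/sh
# Invented lspci-style sample: one record per device, separated by blank lines.
sample='00:02.0 VGA compatible controller: Example Corp. Graphics
        Kernel driver in use: i915

14:00.0 Network controller: Example Corp. Wireless Adapter
        Flags: bus master, fast devsel
        Kernel driver in use: ath9k

00:1b.0 Audio device: Example Corp. HD Audio
        Kernel driver in use: snd_hda_intel'

# /Network/,/^$/ selects the Network record; the command inside { } only
# runs on that range, so only that record's driver name is printed.
result=$(printf '%s\n' "$sample" |
    sed -n '/Network/,/^$/{s/.*Kernel driver in use: //p;}')
printf '%s\n' "$result"    # ath9k
```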

Last edited by jschiwal; 04-08-2011 at 04:44 AM.
 
04-08-2011, 04:54 AM   #6
ted_chou12
Thank you, jschiwal! This was a very detailed explanation. Thanks to grail too.
Ted
 
04-08-2011, 05:00 AM   #7
grail
How about:
Code:
awk 'BEGIN{RS="[ \t\n]*\"[ \t\n]*"}/^Content/{getline;print}' file
Edit: You can also print each result on a single line, like so (the gsub removes all whitespace, including the space between the two encoded parts):
Code:
awk 'BEGIN{RS="[ \t\n]*\"[ \t\n]*"}/^Content/{getline;gsub(/[[:space:]]/,"");print}' file
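grail's approach treats every double quote, together with any surrounding spaces, tabs and newlines, as a record separator, so the quoted filename becomes the record right after the one starting with "Content". The sketch below runs the joined-output variant on an invented two-header sample. Two assumptions: a regular-expression RS is an extension found in gawk and mawk (POSIX awk only uses the first character of RS), and the explicit [ \t\n] class stands in for [[:space:]] in case an older awk lacks POSIX bracket classes.

```shell
#!/bin/sh
# Invented sample with two headers, mimicking the input in this thread.
sample=$(printf 'Content-Disposition: attachment; filename=\n\t"=?big5?B?AAA?=\n =?big5?Q?BBB?="\nContent-Disposition: attachment; filename="=?big5?B?CCC?=\n =?big5?Q?DDD?="\n')

# Each quote plus surrounding whitespace ends a record; on a "Content"
# record, getline reads the quoted filename and gsub joins its lines.
result=$(printf '%s\n' "$sample" |
    awk 'BEGIN { RS = "[ \t\n]*\"[ \t\n]*" }
         /^Content/ { getline; gsub(/[ \t\n]+/, ""); print }')
printf '%s\n' "$result"
```

Dropping the gsub call keeps each filename over its original lines, matching the first command above.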

Last edited by grail; 04-08-2011 at 05:05 AM.
 
04-08-2011, 05:27 AM   #8
ted_chou12
Thanks, grail! This works great!
 
  

