LinuxQuestions.org
04-17-2006, 04:32 AM   #1
nickleus
Member
 
Registered: Nov 2004
Location: Noreg
Distribution: ubuntu
Posts: 107

Rep: Reputation: 15
how to use sed to redirect only pattern match to file (not entire line)


Hi, i'm trying to figure out how to use sed and am having some problems. I have this file:
tcxmlmeldinger/innfraergo/TC-20060413121638410.xml

the whole xml file is on 1 line so i can't grep out the text i want (grrr). When i try to use the w option in sed:
Code:
sed -e '/Varsling_[0-9]+/w tmp' tcxmlmeldinger/innfraergo/TC-20060413121638410.xml
Even that writes the whole matching line (in this case, the entire file) to tmp. What i'm most interested in is just getting the regexp match (will look something like: Varsling_902394039023) so i can save it in a bash variable. I guess i could rephrase this whole thing:
how do i extract a substring from a line of text in a file using bash?
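An aside that this thread does not raise: GNU grep's -o option (not part of POSIX grep, though most BSD greps have it too) prints only the matched part of a line. That means grep can pull a match out of a one-line file after all. The sample line below is made up, since the real message file is not shown. If a line contains several matches, each is printed on its own line.

```shell
#!/bin/sh
# Hypothetical one-line XML sample; the real message file is not shown in the thread
line='<Msg><Ref>Varsling_902394039023</Ref><Body>text</Body></Msg>'

# -o prints only the part of the line that matched, one match per output line
id=$(printf '%s\n' "$line" | grep -o 'Varsling_[0-9][0-9]*')
echo "$id"
```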

Last edited by nickleus; 04-17-2006 at 05:06 AM.
 
04-17-2006, 05:06 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Hi,

Getting part of a string with sed:

sed -n 's/.*\(Varsling_[0-9]*\).*/\1/p' infile

The search string is made up of the following 3 parts:

.* => everything in front of Varsling_[0-9]*
\(Varsling_[0-9]*\) => What we are looking for. The \( and \) are special: whatever they enclose can be referred to as \1 in the replacement part.
.* => everything after Varsling_[0-9]*

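The -n makes sed suppress its normal output, and the p at the end prints the pattern space only when the substitution succeeded. Because the whole line is replaced by \1, that is just the captured part.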
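Below is a small sketch of the same sed command, capturing the result in a shell variable as the original question asked. The one-line XML sample is made up. The second part shows one caveat: the leading .* is greedy, so if a line holds several matches, the last one is the one that gets captured.

```shell
#!/bin/sh
# Hypothetical one-line XML standing in for the real TC-...xml file
line='<Msg><Ref>Varsling_902394039023</Ref><Body>text</Body></Msg>'

# -n turns off automatic printing; the p flag prints the pattern space
# only if the s command made a substitution, i.e. only the captured \1
id=$(printf '%s\n' "$line" | sed -n 's/.*\(Varsling_[0-9]*\).*/\1/p')
echo "$id"

# Caveat: the leading .* is greedy, so with two matches on one line
# the LAST one is captured
two='Varsling_1 and Varsling_22'
last=$(printf '%s\n' "$two" | sed -n 's/.*\(Varsling_[0-9]*\).*/\1/p')
echo "$last"
```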

Hope this clears things up.
 
04-17-2006, 05:22 AM   #3
nickleus
Member
 
Registered: Nov 2004
Location: Noreg
Distribution: ubuntu
Posts: 107

Original Poster
Rep: Reputation: 15
druuna, thank you thank you thank you! i had thought about the whole \1 thing, but wasn't sure how to write it syntactically correct. you da man! worked like a charm =)
 
04-17-2006, 08:08 AM   #4
nickleus
Member
 
Registered: Nov 2004
Location: Noreg
Distribution: ubuntu
Posts: 107

Original Poster
Rep: Reputation: 15
I tried expanding your example to include multiple backward references in my script (sh) file, but it doesn't seem to work:
Code:
echo "TIMESTAMP from SENT file $i: $(sed -n 's/.*DateAndTimes id="206">\s*<Year>\([0-9]*\)<\/Year>\s*<Month>\([0-9]*\)<\/Month>\s*<Day>\([0-9]*\)<\/Day>\s*<Hour>\([0-9]*\)<\/Hour>\s*<Minute>\([0-9]*\)<\/Minute>.*/\1\2\3\4\5/p' $SENT$i)"
Nothing gets printed out, but i can't see why when the file contents look like this:
Code:
<DateAndTimes id="206">
        <Year>2006</Year>
        <Month>04</Month>
        <Day>13</Day>
        <Hour>11</Hour>
        <Minute>57</Minute>
</DateAndTimes>
it should match, or have i just written a crappy regex? =)

Last edited by nickleus; 04-17-2006 at 08:21 AM.
 
04-17-2006, 09:21 AM   #5
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Hi,

Is the input still one line, or is the example in post #4 a new, multi-line input file?

If the input file has multiple lines, you could do something like this (only the first two tags are shown):

Code:
sed -n -e 's%<Year>\([0-9]*\)</Year>%\1%p' -e 's%<Month>\([0-9]*\)</Month>%\1%p' infile
Some characters are special and need to be escaped, but you can also change the delimiter that sed uses (I changed it from / to % in the example above). That way you do not need to escape the / in </zzzzz> constructs.

The -e option is also new: it lets you give sed several commands in one invocation.
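The sketch below tries the two tips on made-up sample lines. First it shows the same substitution written with / as the delimiter, where the slash in </Year> has to be escaped, and with %, where it does not. Then it shows two -e commands run in one sed call. The values still come out on separate lines, and leading whitespace on indented lines would be kept.

```shell
#!/bin/sh
line='<Year>2006</Year>'

# With / as the delimiter, the / in </Year> has to be escaped as <\/Year>
a=$(printf '%s\n' "$line" | sed -n 's/<Year>\([0-9]*\)<\/Year>/\1/p')

# With % as the delimiter, </Year> can be written as-is
b=$(printf '%s\n' "$line" | sed -n 's%<Year>\([0-9]*\)</Year>%\1%p')

# Each -e adds one more command; sed runs them in order on every input line
c=$(printf '<Year>2006</Year>\n<Month>04</Month>\n' |
    sed -n -e 's%<Year>\([0-9]*\)</Year>%\1%p' \
           -e 's%<Month>\([0-9]*\)</Month>%\1%p')

echo "$a $b"
echo "$c"
```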

If it is one line, please post the line so I can have a look.

Hope this clears things up a bit.
 
04-17-2006, 09:43 AM   #6
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 655
Code:
\s*<Year>\([0-9]*\)<\/Year>
It might be better to use [0-9][0-9]* so that at least one digit is required for a match. You could also use [[:digit:]][[:digit:]]*
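The following sketch illustrates the point with a made-up empty tag. With [0-9]*, the substitution succeeds and captures an empty string. With [[:digit:]][[:digit:]]*, nothing matches, so nothing is printed.

```shell
#!/bin/sh
empty='<Month></Month>'

# [0-9]* also matches zero digits, so the s command succeeds and \1 is empty
loose=$(printf '%s\n' "$empty" | sed -n 's%<Month>\([0-9]*\)</Month>%[\1]%p')

# [[:digit:]][[:digit:]]* needs at least one digit: no match, nothing printed
strict=$(printf '%s\n' "$empty" | sed -n 's%<Month>\([[:digit:]][[:digit:]]*\)</Month>%[\1]%p')

echo "loose=$loose strict=$strict"
```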

What is the "\s" for?
 
04-17-2006, 09:47 AM   #7
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
@jschiwal:

Quote:
Originally Posted by jschiwal
It might be better to use [0-9][0-9]* so that at least one digit is required for a match. You could also use [[:digit:]][[:digit:]]*
Good point! I overlooked that myself.
 
04-17-2006, 10:35 AM   #8
nickleus
Member
 
Registered: Nov 2004
Location: Noreg
Distribution: ubuntu
Posts: 107

Original Poster
Rep: Reputation: 15
Thanks for your feedback guys =)

Quote:
Originally Posted by jschiwal
It might be better to use [0-9][0-9]* so that at least one digit is required for a match.
i tried this [0-9]+, but it didn't work, why not? i thought '+' means one or more is required...??
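Why [0-9]+ fails: by default sed uses POSIX basic regular expressions (BRE), in which + is an ordinary character. The sketch below compares three spellings on a made-up sample line. As an extension, GNU sed also accepts \+ in a basic regex. Older GNU sed versions spell the -E option as -r.

```shell
#!/bin/sh
line='<Ref>Varsling_902394039023</Ref>'

# Basic regex (sed's default): + is an ordinary character, so [0-9]+ means
# "one digit followed by a literal +" and nothing is printed here
none=$(printf '%s\n' "$line" | sed -n 's/.*\(Varsling_[0-9]+\).*/\1/p')

# Portable BRE spelling of "one or more digits"
bre=$(printf '%s\n' "$line" | sed -n 's/.*\(Varsling_[0-9][0-9]*\).*/\1/p')

# -E switches to extended regex: + works and groups use plain ( )
ere=$(printf '%s\n' "$line" | sed -n -E 's/.*(Varsling_[0-9]+).*/\1/p')

echo "none=[$none] bre=$bre ere=$ere"
```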

Quote:
Originally Posted by jschiwal
What is the "\s" for?
i read here:
http://www.webcom.com/glossary/regexp.shtml

that it means:
Quote:
\s Matches a whitespace char (space, tab, newline...)
since the file is multi-line i thought that would take care of matching the newline and any tabs and/or spaces before and after the xml tags...
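A caveat on \s: it is a GNU sed extension, and the POSIX class [[:space:]] is the portable spelling. More importantly, sed reads its input one line at a time and removes the trailing newline before matching, so \s* can never reach across lines. The sketch below demonstrates both points on two made-up, tab-indented lines.

```shell
#!/bin/sh
# Two tab-indented lines, like the poster's file
xml=$(printf '\t<Year>2006</Year>\n\t<Month>04</Month>')

# Within one line, [[:space:]]* absorbs the leading tab
year=$(printf '%s\n' "$xml" | sed -n 's%^[[:space:]]*<Year>\([0-9][0-9]*\)</Year>$%\1%p')

# The newline between the tags is never in the pattern space,
# so a pattern spanning </Year> and <Month> matches nothing
both=$(printf '%s\n' "$xml" | sed -n 's%<Year>\([0-9][0-9]*\)</Year>[[:space:]]*<Month>\([0-9][0-9]*\)</Month>%\1\2%p')

echo "year=$year both=[$both]"
```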

Ok, so here is what i came up with:
Code:
echo "TIMESTAMP from SENT file $i: $(sed -n -e 's%.*<Year>\([0-9]*\)</Year>.*%\1%p' -n -e 's%.*<Month>\([0-9]*\)</Month>.*%\1%p' -n -e 's%.*<Day>\([0-9]*\)</Day>.*%\1%p' -n -e 's%.*<Hour>\([0-9]*\)</Hour>.*%\1%p' -n -e 's%.*<Minute>\([0-9]*\)</Minute>.*%\1%p' -n -e 's%.*<Second>\([0-9]*\)</Second>.*%\1%p' tmp)"
i added the .* before and after to take away whitespace, but the problem is that the output on the screen looks like this:
Code:
TIMESTAMP from SENT file test20060413134349.xml: 2006
04
13
13
43
i need it to look like this:
Quote:
TIMESTAMP from SENT file test20060413115712.xml: 200604131157
it seems like the 'p' option acts like a println, but i need it to act like a print (no newline). druuna thanks for the cool tip about switching the separator =)
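One workaround that stays with sed (not suggested in the thread): let sed print each value on its own line as before, then pipe the result through tr -d '\n' to delete the newlines. The sketch below uses the sample block from post #4.

```shell
#!/bin/sh
xml='<DateAndTimes id="206">
        <Year>2006</Year>
        <Month>04</Month>
        <Day>13</Day>
        <Hour>11</Hour>
        <Minute>57</Minute>
</DateAndTimes>'

# Same multi -e sed as above; each matching line prints one value plus a newline
ts=$(printf '%s\n' "$xml" |
  sed -n -e 's%.*<Year>\([0-9]*\)</Year>.*%\1%p' \
         -e 's%.*<Month>\([0-9]*\)</Month>.*%\1%p' \
         -e 's%.*<Day>\([0-9]*\)</Day>.*%\1%p' \
         -e 's%.*<Hour>\([0-9]*\)</Hour>.*%\1%p' \
         -e 's%.*<Minute>\([0-9]*\)</Minute>.*%\1%p' |
  tr -d '\n')   # delete the newline sed adds after each printed value
echo "TIMESTAMP: $ts"
```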

PS. so it isn't possible to use multiple backward references like in my previous post?

Last edited by nickleus; 04-17-2006 at 11:14 AM.
 
04-17-2006, 11:30 AM   #9
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Hi,

The newline after each print (from sed) causes this behavior. Besides writing a complete sed script I don't know how to solve this.

So I would not use sed to solve your problem. It can probably be done with sed, but awk (to name just one) can do it more elegantly and simply (in my opinion):

Code:
#!/bin/bash

inFile="$1"

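printf 'TIMESTAMP from SENT file %s: ' "${inFile}"

# Split each line on < and >, so in <Year>2006</Year> the value is field $3
awk 'BEGIN { FS="[><]"}
  /Year/   { printf "%s", $3 }
  /Month/  { printf "%s", $3 }
  /Day/    { printf "%s", $3 }
  /Hour/   { printf "%s", $3 }
  /Minute/ { print $3 }
' "${inFile}"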
Mind the difference between print and printf (printf omits the newline, print does not).
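Here is a sketch of a variant of the awk script above. It uses printf "%s" so that a stray % in the data is not treated as a format directive. It also uses a range pattern to restrict the work to the id="206" block that post #4 targets, on the assumption that the real file may contain other DateAndTimes blocks. The script further assumes the tags appear in Year, Month, Day, Hour, Minute order, as in the sample.

```shell
#!/bin/sh
# Sample block copied from post #4
xml='<DateAndTimes id="206">
        <Year>2006</Year>
        <Month>04</Month>
        <Day>13</Day>
        <Hour>11</Hour>
        <Minute>57</Minute>
</DateAndTimes>'

ts=$(printf '%s\n' "$xml" | awk -F'[<>]' '
  # Range pattern: act only from the id="206" opening tag to the closing tag
  /<DateAndTimes id="206">/, /<\/DateAndTimes>/ {
    # With < and > as separators, <Year>2006</Year> gives $2 = Year, $3 = 2006
    if ($2 ~ /^(Year|Month|Day|Hour|Minute)$/) printf "%s", $3
  }
')
echo "TIMESTAMP: $ts"
```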

Quote:
PS. so it isn't possible to use multiple backward references like in my previous post?
It is 'not possible' the way you set it up, mainly because the input file spans multiple lines and sed applies its commands to one line at a time. I put 'not possible' in quotes because it can probably be done with a complete sed script, but I assume that is not what you want.
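A sketch, not from the thread, of how the multi back-reference idea from post #4 can still work. First join the lines with tr -d '\n' so the whole block reaches sed as a single line. Then use [[:space:]]* in place of \s. The sample is the block from post #4. The shell variables d and s are only there to keep the long pattern readable.

```shell
#!/bin/sh
xml='<DateAndTimes id="206">
        <Year>2006</Year>
        <Month>04</Month>
        <Day>13</Day>
        <Hour>11</Hour>
        <Minute>57</Minute>
</DateAndTimes>'

d='\([0-9][0-9]*\)'   # one captured run of digits
s='[[:space:]]*'      # optional blanks between tags (portable stand-in for \s)
pat=".*<DateAndTimes id=\"206\">$s<Year>$d</Year>$s<Month>$d</Month>$s<Day>$d</Day>$s<Hour>$d</Hour>$s<Minute>$d</Minute>.*"

# tr deletes every newline, so one s command with five
# back-references can see all the tags at once
ts=$(printf '%s\n' "$xml" | tr -d '\n' | sed -n "s%$pat%\\1\\2\\3\\4\\5%p")
echo "TIMESTAMP: $ts"
```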

Hope this helps.
 
04-18-2006, 08:44 AM   #10
nickleus
Member
 
Registered: Nov 2004
Location: Noreg
Distribution: ubuntu
Posts: 107

Original Poster
Rep: Reputation: 15
druuna, i'm trying to save the $3 value to a local variable and have tried many different things but can't get it to work:
Code:
awk 'BEGIN { FS="[><]"}
 /Year/   { TIMESTAMP=$3 }
 /Month/  { TIMESTAMP=$TIMESTAMP$3 }
 /Day/    { TIMESTAMP=$TIMESTAMP$3 }
 /Hour/   { TIMESTAMP=$TIMESTAMP$3 }
 /Minute/ { TIMESTAMP=$TIMESTAMP$3 }
' tmp
what am i doing wrong here? thanks so much in advance for your help =) have never used awk before..
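What goes wrong in the script above is threefold. First, inside awk a variable is written without $, so $TIMESTAMP means "the field whose number is the value of TIMESTAMP"; on the Month line that is field 2006, which is empty. Second, the script never prints anything. Third, an awk variable is invisible to the calling shell: its value has to go to stdout and be captured with $(...). The sketch below uses the sample block from post #4 and does all three things correctly.

```shell
#!/bin/sh
xml='<DateAndTimes id="206">
        <Year>2006</Year>
        <Month>04</Month>
        <Day>13</Day>
        <Hour>11</Hour>
        <Minute>57</Minute>
</DateAndTimes>'

# Inside awk: plain ts (no $) is a variable; "ts $3" concatenates.
# The END block prints the result, and the shell captures it.
TIMESTAMP=$(printf '%s\n' "$xml" | awk -F'[<>]' '
  $2 == "Year" || $2 == "Month" || $2 == "Day" || $2 == "Hour" || $2 == "Minute" { ts = ts $3 }
  END { print ts }
')
echo "TIMESTAMP=$TIMESTAMP"
```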
 
04-18-2006, 09:04 AM   #11
nickleus
Member
 
Registered: Nov 2004
Location: Noreg
Distribution: ubuntu
Posts: 107

Original Poster
Rep: Reputation: 15
WAIT! i figured it out! =)
Code:
TIMESTAMP=$(awk 'BEGIN { FS="[><]"}
/Year/   { printf "%s", $3 }
/Month/  { printf "%s", $3 }
/Day/    { printf "%s", $3 }
/Hour/   { printf "%s", $3 }
/Minute/ { print $3 }
' tmp)
and if i want to save it to a file instead i just formulate it this way:
Code:
awk 'BEGIN { FS="[><]"}
/Year/   { printf "%s", $3 }
/Month/  { printf "%s", $3 }
/Day/    { printf "%s", $3 }
/Hour/   { printf "%s", $3 }
/Minute/ { print $3 }
' tmp >> timestamp
sweetness!
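Once the value is in $TIMESTAMP, it can be written to the file directly instead of running awk a second time. The sketch below uses a temporary file from mktemp in place of timestamp. It also shows the difference between >>, which appends, and >, which overwrites.

```shell
#!/bin/sh
TIMESTAMP=200604131157      # hypothetical value, e.g. captured with TIMESTAMP=$(awk ...)
out=$(mktemp)               # scratch file standing in for "timestamp"

printf '%s\n' "$TIMESTAMP" >> "$out"   # >> appends a line
printf '%s\n' "$TIMESTAMP" >> "$out"   # so after two runs there are two lines
appended=$(cat "$out")

printf '%s\n' "$TIMESTAMP" > "$out"    # > truncates the file first
overwritten=$(cat "$out")

rm -f "$out"
echo "$appended"
echo "$overwritten"
```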
 
04-18-2006, 09:34 AM   #12
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Hi,

You figured it out; nothing to add from my side.
 
  


All times are GMT -5.
