tnrooster 01-28-2011 05:26 PM

sed between command syntax
I'm having trouble getting a sed command to work.

I need to filter a printed line to get the characters between two expressions, "<job>" and "<url>".


echo "<job>TEST<url>" | sed 's/^[<job>]*//' | sed 's/[<url>]*$//'

It takes off the first part, but why is it not deleting the "<url>" and everything after it?

Thanks, Joe.

jlinkels 01-28-2011 05:36 PM

Most likely because the second expression does not match.

sed 's/[<url>]*$//'
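tries to match zero or more characters from the set '<', 'u', 'r', 'l', '>' immediately before the end of the line. The brackets make it a character set, not the literal string "<url>", so anything that follows the tag is never removed.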

Try this:

sed 's/[<url>].*$//'
and it'll work. (note the '.')
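
To see why the brackets matter, here is a minimal sketch. The helper names strip_set and strip_tag and the sample strings are made up for illustration. A bracket expression such as [<url>] matches any one of the characters <, u, r, l, >, not the literal tag.

```shell
# [<url>]*$ removes a trailing run of the characters <, u, r, l, > only.
strip_set() { printf '%s\n' "$1" | sed 's/[<url>]*$//'; }

# <url>.*$ removes the literal tag <url> and everything after it.
strip_tag() { printf '%s\n' "$1" | sed 's/<url>.*$//'; }

strip_set 'TEST<url>'          # tag is at the very end, so it is removed
strip_set 'TEST<url>001.png'   # line ends in "png", so nothing is removed
strip_set 'curl<url>'          # the set also eats the "url" of "curl"
strip_tag 'TEST<url>001.png'   # tag and the rest of the line are removed
```

Dropping the brackets is usually what you want when the delimiter is a fixed string.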


tnrooster 01-28-2011 05:50 PM


Thank you so much!

tnrooster 01-28-2011 06:22 PM

Well that does work but I seem to be missing something here.

If I extend the data line to include, say, another field "<img>" and use that as the pattern in the substitution, it still returns the same result as if I were using "<url>".

Also, if I shift the sections over in the data line, i.e. use <url> as the start and <img> as the end, I don't get the part in between.


echo "<job>TEST<url><img>001.png" | sed 's/[<job>]*//;s/[<img>].*$//'
I still get


jlinkels 01-28-2011 06:49 PM


echo "<job>TEST<url><img>001.png" | sed 's/<job>//' | sed 's/<img>.*//'
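
The same two-step idea can be wrapped up so the start and end tags are swappable. This is only a sketch: between is a hypothetical helper name, the sample line is made up, and it assumes the tags contain no '/' or regex metacharacters.

```shell
# between START END: for each stdin line that contains START, print the text
# after the last START and before the first END that follows it.
between() {
    sed -n "/$1/{s/.*$1//;s/$2.*//;p;}"
}

line='<job>TEST<url>example.com<img>001.png'

printf '%s\n' "$line" | between '<job>' '<url>'   # text between <job> and <url>
printf '%s\n' "$line" | between '<url>' '<img>'   # text between <url> and <img>
printf '%s\n' "$line" | between '<job>' '<img>'   # keeps the <url> tag in the middle
```

Lines without the start tag are skipped rather than passed through unchanged.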

ghostdog74 01-28-2011 07:19 PM

Use awk; forget about sed.


$ echo "<job>TEST<url><img>001.png<job>TEST1<url>asdfs" | awk -vRS="<url>" -vFS="<job>" '{print $2}'

tnrooster 01-31-2011 03:55 PM

Thanks everyone, jlinkels. All the parsing is fine for the feed file, but I have one issue I cannot figure out.

The naming system works for output files on my Fedora system, but when I moved it to the CentOS server I started getting oddly named files.

For the same feeder file, which is used to name the output files, I get the same odd names each time.

I.e., for files that should be sequential, I'm getting names of

and so on.

Always the same ones instead of the proper filenames.

If I remove the file extension I still get the same values.

If I echo the variable used to make the names I get the proper results.

How are these odd names being generated?

OK, it seems the system on which these are being made reads the Samba directory correctly. The names all appear exactly as they should. When viewing the same files from any other system over the same Samba connection, they all have the same temp-file-like names.

tnrooster 01-31-2011 04:32 PM

It's now looking as if there is a special character at the end of the line that may be handled by Fedora and not CentOS.

A Windows carriage return most likely. Does anyone have a way to clear those out via sed?

jlinkels 01-31-2011 05:03 PM

Do you mean at the end of each line in the file? Use dos2unix to convert the Windows CRLF to Linux LF.
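
If a conversion tool isn't installed, the same cleanup can be sketched with sed, as asked. strip_cr is a hypothetical helper name and file001 is a made-up sample value; od -c is only there to make the invisible character visible.

```shell
# strip_cr: delete a carriage return at the end of each line read from stdin.
strip_cr() {
    cr=$(printf '\r')      # a literal CR; GNU sed would also accept \r directly
    sed "s/${cr}\$//"
}

raw=$(printf 'file001\r\n')               # value as read from a CRLF file
clean=$(printf 'file001\r\n' | strip_cr)  # same value with the CR removed

printf '%s' "$raw"   | od -c              # dump shows a trailing \r
printf '%s' "$clean" | od -c              # dump shows no \r
```

A variable that still carries the CR echoes normally on a terminal, which would fit the symptom of the name looking right when printed but wrong as a filename.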


crts 01-31-2011 09:08 PM


I assume that you mean the filenames that are stored inside your document that get messed up by the CR.
Try this
sed -r 's/<job>(.*)<url>.*/\1/'

sed -r 's/<job>(.*)<img>.*/\1/'
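
A quick way to check that this form also drops a trailing CR: the final .* matches every character up to the newline, carriage return included. In the sketch below, job_name is a made-up helper name and the input is a made-up CRLF sample line.

```shell
# -E is the portable spelling of GNU sed's -r (extended regular expressions).
# The capture group keeps the text between <job> and the last <url> on the line.
job_name() { sed -E 's/<job>(.*)<url>.*/\1/'; }

# Feed a line with a Windows line ending and dump the result byte by byte.
printf '<job>TEST<url><img>001.png\r\n' | job_name | od -c
```

Note that (.*) is greedy, so a line with several <url> tags would capture up to the last one.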

