LinuxQuestions.org
10-16-2012, 04:49 AM   #1
atikan
LQ Newbie
 
Registered: Oct 2012
Posts: 5

Rep: Reputation: Disabled
sed/awk to find different expressions in a file


Hi fellows,
I'm new to Linux. I am trying to extract some data from a file of variable length. I tried the sed command, but with no success.

This is the sample file:
Quote:
<HW>
<ProdName>xxx-abc</ProdName>
<ProdId ProdRev="R4B" ProdNo="ROY 123 456/2"/>
<Date Year="-" Month="-" Day="-"/>
<SerialNo>TF14560677</SerialNo>
<HWPos SubrackId="0" SlotNo="0"/>
</HW>
I want to extract the data like this:
Quote:
ProdName=xxx-abc
ProdNo="ROY 123 456/2"
SerialNo=TF14560677
The input file is long and contains many similar records.

Your help will be appreciated.
 
10-16-2012, 05:09 AM   #2
kabamaru
Member

Registered: Dec 2011
Location: Greece
Distribution: Slackware
Posts: 276

Rep: Reputation: 133
Code:
sed -n \
  -e 's/^<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p' \
  -e 's/.*ProdNo=\(".*"\)\/>/ProdNo=\1/p' \
  -e 's/^<SerialNo>\(.*\)<\/SerialNo>/SerialNo=\1/p' FILENAME
Do a test-run. Maybe it needs some refinement.
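A test run against the sample from post #1 might look like the sketch below. The temporary file and the variable names `sample` and `out` are only for illustration; the sed expressions are the ones given above.

```shell
#!/bin/sh
# Recreate the sample from post #1 in a temporary file (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<HW>
<ProdName>xxx-abc</ProdName>
<ProdId ProdRev="R4B" ProdNo="ROY 123 456/2"/>
<Date Year="-" Month="-" Day="-"/>
<SerialNo>TF14560677</SerialNo>
<HWPos SubrackId="0" SlotNo="0"/>
</HW>
EOF

# -n suppresses automatic printing; the p flag prints only those lines
# on which a substitution actually succeeded.
out=$(sed -n \
  -e 's/^<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p' \
  -e 's/.*ProdNo=\(".*"\)\/>/ProdNo=\1/p' \
  -e 's/^<SerialNo>\(.*\)<\/SerialNo>/SerialNo=\1/p' "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

Note that the ProdName and SerialNo expressions are anchored with ^, so they only match when the tag starts in the first column.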
 
10-16-2012, 05:25 AM   #3
atikan
LQ Newbie
 
Registered: Oct 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Thanks, kabamaru, for your reply.
I have tried the code below:
Quote:
$ sed -n -e 's/^<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p' -e 's/.*ProdNo=\(".*"\)\/>/ProdNo=\1/p' -e 's/^<SerialNo>\(.*\)<\/SerialNo>/SerialNo=\1/p' test1.txt > test4.txt
$ more test4.txt
ProdNo="ROY 208 423/3"
ProdNo="ROY 208 452/1"
It's not matching the ProdName and SerialNo lines.
 
10-16-2012, 05:32 AM   #4
druuna
LQ Veteran

Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2387
Are you sure that the example you posted in post #1 is correct?

The solution posted by kabamaru works on my side.
 
10-16-2012, 05:34 AM   #5
ntubski
Senior Member

Registered: Nov 2005
Distribution: Arch
Posts: 3,061

Rep: Reputation: 1268
Your data looks like XML. Here's an xmlstarlet solution:
Code:
xmlstarlet sel -T -t -m //HW \
    -o ProdName= -v ProdName --nl \
    -o 'ProdNo="' -v ProdId/@ProdNo -o '"' --nl \
    -o SerialNo= -v SerialNo --nl \
    file
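Where xmlstarlet cannot be installed, a line-oriented awk sketch can produce the same three fields. This is not an XML parser: it assumes each element sits on a single line, as in the sample from post #1, and the function name `extract_awk` is made up for this example.

```shell
#!/bin/sh
# extract_awk: hypothetical helper, not part of the original thread.
# Assumes one element per line; it does not parse XML in general.
extract_awk() {
    awk '
    # Text between <ProdName> and </ProdName>, wherever it is on the line.
    match($0, /<ProdName>[^<]*<\/ProdName>/) {
        s = substr($0, RSTART, RLENGTH)
        gsub(/<\/?ProdName>/, "", s)
        print "ProdName=" s
    }
    # The ProdNo attribute is printed as-is, quotes included.
    match($0, /ProdNo="[^"]*"/) {
        print substr($0, RSTART, RLENGTH)
    }
    # Text between <SerialNo> and </SerialNo>.
    match($0, /<SerialNo>[^<]*<\/SerialNo>/) {
        s = substr($0, RSTART, RLENGTH)
        gsub(/<\/?SerialNo>/, "", s)
        print "SerialNo=" s
    }' "$@"
}

# Usage: extract_awk file1 file2 ...   (reads standard input if no file is given)
```

Because `match` searches the whole line, leading indentation before a tag does not matter here.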
 
10-16-2012, 05:43 AM   #6
atikan
LQ Newbie
 
Registered: Oct 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Yes, my data is XML, but my workplace doesn't let me download or install any third-party software (IT restrictions). I would be grateful if you could refine the command you sent. By the way, I was able to get the serial number as well by removing the ^ sign from the code. The code now prints the output below, but ProdName is still not being matched:

Quote:
$ sed -n -e 's/^<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p' -e 's/.*ProdNo=\(".*"\)\/>/ProdNo=\1/p' -e 's/<SerialNo>\(.*\)<\/SerialNo>/SerialNo=\1/p' test1.txt > test44.txt
$ more test44.txt
ProdNo="ROJ 208 423/3"
SerialNo=TD37956485
ProdNo="ROJ 208 452/1"
SerialNo=A063885359
I also tried the command without the ^ sign for ProdName, but with no success.

Last edited by atikan; 10-16-2012 at 05:44 AM. Reason: prompt
 
10-16-2012, 05:46 AM   #7
kabamaru
Member

Registered: Dec 2011
Location: Greece
Distribution: Slackware
Posts: 276

Rep: Reputation: 133
Quote:
Originally Posted by atikan View Post
Thanks, kabamaru, for your reply.
I have tried the code below:
It's not matching the ProdName and SerialNo lines.
It works on the sample you gave. Could you post another sample containing parts that you wanted extracted but that the sed command missed? Maybe there are some variations, and we need to adjust the command.

Last edited by kabamaru; 10-16-2012 at 05:47 AM.
 
10-16-2012, 06:08 AM   #8
atikan
LQ Newbie
 
Registered: Oct 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Thanks a lot for your help, kabamaru, really appreciated. It's working now.
Below is the change I made to suit the input file (I only removed the ^ sign and added .*):
Quote:
$ sed -n -e 's/.*<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p' -e 's/.*ProdNo=\(".*"\)\/>/ProdNo=\1/p' -e 's/<SerialNo>\(.*\)<\/SerialNo>/SerialNo=\1/p' test1.txt > test55.txt
$ more test55.txt
ProdName=SCB-RP
ProdNo="ROJ 208 423/3"
SerialNo=TD37956485
ProdName=PMD
ProdNo="ROJ 208 452/1"
SerialNo=A063885359
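One plausible reason the anchored pattern failed, assuming the real test1.txt indents its elements (the file itself was never posted): ^ only matches when the tag starts in the first column. The sketch below uses a made-up indented line and illustrative variable names to compare the anchored, relaxed, and whitespace-tolerant forms.

```shell
#!/bin/sh
# Hypothetical indented line; the real test1.txt was never shown.
line='    <ProdName>SCB-RP</ProdName>'

# Anchored: ^ requires <ProdName> in column 1, so nothing is printed.
anchored=$(printf '%s\n' "$line" |
  sed -n 's/^<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p')

# Leading .* matches the indentation, which is then replaced along with the tag.
relaxed=$(printf '%s\n' "$line" |
  sed -n 's/.*<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p')

# Stricter alternative: keep the anchor but allow only leading whitespace.
tolerant=$(printf '%s\n' "$line" |
  sed -n 's/^[[:space:]]*<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p')

printf 'anchored=[%s]\nrelaxed=[%s]\ntolerant=[%s]\n' \
  "$anchored" "$relaxed" "$tolerant"
```

The whitespace-tolerant form is the narrower fix: it still rejects lines where other text precedes the tag.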

Last edited by atikan; 10-16-2012 at 06:09 AM. Reason: formating
 
10-16-2012, 06:19 AM   #9
kabamaru
Member

Registered: Dec 2011
Location: Greece
Distribution: Slackware
Posts: 276

Rep: Reputation: 133
Great. You can put this in a script (e.g. extract_records.sh):

Code:
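#!/bin/sh
# Print ProdName, ProdNo and SerialNo from every file named on the command line.
# "$@" is quoted so that file names containing spaces stay intact.

sed -n '
s/.*<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p
s/.*ProdNo=\(".*"\)\/>/ProdNo=\1/p
s/<SerialNo>\(.*\)<\/SerialNo>/SerialNo=\1/p' "$@"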
Then:

Code:
chmod +x extract_records.sh

./extract_records.sh file1 file2 file3... > output
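This version adds a blank line after each serial number. Note that tr turns every @ in the output into a newline, so it assumes the data itself contains no @: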

Code:
#!/bin/sh
# Same as above, but each SerialNo line gets a trailing @ that tr turns into a newline.

sed -n '
s/.*<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p
s/.*ProdNo=\(".*"\)\/>/ProdNo=\1/p
s/<SerialNo>\(.*\)<\/SerialNo>/SerialNo=\1@/p' "$@" | tr @ '\n'
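The @ placeholder with tr would also convert any @ that occurs in the data itself. A possible variant, sketched here rather than taken from the thread, lets sed add the blank line with its G command, which appends a newline plus the (empty) hold space to the current line. The function name `extract_records` and the two sample records are made up for illustration.

```shell
#!/bin/sh
# extract_records: same substitutions as above, but the blank line comes
# from sed's G command instead of an @ placeholder and tr.
extract_records() {
    sed -n '
s/.*<ProdName>\(.*\)<\/ProdName>/ProdName=\1/p
s/.*ProdNo=\(".*"\)\/>/ProdNo=\1/p
/<SerialNo>/{
s/.*<SerialNo>\(.*\)<\/SerialNo>.*/SerialNo=\1/
G
p
}' "$@"
}

# Example with two made-up records fed on standard input.
printf '%s\n' \
  '<HW>' ' <ProdName>A</ProdName>' ' <SerialNo>1</SerialNo>' '</HW>' \
  '<HW>' ' <ProdName>B</ProdName>' ' <SerialNo>2</SerialNo>' '</HW>' |
  extract_records
```

With no file arguments the function reads standard input; with arguments it behaves like the script above.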

Last edited by kabamaru; 10-16-2012 at 06:21 AM.
 
10-16-2012, 06:24 AM   #10
atikan
LQ Newbie
 
Registered: Oct 2012
Posts: 5

Original Poster
Rep: Reputation: Disabled
Wow, thanks a lot. This will help me a lot on my learning curve. I will try them tomorrow and update you. Thanks again for the help.
 
10-16-2012, 06:33 AM   #11
shayno90
Member
 
Registered: Oct 2009
Distribution: Windows10 Linux Mint NST Kali CentOS
Posts: 199
Blog Entries: 3

Rep: Reputation: 24
For what it is worth:

sed -e 's_HW__' -e 's_^<*__' -e 's_/>__' -e 's_>__' -e 's_>$__' -e 's_</ProdName__' -e 's_</SerialNo__' -e 's_ProdName_ProdName=_' -e 's_SerialNo_SerialNo=_' filename > filename1

Just wondering: is there a better way to search and replace anywhere within a line, rather than only at the beginning or end of it?
 
  


All times are GMT -5.
