LinuxQuestions.org
Old 07-10-2009, 03:55 PM   #1
freeindy
Member
 
Registered: Nov 2002
Posts: 207

Rep: Reputation: 32
sed extract parameters between sections


Hi,

I have an xml (iTunes) file in the following format:

Code:
<key>Track ID</key><integer>729</integer>
<key>Name</key><string>My Narrow Mind</string>
<key>Artist</key><string>16 Horsepower</string>
<key>Album</key><string>Low estate</string>
<key>Genre</key><string>Rock</string>
<key>Kind</key><string>AAC audio file</string>
...
...
...
<key>Library Folder Count</key><integer>1</integer>
Every song has the same structure as above. I want to filter the xml file so that each song becomes one line in the following format:

"SongName","Artist","Album"
"SongName","Artist","Album"
"SongName","Artist","Album"
...


Please note that I want the actual song name, artist and album extracted, not the literal text Name, Artist, Album.

I'm using sed and have used it before, but I can't get my script to work at all.

Any help is appreciated.
Thanks a million.

Indy
 
Old 07-10-2009, 04:28 PM   #2
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 78
Although it may be possible with sed, I would recommend something at least a little more powerful such as awk.
 
Old 07-10-2009, 05:06 PM   #3
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
You could use xsltproc to extract the information you want. Look at an example of a very similar program in Linc Fessenden's bashpodder:
http://lincgeek.org/bashpodder/

Using sed, you could use a command like:
/<key>Name/s/<key>\(Name\)<\/key><string>\([^<]*\)<\/string>/"\2",/
to select the line, extract the info and add the quotes.

The snag is that you are joining information found on separate lines. sed is a line-by-line editor. That means that you need to push the lines into the Hold buffer and pop them back when you have all three lines. Lastly remove the '\n' characters and print it out.

Start out by selecting a range of lines.
Code:
/<key>Name/,/<key>Album/{
Then inside the range, select each line you need changed:
Code:
#n
/<key>Name/,/<key>Album/{
    /<key>Name/s/<key>Name<\/key><string>\([^<]*\)<\/string>/"\1",/p
    /<key>Artist/s/<key>Artist<\/key><string>\([^<]*\)<\/string>/"\1",/p
    /<key>Album/s/<key>Album<\/key><string>\([^<]*\)<\/string>/"\1"/p
}
I added the print command to test if it is working as expected so far.


Add the instructions to push the substitutions into the hold buffer. Pop the results after the last line in the range. Lastly, remove the '\n' characters to join the lines and print the result. I cleared out the pattern space before exchanging, so that when you start with the next song, the hold buffer starts out empty.

Code:
#n
/<key>Name/,/<key>Album/{
    /<key>Name/{
        s/<key>Name<\/key><string>\([^<]*\)<\/string>/"\1",/
        H
    }
    /<key>Artist/{
        s/<key>Artist<\/key><string>\([^<]*\)<\/string>/"\1",/
        H
    }
    /<key>Album/{
        s/<key>Album<\/key><string>\([^<]*\)<\/string>/"\1"/
        H
        s/.*//
        x
        s/\n//g
        p
    }
}
I hadn't intended to provide the full solution, but to test and debug my test, I needed a working example. Sorry if I ruined your fun.
I did want to illustrate ranges, and subranges. Also notice how the commands are grouped. Often you need to group the commands after a match to prevent the next line being read in after certain commands. Also notice how indentation can make a sed program less "write once, read never".
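
For anyone who wants to try the hold-buffer script without saving it to a file, here is a minimal sketch that feeds it a few sample lines. The sample data is modeled on the first post and assumes one key per line with no leading whitespace; extract_csv is just a hypothetical helper name, and sed -n stands in for the #n first line.

```shell
#!/bin/sh
# Sample input modeled on post #1 (assumption: one <key> per line).
sample='<key>Track ID</key><integer>729</integer>
<key>Name</key><string>My Narrow Mind</string>
<key>Artist</key><string>16 Horsepower</string>
<key>Album</key><string>Low estate</string>
<key>Genre</key><string>Rock</string>'

# extract_csv (hypothetical name): reads the XML on stdin, writes one CSV row per song.
extract_csv() {
    sed -n '
/<key>Name/,/<key>Album/{
    /<key>Name/{
        s/<key>Name<\/key><string>\([^<]*\)<\/string>/"\1",/
        H
    }
    /<key>Artist/{
        s/<key>Artist<\/key><string>\([^<]*\)<\/string>/"\1",/
        H
    }
    /<key>Album/{
        s/<key>Album<\/key><string>\([^<]*\)<\/string>/"\1"/
        H
        s/.*//
        x
        s/\n//g
        p
    }
}'
}

printf '%s\n' "$sample" | extract_csv
```

With the sample above this should print "My Narrow Mind","16 Horsepower","Low estate". In a real library file the lines are usually indented, and that indentation would survive the substitutions, so expect to strip it separately.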


I would still recommend using xsltproc instead. A sed solution is very cryptic. This is what xsltproc is designed to do.

Last edited by jschiwal; 07-10-2009 at 05:28 PM.
 
Old 07-10-2009, 06:14 PM   #4
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Code:
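awk '
# Keep only the <string> payload; use "%s" so a % in a title is not treated as a format
/<key>Name<\/key>/   { gsub(/.*<string>|<\/string>/, ""); printf "%s", $0 }
/<key>Artist<\/key>/ { gsub(/.*<string>|<\/string>/, ""); printf ",%s", $0 }
/<key>Album<\/key>/  { gsub(/.*<string>|<\/string>/, ""); print "," $0 }
' file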
 
Old 07-14-2009, 03:02 PM   #5
freeindy
Member
 
Registered: Nov 2002
Posts: 207

Original Poster
Rep: Reputation: 32
Hi,

Sorry for the late answer, I've been moving around a bit lately.

And also thanks for the answers.

Osor and ghostdog74,
Thanks for the awk. The reason I didn't ask for awk is that I have no experience with it. I've used sed, but that was some time ago. Ghostdog74, your solution worked, thanks. But there is something wrong with the output: I found out that not all the songs contain <key>Album</key> tags. So sometimes the result becomes:

song1,artist1,album1
song2,artist2,song3,artist3,album3

Because there is no Album line to match for that song, no CR is produced. Again, I'm not that great at awk. Can you have an if/else statement in there somewhere for echoing a CR if the tag doesn't exist? I don't mind learning awk, but learning by example helps...

jschiwal,
Thanks. I don't need the solution, just the guidance. This is what I have managed so far, from refreshing my memory and from your comments:

Code:
sed '/<key>Name<\/key>/,/<key>Album<\/key>/{
         /<key>Name<\/key>/{
              /<string>/,/<\/string>/{
              }
         }
}' iTunes\ Music\ Library.xml
So all I'm lacking right now is a way to get the string out from between the <string> tags, put it in a placeholder and then 'spit it out'. The H flag doesn't work. It complains about a bad flag. Must be a version issue, I guess.
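
One possible cause of the "bad flag" message, offered as a guess since the failing command isn't shown: if H is written directly after the closing delimiter of an s command, sed parses it as a substitution flag rather than as a command. H is a command of its own and needs its own line (or a ';' before it). A minimal sketch on made-up three-line input, with join_lines as a hypothetical helper name:

```shell
#!/bin/sh
# join_lines (hypothetical name): quote every input line, collect the lines
# in the hold buffer, and print them as one comma-separated row at the end.
join_lines() {
    sed -n '
s/^/"/
s/$/"/
H
${
    x
    s/^\n//
    s/\n/,/g
    p
}'
}

printf 'song\nartist\nalbum\n' | join_lines
```

This should print "song","artist","album". Note that H sits on its own line after the two s commands, and the closing brace is also on a line by itself, which stricter sed versions tend to require.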

Indy
 
Old 07-19-2009, 12:51 PM   #6
freeindy
Member
 
Registered: Nov 2002
Posts: 207

Original Poster
Rep: Reputation: 32
After experimenting a lot, I found a solution. If anyone is interested...

I didn't realise that awk would be so much simpler for me to use than sed. I develop most of my software in C, and awk is very similar.

Code:
/<key>Name<\/key>/{
   gsub(/.*<string>|<\/string>/,"")
   if (/Library/){
       # Playlists found. Not interesting because no more song info exists
       exit
   }
   # Use "%s" so a % in a title is not treated as a format string
   printf "\"%s\",", $0

   artist = "false"
   album = "false"
   # Scan the rest of this song; stop at Kind or at end of input
   while ($0 !~ /<key>Kind<\/key>/ && (getline) > 0){
      if (/<key>Artist<\/key>/){
         gsub(/.*<string>|<\/string>/,""); printf "\"%s\",", $0
         artist = "true"
      }

      if (/<key>Album<\/key>/){
         if(artist == "false"){
             # Keep the column count constant when the Artist tag is missing
             printf "\"artist_not_found\","
         }
         gsub(/.*<string>|<\/string>/,""); printf "\"%s\"\n", $0
         album = "true"
         artist = "true"
      }
   }

   if(album == "false"){
      if(artist == "false"){
         printf "\"artist_not_found\","
      }
      printf "\"album_not_found\"\n"
   }
}
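
An alternative that avoids getline, sketched under the assumption that every song record ends with a <key>Kind</key> line as in the first post: remember each field in a variable as its line goes by, and print the row only when Kind is reached. The names songs_to_csv and value are hypothetical, and the input is made-up test data.

```shell
#!/bin/sh
# songs_to_csv (hypothetical name): reads the XML on stdin, writes CSV rows.
songs_to_csv() {
    awk '
    # Return the text between <string> and </string> on the given line.
    function value(line) {
        sub(/.*<string>/, "", line)
        sub(/<\/string>.*/, "", line)
        return line
    }
    /<key>Name<\/key>/   { name = value($0); have = 1 }
    /<key>Artist<\/key>/ { artist = value($0) }
    /<key>Album<\/key>/  { album = value($0) }
    # Assumption: Kind marks the end of the fields of interest for one song.
    /<key>Kind<\/key>/ && have {
        if (artist == "") artist = "artist_not_found"
        if (album == "")  album = "album_not_found"
        printf "\"%s\",\"%s\",\"%s\"\n", name, artist, album
        name = artist = album = ""; have = 0
    }'
}

# Made-up input: the second song has no Album line.
input='<key>Name</key><string>song1</string>
<key>Artist</key><string>artist1</string>
<key>Album</key><string>album1</string>
<key>Kind</key><string>AAC audio file</string>
<key>Name</key><string>song2</string>
<key>Artist</key><string>artist2</string>
<key>Kind</key><string>AAC audio file</string>'

printf '%s\n' "$input" | songs_to_csv
```

For this input the second row should come out as "song2","artist2","album_not_found", so every row keeps three columns. Name lines that are never followed by a Kind line produce no output.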
 
  

