LinuxQuestions.org
Old 02-07-2012, 05:53 AM   #1
frambau
Member
 
Registered: Feb 2012
Posts: 32

Rep: Reputation: Disabled
Awk xml convert to csv


Hi, I need to convert an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:feed="httpeed"
xmlns:comment="http#comment">
<channel>
<title>Spi</title>
<link>http://spcom</link>
<description>Sp-eval-3</description>
<dc:source>http://35</dc:source>
<api:next_request_url>h8493</api:next_request_url>
<atom:link rel="next" href="1307738768140558493">
</atom:link>
<generator>Po3r</generator>
<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
</item>

</channel>
</rss>


into a CSV file, but only the tags enclosed between the <item> and </item> tags.

Something like this:


Jack win the news|http://nom/hot/jack+white|http://nt/jack+white
Glad to roll..|http://ww;id=1050681776|http://fa&amp;id=1050681776

Any help?
 
Old 02-07-2012, 07:13 AM   #2
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 345
My advice would be to use a language that has a module for XML parsing, such as Python. I haven't found any such module for awk (I'm not a fan, so it's possible I'm not looking in the right places), but Python, Java and Perl all have one. If you use one of these libraries to do the parsing, constructing your CSV file should be relatively easy.

Hope this helps,
 
Old 02-07-2012, 07:15 AM   #3
frambau
Member
 
Registered: Feb 2012
Posts: 32

Original Poster
Rep: Reputation: Disabled
Thanks, Snark1994, but I need to build my solution with awk.
 
Old 02-07-2012, 07:17 AM   #4
cbtshare
Member
 
Registered: Jul 2009
Posts: 610

Rep: Reputation: 42
In addition to what was said above, you will find others more willing to help if you also give an example of what you have tried with awk and where you're falling short, rather than just expecting us to provide a solution.
 
Old 02-07-2012, 08:16 AM   #5
frambau
Member
 
Registered: Feb 2012
Posts: 32

Original Poster
Rep: Reputation: Disabled
OK,

I'm trying this,

awk -F'[<>]' '{ORS = "|"};\
/<title/{split($3,a); print a[1] > "output.csv" };\
/<link/{split($3,b); print b[1] "\n">> "output.csv"} ' inputfile.xml

for the first two fields...

this is the output.

Spi|http://spcom
|Jack win the news|http://nom/hot/jack+white
|Glad to roll..|http://ww=1050681776
|

My first problem is to eliminate the first title and link tags from the output file, because I only need the tags enclosed in the item tags.

My second problem is to put the pipe only between the fields.

Thanks for everything.

Last edited by frambau; 02-07-2012 at 08:20 AM.
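
Both problems in the attempt above can probably be handled in a single awk pass. What follows is only a minimal sketch, assuming each element sits on one line as in the sample feed. A flag (called in_item here, a name made up for this example) is set at <item> and cleared at </item>, so the channel-level title and link are skipped. The row is built up in a variable (row, also made up) so that the pipe only lands between fields. The trimmed inputfile.xml is created just to keep the example self-contained.

```shell
# Sample input: a trimmed copy of the feed from post #1.
cat > inputfile.xml <<'EOF'
<channel>
<title>Spi</title>
<link>http://spcom</link>
<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
</item>
</channel>
EOF

# With -F'[<>]' a line such as <title>text</title> splits into
# $2 = "title" (tag name) and $3 = "text" (element content).
awk -F'[<>]' '
    /<item>/   { in_item = 1; row = ""; next }    # entering an item: start a fresh row
    /<\/item>/ { in_item = 0; print row; next }   # leaving an item: emit the finished row
    in_item && /<(title|link|guid)>/ {
        # put the pipe only between fields, never before the first or after the last
        row = (row == "" ? $3 : row "|" $3)
    }
' inputfile.xml > output.csv

cat output.csv
```

This is line-oriented text matching rather than real XML parsing, so it will break on attributes, nested elements, or several elements on one line.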
 
Old 02-08-2012, 02:21 AM   #6
frambau
Member
 
Registered: Feb 2012
Posts: 32

Original Poster
Rep: Reputation: Disabled
Now with this,

sed -n '/<item>/,/<\/item>/p'

output line,

<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
</item>

That solved my first problem.

I have a new question. If the text inside a tag spans several lines, how can I read it? For example:

<title>Glad
to
roll..</title>

I need this output

Glad
to
roll..

Thanks for all
 
Old 02-08-2012, 03:15 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
I thought you said:
Quote:
Thanks, Snark1994, but I need to build my solution with awk.
Yet you have now changed to sed?? Or is the real issue that you do not wish to use Perl or Python?
 
Old 02-08-2012, 03:24 AM   #8
frambau
Member
 
Registered: Feb 2012
Posts: 32

Original Poster
Rep: Reputation: Disabled
I need to use the shell.

sed with awk is fine. I don't know anything about Perl or Python. That is why I used awk and sed.

Sorry if I caused any inconvenience.

Now my only problem is this:

If I have this tag:

<title>Glad
to
roll..</title>

How can I get this?

Glad
to
roll..

thanks again
 
Old 02-08-2012, 07:37 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
So maybe something like:
Code:
awk 'BEGIN{RS = "[<>]"}{ORS = /guid/?"\n":"|"}/item/,/\/item/{if(/^(title|link|guid)$/ && getline)print}' file
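
Note that the one-liner above relies on RS being treated as a regular expression. gawk and mawk do that, but POSIX leaves the behaviour unspecified when RS is longer than one character. For other awks, here is a rough line-oriented sketch that also copes with element text spread over several lines. It assumes one element starts per line. The variable names (in_item, buf, row, text) and the sample file multiline.xml are made up for the example.

```shell
# Sample input with a title that spans three physical lines.
cat > multiline.xml <<'EOF'
<item>
<title>Glad
to
roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
</item>
EOF

awk '
    /<item>/   { in_item = 1; row = ""; buf = ""; next }   # start of an item
    /<\/item>/ { in_item = 0; print row; next }            # end of an item: emit the row
    !in_item   { next }                                    # ignore everything outside items
    {
        # glue physical lines together until a closing tag shows up
        buf = (buf == "" ? $0 : buf "\n" $0)
        if (buf !~ /<\/[^>]+>/) next

        text = buf; buf = ""
        if (text !~ /^[ \t]*<(title|link|guid)>/) next     # not a field we want

        sub(/^[^>]*>/, "", text)         # strip the opening tag
        sub(/<\/[^>]*>.*$/, "", text)    # strip the closing tag
        row = (row == "" ? text : row "|" text)
    }
' multiline.xml > multiline.csv

cat multiline.csv
```

The line breaks inside the title are kept as they are, so the row itself spans three lines. If a single-line CSV row is needed, they would have to be replaced (for example with a space) before the text is appended to row.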
 
Old 02-08-2012, 08:43 AM   #10
frambau
Member
 
Registered: Feb 2012
Posts: 32

Original Poster
Rep: Reputation: Disabled
Thank you very much, grail. I'm impressed.

I'm also working on another question.

If I have XML like this:

<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
<name>25</name>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
<name>35</name>
</item>
...
...
...

How can I create files named after the content of the name tag, each containing the content of the title tag?

For now I've done this:

sed -n '/<name>/p' input.xml > fileaux.txt
awk -F'[<>]' '/<name/{split($3,a); print "touch ./named/" a[1] ".xml" > "files.sh"}' fileaux.txt

#generate files
mkdir named
chmod 777 files.sh
./files.sh


Now I have a directory with two empty files named 25.xml and 35.xml. The question is how to write the content of the title tag into the corresponding file.

Thanks for all
 
Old 02-08-2012, 11:49 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
Use the same format I have already shown you but save the data in a variable and output it into the "name tag" file once you hit it.

Let me know how you go?
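
One way the "save it in a variable" idea could look, as a sketch only. For simplicity it uses line-oriented matching with -F'[<>]' rather than the RS approach, and assumes that each element sits on one line, that <title> comes before <name> within an item, and that the name values are safe to use as file names. The variables title and outfile are made up for the example.

```shell
# Sample input taken from post #10.
cat > input.xml <<'EOF'
<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
<name>25</name>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
<name>35</name>
</item>
EOF

mkdir -p named

awk -F'[<>]' '
    /<item>/  { title = "" }            # forget the title of the previous item
    /<title>/ { title = $3 }            # remember the title until <name> turns up
    /<name>/  {
        outfile = "named/" $3 ".xml"    # file name built from the <name> text
        print title > outfile           # write the saved title into that file
        close(outfile)                  # keep the number of open files small
    }
' input.xml

cat named/25.xml named/35.xml
```

Because awk opens and writes the files itself, the intermediate fileaux.txt and the generated files.sh script are no longer needed.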
 
Old 02-08-2012, 03:30 PM   #12
frambau
Member
 
Registered: Feb 2012
Posts: 32

Original Poster
Rep: Reputation: Disabled
OK, I'll try it tomorrow and tell you how it goes.

Thanks for everything.
 
Old 02-09-2012, 04:33 AM   #13
frambau
Member
 
Registered: Feb 2012
Posts: 32

Original Poster
Rep: Reputation: Disabled
Hi,

I don't know how to assign the content of a tag to a variable.

Is it possible to iterate over each title and name field? How?
 
Old 02-09-2012, 06:02 AM   #14
frambau
Member
 
Registered: Feb 2012
Posts: 32

Original Poster
Rep: Reputation: Disabled
Grail, another question.

I have this

awk 'BEGIN{RS = "[<>]"}{ORS = /guid/?"\n":"|"}/item/,/\/item/{if(/^(title|name|surname|link|guid)$/ && getline)print}' file

I need to write the value of a variable into the CSV in place of the link field's value,

where the variable takes the values i = 1, 2, 3, ...

<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
<name>juan</name>
<surname>martinez</surname>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
<name>paco</name>
<surname>perez</surname>
</item>
...
...
...

The CSV output should be:
Jack win the news|juan|martinez|1|http://nt/jack+white
Glad to roll..|paco|perez|2|http://fa&amp;id=1050681776


thanks for all
 
Old 02-09-2012, 08:11 AM   #15
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
See if this helps. I have used the old data, but you should be able to adapt it to what you are looking for:
Code:
awk 'BEGIN{RS = "[<>]"}/item/,/\/item/{if(/^(title|link|guid)$/ && getline)out = sprintf("%s%s%s",out,$0,/guid/?"\n":"|");if(/\/item/){print out x++;out=""}}' file
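
For the exact layout asked for in post #14 (title, name, surname, a running counter instead of the link, then guid), one possible line-oriented sketch is to store each field under its tag name and print them in the wanted order at </item>. It assumes one element per line. The array f, the counter i and the file names people.xml and people.csv are made up for the example.

```shell
# Sample input taken from post #14.
cat > people.xml <<'EOF'
<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
<name>juan</name>
<surname>martinez</surname>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
<name>paco</name>
<surname>perez</surname>
</item>
EOF

awk -F'[<>]' -v OFS='|' '
    /<item>/   { split("", f); next }   # new item: empty the field array
    /<\/item>/ {
        # print the fields in the requested order; ++i takes the place of the link
        print f["title"], f["name"], f["surname"], ++i, f["guid"]
        next
    }
    NF >= 4    { f[$2] = $3 }           # <tag>text</tag>: store text under the tag name
' people.xml > people.csv

cat people.csv
```

Storing the fields by tag name means the output order no longer depends on the order of the elements in the XML.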
 
  


All times are GMT -5.