LinuxQuestions.org
Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

02-09-2012, 06:13 AM   #1
frambau
Member
Registered: Feb 2012
Posts: 32
Iterate over repeated tags in XML with awk


Hi,

I have an XML file like this:

<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
<name>juan</name>
<surname>martinez</surname>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
<name>paco</name>
<surname>perez</surname>
</item>
...
...
...

and I need to write the content of every title tag to its own, sequentially numbered file, like this:

Jack win the news > try_1.out
Glad to roll.. > try_2.out
...
...
...

I have this (thanks, grail):
awk 'BEGIN{RS = "[<>]"}{ORS = /title/?"\n":"|"}/item/,/\/item/{if(/^(title)$/ && getline) print}' input.xml

and I get this output:
Jack win the news
Glad to roll..

Any help? My other question is how to put the file name that I wrote into the title tag.

THANKS FOR ALL
 
02-09-2012, 07:10 AM   #2
Cedrik
Senior Member
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140
Code:
awk '/<title>/ { gsub(/<\/?title>/,""); print > ("try_" (++i) ".out"); }' input.xml
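To try the one-liner on its own, the sketch below writes the two items from the question to a scratch file named input.xml (the name is just an example) and runs it. This version parenthesizes the whole file-name expression after >, since not every awk parses an unparenthesized concatenation there the same way.

```shell
#!/bin/sh
# Sample input: the two <item> elements from the question.
cat > input.xml <<'EOF'
<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
<name>juan</name>
<surname>martinez</surname>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
<name>paco</name>
<surname>perez</surname>
</item>
EOF

# One file per <title> line: strip the tags, then write to try_1.out, try_2.out, ...
awk '/<title>/ { gsub(/<\/?title>/, ""); print > ("try_" (++i) ".out") }' input.xml

cat try_1.out   # Jack win the news
cat try_2.out   # Glad to roll..
```

Note that the files are never closed here, which matters once there are many titles (see the later posts about close()).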
 
02-09-2012, 07:15 AM   #3
frambau
Member
Registered: Feb 2012
Posts: 32
Original Poster
Thank you so much
 
02-09-2012, 07:46 AM   #4
frambau
Member
Registered: Feb 2012
Posts: 32
Original Poster
Please, Cedrik, one more question.

If my sh script gets an input parameter, how can I use it as a constant in the file name, like this?

Jack win the news > parameter_1.out
Glad to roll.. > parameter_2.out
 
02-09-2012, 07:59 AM   #5
frambau
Member
Registered: Feb 2012
Posts: 32
Original Poster
Thanks, I tried this:

H="paco_"
awk -v nombre="$H" '/<title>/ { gsub(/<\/?title>/,""); print > (nombre (++i) ".out"); }' input.xml

and it works. THANK YOU VERY MUCH!!!!!
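The same idea wrapped in a reusable shell function, as a sketch: the function name split_titles and the sample file titles.xml are hypothetical. The prefix arrives as the function's first argument, and quoting it in the -v assignment keeps a prefix that contains spaces in one piece.

```shell
#!/bin/sh
# Hypothetical helper: split_titles PREFIX FILE
# writes the text of each <title> line to PREFIX1.out, PREFIX2.out, ...
split_titles() {
    # -v copies the shell value into the awk variable nombre; the double
    # quotes keep a prefix containing spaces in one piece.
    awk -v nombre="$1" '
        /<title>/ {
            gsub(/<\/?title>/, "")
            print > (nombre (++i) ".out")
        }' "$2"
}

# Made-up sample file with one title per line.
printf '%s\n' '<title>Jack win the news</title>' '<title>Glad to roll..</title>' > titles.xml

split_titles paco_ titles.xml
cat paco_1.out paco_2.out
```

Calling it as split_titles "$1" input.xml inside a script would use the script's own first parameter as the prefix.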
 
02-09-2012, 08:56 AM   #6
Cedrik
Senior Member
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140
Nice to see you making progress.
 
02-10-2012, 01:28 AM   #7
frambau
Member
Registered: Feb 2012
Posts: 32
Original Poster
Cedrik,

Another question about my XML:

If I remove the tags from my XML, how can I copy the contents of the tags into a CSV file, with one column holding a sequential name?

For example:
<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
<name>juan</name>
<surname>martinez</surname>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
<name>paco</name>
<surname>perez</surname>
</item>
...
...
...


My CSV should be:

Jack win the news|try_1|http://nt/jack+white|juan|martinez
Glad to roll..|try_2|http://fa&amp;id=1050681776|paco|perez
...
...
...

I have this to extract the fields:

awk 'BEGIN{RS = "[<>]"}{ORS = /surname/?"\n":"|"}/item/,/\/item/{if(/^(title|link|guid|name|surname)$/ && getline)print}' input.xml

but I don't know how to replace the link tag...

Thanks for all
 
02-10-2012, 02:12 AM   #8
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Mint
Posts: 5,403
Just take link out of the current if and use a separate if for it instead.
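One way to read this suggestion, sketched under the same regex-RS assumption as the one-liners in this thread (gawk or mawk): the rule for title, guid, name and surname collects the text that follows each tag, while a separate rule for link emits the sequence name instead of the URL. The sample input is the two items from the question; the output file name items.csv is made up for the example.

```shell
#!/bin/sh
# Sample input: the two items from the question.
cat > input.xml <<'EOF'
<item>
<title>Jack win the news</title>
<link>http://nom/hot/jack+white</link>
<guid>http://nt/jack+white</guid>
<name>juan</name>
<surname>martinez</surname>
</item>
<item>
<title>Glad to roll..</title>
<link>http://ww=1050681776</link>
<guid>http://fa&amp;id=1050681776</guid>
<name>paco</name>
<surname>perez</surname>
</item>
EOF

awk -v nombre="try_" '
    BEGIN { RS = "[<>]" }              # tag names and text runs become separate records
    $0 == "item" { n++; out = "" }     # new item: next sequence number, empty line
    /^(title|guid|name|surname)$/ {    # tags whose text we keep
        if ((getline) > 0) out = out $0 "|"
        next
    }
    $0 == "link" { out = out nombre n "|" }   # separate rule: sequence name instead of the URL
    $0 == "/item" { sub(/[|]$/, "", out); print out }
' input.xml > items.csv

cat items.csv
```

For the sample, items.csv should hold exactly the two lines asked for in post #7, starting with Jack win the news|try_1|http://nt/jack+white|juan|martinez. The columns follow the order of the elements in the input.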
 
02-10-2012, 02:50 AM   #9
frambau
Member
Registered: Feb 2012
Posts: 32
Original Poster
Yes, grail, but if I use this:

awk 'BEGIN{RS = "[<>]"}{ORS = /surname/?"\n":"|"}/item/,/\/item/{if(/^(title|guid|name|surname)$/ && getline)print}' input.xml

I am printing only (title|guid|name|surname).

How can I put a sequential name between title and guid?

I used your line from yesterday:
H="try_"
awk -v name=$H 'BEGIN{RS = "[<>]"}/item/,/\/item/{if(/^(title|link|guid)$/ && getline)out = sprintf("%s%s%s",out,$0,/link/?"\n":"|");if(/\/item/){print (nombre)out x++;out=""}}' input.xml


Jack win the news|http://nom/hot/jack+white|http://nt/jack+white|try_0
Glad to roll..|http://ww=1050681776|http://fa&amp;id=1050681776|try_1

I need this:

Jack win the news|try_0|http://nt/jack+white
Glad to roll..|try_1|http://fa&amp;id=1050681776

Thanks again
 
02-10-2012, 03:24 AM   #10
Nominal Animal
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,554
Blog Entries: 3
How about
Code:
awk -v prefix="try_" '
    BEGIN {
        RS = "[\t\n\v\f\r ]*[<]"
        FS = "[>][\t\n\v\f\r ]*"
        id = 0
    }

    ($1 == "item" || $1 ~ /^item[\t\n\v\f\r ]/) {
        split("", field)
        id++
        next
    }

    ($1 ~ /^[A-Za-z]/) {
        name = $1
        sub(/[\t\n\v\f\r ].*$/, "", name)   # keep only the tag name, drop any attributes
        field[name] = $2
        next
    }

    ($1 == "/item") {
        printf("%s|%s%d|%s|%s|%s\n",
               field["title"],
               prefix, id,
               field["guid"],
               field["name"],
               field["surname"])
    }
        
' input-file > output-file
The BEGIN rule sets the record separator to the start of a tag (the < plus any whitespace preceding it), and the field separator to > (plus any whitespace at the start of the content). This means each tag, including its attributes, will be in $1, and its immediate contents in $2. (Using a regular expression for RS needs gawk or mawk; POSIX leaves a multi-character RS unspecified.)
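To see what this RS/FS pair actually produces, the sketch below feeds a small, indented fragment of the question's XML (made up for the example) through the same BEGIN settings and prints $1 and $2 for every non-empty record. fields.txt is just a scratch file name, and gawk or mawk is assumed because of the regex RS.

```shell
#!/bin/sh
# Made-up, indented fragment of the question's XML.
printf '<item>\n  <title>Jack win the news</title>\n  <guid>http://nt/jack+white</guid>\n</item>\n' |
awk 'BEGIN {
         RS = "[\t\n\v\f\r ]*[<]"   # same settings as the script above
         FS = "[>][\t\n\v\f\r ]*"
     }
     NF { printf("$1=[%s] $2=[%s]\n", $1, $2) }' > fields.txt

cat fields.txt
```

It should print six lines, one per tag: for example $1=[title] $2=[Jack win the news] for the title, and an empty $2 for the end tags and for <item>, whose only text is whitespace that the separators swallow.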

The second rule clears the field array for a new item, and increases the id (sequence number). The second half of the rule pattern matches the case where the item tag has attributes; you can safely omit it if your input does not use attributes.

The third rule applies to all records where the first field starts with a letter, i.e. to all start tags (except item, which is caught by the second rule, whose next tells awk to skip to the next record without applying any further rules). It saves the immediate content of every start tag into the field associative array, keyed by the start tag name.

In your case the body of the third rule could simply be field[$1] = $2, but I wanted to point out that if your tags have attributes, they are included in $1. That is why I use only the tag name (with the attributes removed) as the key in the associative array.

The fourth rule applies to the item end tag and emits the desired pipe-separated record, using the associative array to recall the contents of each element.

Note that this scheme will only work for flat XML data without any comments or CDATA sections. Within each item element, it is assumed each element contains only text, not any sub-elements. If an element occurs twice, only the last one is recalled. So, it is very limited, but it might work for you.

Hope this helps,
 
02-10-2012, 03:36 AM   #11
frambau
Member
Registered: Feb 2012
Posts: 32
Original Poster
THANKS NOMINAL.

I'm trying various things with it, including this:

awk '/<title>/ { gsub(/<\/?title>/,""); print > "try_"(++i)".out"; }' input.xml

This command can only create about a thousand output files, and I need more.

How can I create more?

---------- Post added 02-10-12 at 04:36 AM ----------

Thanks to all!!!!
 
02-10-2012, 04:18 AM   #12
frambau
Member
Registered: Feb 2012
Posts: 32
Original Poster
I'm trying to use close() after "try_"(++i)".out", like this:

awk '/<title>/ { gsub(/<\/?title>/,""); print > "try_"(++i)".out" close((name)(i)".out") }' input.xml

but it generates files with names like these:

try__2.out-1
try__1.out-1

Any help?

THANKS
 
02-10-2012, 04:56 AM   #13
Nominal Animal
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,554
Blog Entries: 3
Use
Code:
   file = sprintf("%s%06d.out", "try_", ++id)
to generate a good file name first. The %06d emits the number as 000001, 000002, and so on.

Then, output to the file, e.g.
Code:
   print > file
Note that you can use multiple print or printf() commands to print to the file. The first one will clear the previous file contents, but all other print and printf() commands will append to it.

Finally, close the file using
Code:
   close(file)
You need to close the file, because otherwise awk will run out of file descriptors. Unlike e.g. Bash, it does not automatically close files you've written to; you must explicitly close the file to tell awk you're done with it. (The same applies to reading from files using < redirection, too.)
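Putting the three steps together, a minimal sketch assuming a made-up input with one <title> per line. close() gets its own statement: in the attempt from post #12 it followed the print without a semicolon, so awk most likely treated it as part of the file-name expression and appended its return value (-1, since no file by that name was open) to the name, which would explain the -1 suffix.

```shell
#!/bin/sh
# Made-up sample: one <title> per line.
printf '%s\n' '<title>Jack win the news</title>' '<title>Glad to roll..</title>' > input.xml

awk '/<title>/ {
    gsub(/<\/?title>/, "")
    file = sprintf("%s%06d.out", "try_", ++id)   # try_000001.out, try_000002.out, ...
    print "title:" > file    # first write opens the file and truncates it
    print $0 > file          # file is still open, so this line is appended
    close(file)              # separate statement: release the descriptor
}' input.xml

cat try_000001.out
```

Because every file is closed right after it is written, only one output descriptor is in use at a time, no matter how many titles there are.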

You can run getconf OPEN_MAX to see the maximum number of open file descriptors per process. In Linux, the default is 1024, but at least three are used for standard input, output, and error. Thus, you can usually keep 1021 files open for reading or writing at the same time. (Awk might need one or more for itself, though, e.g. for reading the script file.)
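To check the limits on a given machine, a quick look from the shell. ulimit -n reports the current shell's soft limit, which an awk started from that shell inherits; the two numbers are often the same but need not be.

```shell
#!/bin/sh
getconf OPEN_MAX   # per-process limit as reported by sysconf
ulimit -n          # soft limit for this shell and its children
```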
 
02-10-2012, 05:21 AM   #14
frambau
Member
Registered: Feb 2012
Posts: 32
Original Poster
Thanks, Nominal. I'm going to try this:

awk -v name=$H 'BEGIN {RS = "[<>]"}{ORS = /description/?"\n":"|"}/item/,/\/item/{if(/^(description)$/ && getline)out = $0; file = sprintf("%s%d.out",name,++i); print out > file;close(file)}' input.xml

because of the particularities of my program.

But for two items, this command creates a lot of .out files. Why?????

THANKS AGAIN FOR ALL
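A likely explanation: the if only controls out = $0. The file = ..., print and close() statements after it run for every record inside the /item/,/\/item/ range (tag names, tag contents and the newlines between tags), so each of those records gets its own file. Putting those statements in braces under the same if should leave one file per description. The sketch below assumes a made-up two-item input with a <description> element; the ORS assignment is dropped because the default newline is what each file needs.

```shell
#!/bin/sh
# Made-up sample with one <description> per item.
printf '<item>\n<description>first</description>\n</item>\n<item>\n<description>second</description>\n</item>\n' > input.xml
H="paco_"

awk -v name="$H" 'BEGIN { RS = "[<>]" }
/item/,/\/item/ {
    if (/^description$/ && getline) {   # braces: only a <description> tag produces a file
        file = sprintf("%s%d.out", name, ++i)
        print $0 > file
        close(file)
    }
}' input.xml

ls paco_*.out
```

With the sample, this should create just paco_1.out and paco_2.out, containing first and second.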
 
02-10-2012, 05:24 AM   #15
frambau
Member
Registered: Feb 2012
Posts: 32
Original Poster
I really need to extract the content of every tag in an XML file into separate, sequentially numbered files, one for each tag.
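For this last request, one possible sketch reuses Nominal Animal's RS/FS settings and writes the text of every start tag that has content to its own sequentially numbered file. It inherits the same limits (flat XML, no comments or CDATA, gawk or mawk for the regex RS) and closes each file as soon as it is written, so the descriptor limit discussed above does not come into play. The tag_ prefix and the sample input are just examples.

```shell
#!/bin/sh
# Made-up sample input.
cat > input.xml <<'EOF'
<item>
<title>Jack win the news</title>
<guid>http://nt/jack+white</guid>
</item>
EOF

awk -v prefix="tag_" '
    BEGIN {
        RS = "[\t\n\v\f\r ]*[<]"   # one record per tag
        FS = "[>][\t\n\v\f\r ]*"   # $1 = tag (with attributes), $2 = text after it
    }
    $1 ~ /^[A-Za-z]/ && $2 != "" {   # start tags that carry text
        file = sprintf("%s%d.out", prefix, ++n)
        print $2 > file
        close(file)
    }
' input.xml

cat tag_1.out tag_2.out
```

For the sample, tag_1.out should hold the title and tag_2.out the guid; <item> is skipped because its only text is whitespace.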
All times are GMT -5.