How can I put a sequential name between the title and the guid?
I am using the line you gave me yesterday:
H="try_"
awk -v name="$H" 'BEGIN{RS = "[<>]"}/item/,/\/item/{if(/^(title|link|guid)$/ && getline)out = sprintf("%s%s%s",out,$0,/link/?"\n":"|");if(/\/item/){print name out x++;out=""}}' input.xml
The BEGIN rule sets the record separator to the beginning of a tag (the < plus any whitespace before it), and the field separator to > (plus any whitespace after it, i.e. the leading whitespace of the content). This means that in each record, $1 is the tag including its attributes, and $2 is its immediate content.
The second rule clears the field array for a new item, and increments the id (sequence number). The second half of the rule pattern matches the case where the item tag has attributes; you can safely omit it if your input does not use attributes.
The third rule applies to all records where the first field starts with a letter, i.e. to all start tags except item: item matches the second rule, whose next statement tells awk to skip to the next record without applying further rules. The third rule saves the immediate content of each start tag to the field associative array, keyed by the start tag name.
In your case the third rule contents could just be field[$1]=$2 but I wanted to point out that if you have attributes, they're included in $1. That is why I only use the name of the tag (removing all the attributes) as the key in the associative array.
The fourth rule applies to the item end tag, and emits the desired pipe-separated record, using the associative array to recall the contents of each element.
Note that this scheme will only work for flat XML data without any comments or CDATA sections. Within each item element, it is assumed each element contains only text, not any sub-elements. If an element occurs twice, only the last one is recalled. So, it is very limited, but it might work for you.
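The script these four rules describe is not reproduced in this post. A minimal sketch that is consistent with the description might look like the following. The sample feed, the prefix variable, and the title|name|guid output order are assumptions made for illustration, and a regular-expression RS needs gawk or mawk rather than a strictly POSIX awk.

```shell
# Hypothetical sample feed, only for demonstration.
input=$(mktemp)
cat > "$input" <<'EOF'
<rss><channel>
<title>Feed</title>
<item>
<title>First</title>
<link>http://example.com/1</link>
<guid>g1</guid>
</item>
<item id="2">
<title>Second</title>
<link>http://example.com/2</link>
<guid>g2</guid>
</item>
</channel></rss>
EOF

result=$(awk -v prefix="try_" '
# Rule 1: a record starts at each "<"; the tag is $1, its text is $2.
BEGIN {
    RS = "[ \t\r\n]*<"
    FS = ">[ \t\r\n]*"
}
# Rule 2: item start tag, with or without attributes.
$1 == "item" || $1 ~ /^item[ \t\r\n]/ {
    split("", field)    # portable way to clear the array
    id++
    next
}
# Rule 3: any other start tag; strip attributes to get the key.
$1 ~ /^[A-Za-z]/ {
    tag = $1
    sub(/[ \t\r\n].*$/, "", tag)
    field[tag] = $2
}
# Rule 4: item end tag; emit title, sequential name, guid (assumed order).
$1 == "/item" {
    printf "%s|%s%d|%s\n", field["title"], prefix, id, field["guid"]
}
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Because the item rule clears the array, the channel-level title ("Feed" in this sample) does not leak into the first item.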
Use sprintf() to generate a good file name first. The %06d emits the number zero-padded to six digits: 000001, 000002, and so on.
Then, output to the file, e.g.
Code:
print > file
Note that you can use multiple print or printf() commands to print to the file. The first one will clear the previous file contents, but all other print and printf() commands will append to it.
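The truncate-then-append behaviour can be checked with a small experiment. This is only a sketch; the demo file and its stale contents are made up for the test.

```shell
tmpdir=$(mktemp -d)
file="$tmpdir/demo.txt"
printf 'stale contents\n' > "$file"     # pre-existing data that should vanish

awk -v file="$file" 'BEGIN {
    print "first line" > file           # first use: file is opened and truncated
    print "second line" > file          # file is already open: this appends
    printf "%s\n", "third line" > file  # printf appends as well
    close(file)
}'

contents=$(cat "$file")
printf '%s\n' "$contents"
rm -rf "$tmpdir"
```

Only the three new lines remain in the file; the stale contents are gone, and none of the later writes overwrote the earlier ones.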
Finally, close the file using
Code:
close(file)
You need to close the file, because otherwise awk can run out of file descriptors. Unlike e.g. Bash, it does not automatically close files you've written to; you must explicitly close the file to tell awk you're done with it. (The same applies to files read with getline < file, too.)
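Put together, the three steps (build a name, print to it, close it) could be sketched like this. The try_ prefix, the .txt suffix, the temporary directory and the three input lines are assumptions for illustration, not part of the original script.

```shell
outdir=$(mktemp -d)

printf 'alpha\nbeta\ngamma\n' |
awk -v dir="$outdir" -v prefix="try_" '{
    # Zero-padded sequential name, e.g. try_000001.txt
    file = sprintf("%s/%s%06d.txt", dir, prefix, NR)
    print $0 > file
    close(file)     # release the descriptor before the next record
}'

listing=$(ls "$outdir")
second=$(cat "$outdir/try_000002.txt")
printf '%s\n' "$listing"
rm -rf "$outdir"
```

Since each file is closed right after it is written, only one output file is open at any time, no matter how many records the input has.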
You can run getconf OPEN_MAX to see the maximum number of open file descriptors per process. In Linux, the default is 1024, but at least three are used for standard input, output, and error. Thus, you usually can keep 1021 files open for reading or writing at the same time. (Awk might need one or more for itself, for e.g. reading the script file, though.)
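A quick way to see the numbers on your own system is sketched below. On Linux, getconf OPEN_MAX usually reports the same value as the soft limit shown by ulimit -n, so the sketch falls back to the latter if getconf is not installed; the fallback and the variable names are my own additions.

```shell
# Per-process limit on open file descriptors.
limit=$(getconf OPEN_MAX 2>/dev/null || ulimit -n)

case $limit in
    ''|*[!0-9]*)
        # e.g. "unlimited": nothing to subtract from
        summary="no numeric limit reported: $limit" ;;
    *)
        summary="limit $limit, about $((limit - 3)) left after stdin, stdout and stderr" ;;
esac
printf '%s\n' "$summary"
```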