Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Hello everyone, excuse my question if it has already been answered, but I need a quick fix...
I have a large XML file which needs some of the information extracted.
<transactionType> 7 </transactionType> # the value 7
I have rearranged the file so that everything is aligned flush left, making it easier to sort. The object types/complex elements I require are as follows:
objectName,fieldValue,fieldName
I have this so far (below), which filters the XML file down to the lines containing objectName, fieldName and fieldValue. I would like to take the values between > and </. I understand I can use a delimiter, but I have been using Linux for one week and I am struggling!
sed 's/^[ \t]*//' Statement.xml | awk '/objectName|fieldValue|fieldName/' | less
Any help would be greatly appreciated!
I intend to make this into a KSH script with if statements and variables if I can get it to work as intended.
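The extraction asked about here (the text between > and </) can be sketched by letting awk split each line on the angle brackets. This is only a rough sketch, not a real XML parser: it assumes each element sits on a single line, as in the rearranged file described above. The function name extract_values is made up for illustration, and the sample lines reuse the guaranteePeriod and 5 values quoted later in the thread.

```shell
#!/bin/sh
# Print "tag=value" for each wanted element that sits on a single line.
# Assumption: one element per line; this is not a general XML parser.
extract_values() {
    awk -F'[<>]' '
        # With "<" and ">" as separators, $2 is the tag name and
        # $3 is the text between ">" and "</".
        $2 ~ /^(objectName|fieldName|fieldValue)$/ {
            value = $3
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim surrounding blanks
            print $2 "=" value
        }
    '
}

# Small demonstration on made-up input lines; fieldID is not in the
# wanted list, so it is skipped.
printf '%s\n' \
    '  <fieldName>guaranteePeriod</fieldName>' \
    '  <fieldValue> 5 </fieldValue>' \
    '  <fieldID>1118</fieldID>' | extract_values
```

On the real file this would be run as extract_values < Statement.xml. Because leading whitespace ends up in $1, the separate sed step that strips indentation is not needed for this approach.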
Parsing XML is awkward at the best of times, but it can be done using Sed and/or AWK if you're determined. Alternatively, look up XMLgawk - it's an awk extension which makes parsing XML easier.
Anyhow.. Before anyone can easily help, it would be great if you would paste for us a real snippet of your input file, so we can see what's exactly got to be done. Also show us an example of what you would like to come out of your script, based on the input file you show us.
For me, I found your explanation of what you need a little bit confusing, so some more info would be helpful and please, when posting code or input-file snippets, use code tags: http://www.phpbb.com/community/faq.php?mode=bbcode#f2r1
It sort of addresses my question. Let me rephrase a bit:
Show us exactly what you want the results to look like. For example, do you want the output, based on that input, to look like:
Code:
objectType -> claim
fieldName -> I don't see this tag up there anywhere!
fieldValue -> LVG
or do you want it like:
Code:
objectType = claim, fieldName = I don't see this tag up there anywhere!, fieldValue = LVG
Or what?
And, what separates the records in the file? Like, is the file a repeated sequence of similar blocks of text, separated by an empty line? Or is it a continuous, random pile of XML tags with no particular repeating sequence or order? Maybe showing us a longer section of the input file will answer this question. And, please use [code] tags.
I am with GrapefruiTgirl on this one ... the information you are providing doesn't seem to meet what you are saying. Remember we are currently not worried about your skills or ability to perform a regular expression, but more what you are starting with and what you want to finish with.
I will use your examples from post #5 to illustrate what I mean.
You have said that you want your output to look like:
Now I might have missed it, but from what I can see NONE of the information in your required output is in your input file??
e.g. Lv20073 is not in the original file anywhere ... so where does this information come from? If that is the manual input you are doing, how did the original input file influence any of the data shown in the output file?
You then go on to verbally say:
Quote:
Object type to Complex Element.
Field Name to Simple Element
field Value to value of simple element.
So my issues here are:
1. There is only an objectfield ... no object type
2. Fieldname currently contains something like 'guaranteePeriod'. What does this have to do with Simple Element? Or is this now a reference to an xml term?
3. Fieldvalue currently contains something like '5'. Is this not already simple??
This is the original XML document. I might have changed some values before, as I'm not sure I can post the whole thing online. I might get in trouble.
From this, I would like AWK to extract the fields I need and their values into a new XML document. The output above is correct and what I want to achieve.
Apologies if I am not being clear, this is all very new to me.
The 'XML' you posted is not valid XML. Is this some kind of idea that doesn't really exist yet? Did you try to manually transcribe the data, or did you do the simple thing, and copy/paste the XML-ish data? Little or none of your example data corresponds to the terms used in your verbal description.
Code:
− <<<=== Untagged data
<objectField>
<fieldID>1118</fieldID>
<fieldName>jointLifePercentage</fieldName>
<fieldValue/> <<<=== Malformed Tag
</objectField>
−
<objectField>
Where's the DTD?
Given useful information, these kinds of problems typically have good solutions. You seem to be unable to give us enough helpful information to get there.
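For what it is worth, if the real file does consist of repeated <objectField> blocks like the snippet above, one record per block could be collected roughly as follows. This is a sketch under assumptions: each child element is on its own line, the function name fields_to_csv and the comma-separated output are made up for illustration, and the actual layout wanted by the original poster is still unknown. Note that <fieldValue/> is the standard XML empty-element form, so the sketch simply treats it as an empty value.

```shell
#!/bin/sh
# Emit one "fieldID,fieldName,fieldValue" line per <objectField> block.
# Assumption: every child element sits on its own line.
fields_to_csv() {
    awk -F'[<>]' '
        $2 == "objectField"  { id = name = value = ""; next }  # start a new record
        $2 == "fieldID"      { id = $3 }
        $2 == "fieldName"    { name = $3 }
        $2 == "fieldValue"   { value = $3 }  # "<fieldValue/>" gives $2 == "fieldValue/", so value stays empty
        $2 == "/objectField" { print id "," name "," value }   # end of record
    '
}

# Demonstration on the block quoted in the thread.
printf '%s\n' \
    '<objectField>' \
    '<fieldID>1118</fieldID>' \
    '<fieldName>jointLifePercentage</fieldName>' \
    '<fieldValue/>' \
    '</objectField>' | fields_to_csv
```

Stray lines between blocks, such as the "−" markers, contain no angle brackets and are ignored. Anything beyond this flat, line-per-element layout is better left to a proper XML tool.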
OK ... So I see where you are coming from now. You have picked a fairly hefty problem for your first time at bat
I would have to agree with earlier information to say that as it stands, awk would not really be the correct tool here. I mean it could do it, but the complexity of the code would probably outweigh the need.
You want to look at Perl and specifically this will get you started. Also there are a number of Perl gurus around here that can help you when you get stuck.
I have just noticed that I have pasted the wrong part of the XML. I think what I am trying to say refers mainly to the start of the XML document, where <transaction> is found. I'll upload it in the morning. I am currently looking into XMLgawk, which looks like it could help. I have no idea about Perl; I am a graduate developer trying to learn new tricks! Thanks for the help!