Using sed or awk to cut data from a file between certain characters
Hello,
I just found this website, had a look around, and it seems I have come to the right place. I would like to create a .ksh script that cuts certain data from a document. I have tried sed, awk and piping, but it's been some time since I worked on Linux and my memory is patchy. Any help would be greatly appreciated.
Hugh
Use grep and its --only-matching option? You haven't provided examples of what you have and what you want to get back.
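A minimal sketch of the grep suggestion, assuming the data sits on lines shaped like the XML fragment posted later in the thread. The file name sample.txt and its content are made up for illustration. `-o` (long form `--only-matching`) is available in GNU and BSD grep, but it is not part of POSIX grep.

```shell
#!/bin/sh
# Work in a scratch directory so the made-up sample file cannot clobber real files.
cd "$(mktemp -d)" || exit 1

# Hypothetical one-line sample in the style of the thread's input file.
printf '%s\n' '<objectField><fieldID>1118</fieldID><fieldName>exampleName</fieldName></objectField>' > sample.txt

# -o (--only-matching) prints only the matching part of each line,
# one match per output line, instead of the whole line.
grep -o '<fieldName>[^<]*</fieldName>' sample.txt
```

The `[^<]*` stops the match at the next tag, so a line holding several fieldName elements yields one output line per element.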
It is something I am doing at work, but I wanted to get a head start before Monday. I would like the script to open a file the user chooses, cut certain information from that document, and write it to another file.
Unfortunately I cannot provide an input and output just yet; potentially tomorrow.
Code:
sed 's/^[ \t]*//' Statement.xml | awk '/objectName|fieldValue|fieldName/' | less
I have looked online at how to write the output to another file, but I haven't been successful yet. Thank you for getting back to me.
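For the "write it to another file" part, a minimal sketch: keep the same filter but replace `| less` with a `>` redirection. The content of Statement.xml here is made up, and extracted.txt is a placeholder name. It also swaps `[ \t]` for `[[:blank:]]`, since `\t` inside a bracket expression is not understood by every sed.

```shell
#!/bin/sh
# Work in a scratch directory so the made-up files cannot clobber a real Statement.xml.
cd "$(mktemp -d)" || exit 1

# Made-up stand-in for Statement.xml: indented lines, some of interest.
printf '  <objectName>acct</objectName>\n\t<fieldName>total</fieldName>\n<other>x</other>\n' > Statement.xml

# Same filter as the original pipeline, but ">" writes the result to a file
# instead of "| less" showing it on screen. [[:blank:]] matches a space or tab.
sed 's/^[[:blank:]]*//' Statement.xml | awk '/objectName|fieldValue|fieldName/' > extracted.txt

# ">>" would append to extracted.txt instead of overwriting it.
cat extracted.txt
```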
For the files, there are two ways that I can think of.
First, you can pass the input file and output file on the command line:
ksh <script>.ksh input.fil output.fil
In your script, $1 is input.fil and $2 is output.fil.
Or, get the names with read. In bash you can use read -p, like this:
read -p "Enter the input filename: " INFILE
In ksh, read -p reads from a co-process instead, so put the prompt after a question mark:
read 'INFILE?Enter the input filename: '
To access INFILE afterwards, use $INFILE. If this looks really foreign to you, there is a HOW-TO for bash scripting that might be really helpful. I hope that helps.
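A sketch combining both approaches: use $1 and $2 when given, and prompt for whichever is missing. The function name get_files is hypothetical. It prompts with printf plus a plain `read`, which behaves the same in sh, ksh and bash, so it avoids the ksh/bash difference in `read -p`.

```shell
#!/bin/sh
# Hypothetical helper: set INFILE/OUTFILE from the arguments,
# prompting for any name that was not supplied.
get_files() {
    INFILE=$1
    OUTFILE=$2
    if [ -z "$INFILE" ]; then
        printf 'Enter the input filename: '
        read -r INFILE
    fi
    if [ -z "$OUTFILE" ]; then
        printf 'Enter the output filename: '
        read -r OUTFILE
    fi
}

# Example call with both names given, so nothing is prompted.
# In a real script you would write: get_files "$@"
get_files input.fil output.fil
echo "reading from $INFILE, writing to $OUTFILE"
```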
There's no need to use sed; awk alone will do. And since you are dealing with XML (I assume), I think your requirement will not be that simple. Anyway, post whatever input you have and show the desired output.
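A minimal sketch of the awk-only version of the pipeline, assuming the goal is still "keep the objectName/fieldValue/fieldName lines, minus their indentation". Statement.xml's content and the name extracted.txt are made up.

```shell
#!/bin/sh
# Work in a scratch directory so the made-up files cannot clobber real ones.
cd "$(mktemp -d)" || exit 1

# Made-up stand-in for Statement.xml.
printf '  <objectName>acct</objectName>\n\t<fieldName>total</fieldName>\n<other>x</other>\n' > Statement.xml

# One awk program does both jobs: the regex picks the lines, sub() strips
# leading spaces/tabs from each picked line, and print writes it out.
awk '/objectName|fieldValue|fieldName/ { sub(/^[ \t]+/, ""); print }' Statement.xml > extracted.txt

cat extracted.txt
```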
Hi again,
I have got the input file, which is a .txt:
Code:
ld><objectField> <fieldID>1118</fieldID><fieldName>jointLi
Thanks
Hi guys, just wondering if the input/output file helped? I am still looking for help if anyone can offer it.
Many thanks, Hugh
I am still looking for help with this issue.
awk '/Sending XML/,/Message Sending ended/' file | xmllint --format - > file.xml
Cheers, Tink
Slight adjustment to Tink's line, as it also prints the two marker lines themselves:
Code:
awk '/Message Sending ended/{f=0} f; /Sending XML/{f=1}' file | xmllint --format - > file.xml
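A sketch of the flag technique wrapped in a reusable function, so the markers can be passed in. The name extract_between and the sample log are hypothetical. It clears the flag on the end marker before the print test and sets it on the start marker after it, so neither marker line is printed. Matching ignores case because the thread spells the end marker both "Message Sending ended" and "Message sending ended". The patterns are treated as regular expressions, which is fine for plain text like these.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical helper: print only the lines strictly between a start marker
# and an end marker, ignoring case when matching either marker.
extract_between() {
    # $1 = start pattern, $2 = end pattern, $3 = input file
    awk -v startpat="$1" -v endpat="$2" '
        tolower($0) ~ tolower(endpat)   { f = 0 }  # end marker: stop before printing it
        f                                          # flag set: print this line
        tolower($0) ~ tolower(startpat) { f = 1 }  # start marker: begin with the next line
    ' "$3"
}

# Made-up log in the shape described in the thread.
cat > sample.log <<'EOF'
12:00 Sending XML :
 <document>
 <field>1</field>
 </document>
12:01 Message sending ended
12:02 unrelated line
EOF

extract_between 'Sending XML' 'Message Sending ended' sample.log
```

This prints only the three indented XML lines; the output can be piped into `xmllint --format -` as in the lines above.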
Hello,
Firstly, thank you for getting back to me; I was beginning to lose hope. That code works, but now I have a new problem.
Code:
Sending XML :
Thanks again, Hugh
Hello again,
Code:
sed -n '/Sending XML/,/Message sending ended/p' "${infile}" > "${outfile}"
Code:
#!/bin/bash
Cheers
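The sed range in the command above prints the "Sending XML" and "Message sending ended" lines as well as the XML between them. A minimal sketch that keeps the range but deletes the two marker lines; infile, outfile and the sample content are placeholders.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Placeholder names standing in for ${infile} and ${outfile}.
infile=sample.log
outfile=extracted.xml

# Made-up input in the shape described in the thread.
cat > "$infile" <<'EOF'
12:00 Sending XML :
 <document>
 </document>
12:01 Message sending ended
EOF

# -n turns off automatic printing. Inside the range, d discards a marker
# line (and starts the next cycle), and p prints every other line.
sed -n '/Sending XML/,/Message sending ended/{
/Sending XML/d
/Message sending ended/d
p
}' "$infile" > "$outfile"

cat "$outfile"
```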
Well, the xmllint didn't work for me, and your sed has the same issue as the original awk: it also prints the marker lines that open and close your range.
Apart from that, I get back the chunk you provided. I am guessing xmllint doesn't like my chunk because it is a fragment, not a complete document. Also, neither sed nor awk produces the extra spacing you are showing in post #12. If the file was generated on Windows, my guess is that each line ends with a carriage return (\r), which Unix tools do not expect.
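If the Windows line-ending guess is right, a minimal sketch for confirming and removing the carriage returns. The file names are placeholders and the sample is generated with explicit \r\n endings. Where the dos2unix utility is installed, it does the same conversion.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Made-up file with Windows (CRLF) line endings.
printf 'line one\r\nline two\r\n' > windows.txt

# od -c makes the hidden carriage returns visible as \r before each \n.
od -c windows.txt

# Delete every carriage return, leaving plain Unix (LF) line endings.
tr -d '\r' < windows.txt > unix.txt
```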
Hello,
Are you not getting the blank lines in the extracted XML, then? I get one after every line. Any ideas how I can stop this?
This works for me with the posted file fragment.
The --recover option tells xmllint to "Output any parsable portions of an invalid document" (from the man page). It tries to fix the incomplete XML by adding the missing end tags.
Code:
sed -n '/^ <document>/,/^$/s/^ //p' file | tr -d '\n' | xmllint --format --recover -
@KENHELM:
Did your code work, then? Did you just run that line, or did you use it alongside the script I posted? I'm new to all this and not sure whether it's an addition to what I have already done.
Code:
#!/bin/bash
Thank you
I just ran the line of code I posted.
This is your script with the code inserted. Code:
#!/bin/bash
s/^ // removes the single leading space on each line.
tr -d '\n' removes the newlines, putting all the XML onto a single line.
If the XML is valid you shouldn't need the --recover option to xmllint, but if you get parsing error messages, try putting it back in.
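For reference, a hypothetical end-to-end version of the pieces discussed in this thread (not the script that was posted, which is not shown here). It assumes the XML starts on a line beginning with " <document>" and ends before the next blank line, as in the one-liner above. It strips carriage returns first, because a Windows blank line is really "\r" and would not match /^$/. It falls back to the single-line XML when xmllint is not installed. The function name extract_xml and the sample file are made up.

```shell
#!/bin/sh
# Hypothetical helper: $1 = input file, $2 = output file.
extract_xml() {
    # 1. tr -d '\r'  strips Windows carriage returns, so /^$/ can match blank lines.
    # 2. sed         keeps the block from " <document>" to the next blank line and
    #                removes the single leading space; the p flag prints only lines
    #                where that substitution happened, so the blank line is dropped.
    # 3. tr -d '\n'  joins the XML onto one line.
    # 4. xmllint     pretty-prints it if installed; otherwise the one-line XML is kept.
    tr -d '\r' < "$1" |
        sed -n '/^ <document>/,/^$/s/^ //p' |
        tr -d '\n' |
        if command -v xmllint >/dev/null 2>&1; then
            xmllint --format --recover -
        else
            cat
        fi > "$2"
}

cd "$(mktemp -d)" || exit 1

# Made-up input in the shape discussed in the thread.
cat > sample.txt <<'EOF'
12:00 Sending XML :
 <document>
 <fieldID>1118</fieldID>
 </document>

12:01 Message sending ended
EOF

extract_xml sample.txt out.xml
cat out.xml
# In a real script: extract_xml "$1" "$2"
```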