Parse/rewrite file help
RHEL 5.7, bash
I need to convert a file like the one below. Basically I just need to identify a line that starts with "ID", read that line into a variable, then output. Continue to loop until EOF. The output should be ($X being the ID line):
$X,$X
$X,data from line 2
$X,data from line 3
$X,data from line 4
Then reset the variable when "ID" is found again, repeat, etc. xxxxx, yyyyy and zzzzz here are just random data, but each is a whole line.
(input file)
ID = xyz name = abc
xxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzz
ID = THE name = band
xxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzz
(desired output file)
ID = xyz name = abc,ID = xyz name = abc
ID = xyz name = abc,xxxxxxxxxxxxxxxxxxxxxxxx
ID = xyz name = abc,yyyyyyyyyyyyyyyyyyyyyyyy
ID = xyz name = abc,zzzzzzzzzzzzzzzzzzzzzzzz
ID = THE name = band,ID = THE name = band
ID = THE name = band,xxxxxxxxxxxxxxxxxxxxxxxx
ID = THE name = band,yyyyyyyyyyyyyyyyyyyyyyyy
ID = THE name = band,zzzzzzzzzzzzzzzzzzzzzzzz
It's easier to use awk than just bash
$ cat in
ID = xyz name = abc
1xxxxxxxxxxxxxxxxxxxxxxx
1yyyyyyyyyyyyyyyyyyyyyyy
1zzzzzzzzzzzzzzzzzzzzzzz
ID = THE name = band
2xxxxxxxxxxxxxxxxxxxxxxx
2yyyyyyyyyyyyyyyyyyyyyyy
2zzzzzzzzzzzzzzzzzzzzzzz
$ awk '/^ID/ {id=$0} {printf "%s,%s\n",id,$0}' <in
ID = xyz name = abc,ID = xyz name = abc
ID = xyz name = abc,1xxxxxxxxxxxxxxxxxxxxxxx
ID = xyz name = abc,1yyyyyyyyyyyyyyyyyyyyyyy
ID = xyz name = abc,1zzzzzzzzzzzzzzzzzzzzzzz
ID = THE name = band,ID = THE name = band
ID = THE name = band,2xxxxxxxxxxxxxxxxxxxxxxx
ID = THE name = band,2yyyyyyyyyyyyyyyyyyyyyyy
ID = THE name = band,2zzzzzzzzzzzzzzzzzzzzzzz
Could make it even simpler:
Code:
awk '/^ID/ {id=$0}$0 = id","$0' file
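Why the shorter version prints anything at all: $0 = id","$0 is used as a pattern with no action. The assignment rewrites the record, and its value is the concatenated string, which is never empty because it always contains the comma, so the pattern is true and awk's default action prints the rewritten $0. The sketch below puts it next to a version with the print spelled out; short_form and long_form are hypothetical helper names used only for the comparison. It also shows why a line before the first ID comes out with a leading comma: id is still empty there.

```shell
#!/bin/sh
# Hypothetical helpers, for comparison only.

# grail's form: the assignment is the pattern. Its value is the new record,
# a non-empty string (it always contains the comma), so the pattern is true
# and awk's default action prints the rewritten $0.
short_form() {
    awk '/^ID/ {id = $0} $0 = id "," $0' "$@"
}

# The same logic with the print written out explicitly.
long_form() {
    awk '/^ID/ {id = $0} {print id "," $0}' "$@"
}

# Example run: the line before the first ID gets an empty prefix.
printf 'junk\nID = xyz name = abc\n1xxx\n' | short_form
printf 'junk\nID = xyz name = abc\n1xxx\n' | long_form
```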
Thanks, I will give this a try.
OK, small issue.
I forgot to say that the input file has lines before the first "^ID" line, and those need to be skipped (the number of such lines varies). The input file also has blank lines in random places, so I also need to skip any blank lines. For example (input file):
Code:
junk filler random data yada yada yada
Using awk '/^ID/ {id=$0}$0 = id","$0' file, this is what I get:
Code:
,junk filler random data yada yada yada
Add the condition "print only if id is set" to grail's awk command:
Code:
awk '/^ID/ {id=$0} id && ($0 = id","$0)' file
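That version still prints blank lines that come after the first ID (as the ID followed by a lone comma). Since the question also asked to skip blank lines, one way to cover both is to require NF as well: with awk's default field splitting, NF is 0 for an empty or whitespace-only line. A minimal sketch; prefix_nonblank is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical helper: prefix every non-blank line after the first ID line.
#   /^ID/  remembers the most recent ID line
#   id     is an empty string (false) until the first ID line has been seen
#   NF     is 0 for empty or whitespace-only lines, so those are dropped
prefix_nonblank() {
    awk '/^ID/ {id = $0} id && NF {print id "," $0}' "$@"
}

printf 'junk filler\n\nID = xyz name = abc\n\n1xxx\n   \n1yyy\n' | prefix_nonblank
```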
Quote:
Code:
[root@host ~]$ more test3.txt
I was trying something like this, but can't get the ELSE part to work as desired (that is, knowing whether "ID" has already been found, by using a variable):
Code:
#!/bin/awk -f
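The if/else described above can be written with an explicit flag variable. This is only a sketch of that idea, not the poster's own script (which is not shown here); the function and variable names are made up for illustration, and an ID line is assumed to be one whose first field is exactly "ID".

```shell
#!/bin/sh
# Hypothetical helper: the same filtering written with if/else and a flag,
# so the "has an ID been found yet?" test is explicit.
prefix_with_flag() {
    awk '
    {
        if ($1 == "ID") {               # an ID line: remember it and set the flag
            found = 1
            id = $0
        } else if (!found || NF == 0) { # before the first ID, or a blank line
            next                        # skip it
        }
        print id "," $0                 # ID lines and data lines after the first ID
    }' "$@"
}

printf 'junk\nID = xyz name = abc\n\n1xxx\n' | prefix_with_flag
```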
I did some playing around with an awk script and came up with this. It ignores all lines until it finds $1 == "ID", and also ignores empty lines. It seems to work OK. I will need to manually chop a few lines off the end of the output file, since my input file has no definitive end marker, but no big deal. Any way to simplify?
Code:
#!/bin/awk -f
Is there more junk mixed in through the file after ID is discovered for the first time?
Are there always blank lines between IDs? It helps if you can describe your input data in more detail, if we are to tailor the solution.
Quote:
Thanks.
I ended up with this.
Code:
#!/bin/awk -f
This seems overly complicated. Let me see if I understand the file structure:
1. Any amount of crap, but definitely not the letters ID, before the first occurrence of ID.
2. Once ID is found, there will be lines of data to be prepended with the ID and a comma.
3. There may also be blank lines after ID is found (you may need to confirm whether blank means nothing but a newline, or could possibly be whitespace as well).
So, based on the above, the idea is that ALL lines must be printed regardless of their data, but any line after the ID string is found must have the ID string and a comma inserted (correct?)
Code:
awk '/ID/{id = $0}id && NF{$0=id","$0}1' file
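A quick way to check that reading against sample data: with the one-liner above, lines before the first ID and blank lines pass through unchanged (the trailing 1 prints every record), and only non-blank lines after an ID get the prefix. The awk program is the one from this post; print_all_prefixed is a hypothetical wrapper name and the sample lines are made up.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner from the post above.
#   /ID/       remembers the most recent line containing ID
#   id && NF   rewrites non-blank lines once an ID has been seen
#   1          always true, so every record (rewritten or not) is printed
print_all_prefixed() {
    awk '/ID/{id = $0}id && NF{$0=id","$0}1' "$@"
}

printf 'junk\nID = abc\nxxx\n\nyyy\n' | print_all_prefixed
```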
Quote:
Here is the raw source (sanitized fubar). I don't need anything until the first occurrence of "ID", no blank lines, and I don't need "TSP0320I LIST FUNCTION SUCCESSFUL" near the end or anything after it. Notice that I also skip the following (or similar crud) that is wedged between pages and/or IDs:
1COMPUTER ASSOCIATES ***** T S S C O M M A N D P R O C E S S O R ***** TSSSSSDB PAGE 2 CA-POT RET/VS 2.0 12/04/2010 11.36.04
My real source file is ~30k lines and has many, many IDs, each with a random number of lines associated with it. I don't know whether the TS report can be created in different ways, but this is the raw source I have to work with. Can you get this into a one-line awk? If so, hats off to you. My code in post #11 does the job, but I like simpler if you can achieve that. Thanks. The output uses OFS="|" and should look like this (the dots just mean it continues on):
output file
Code:
ID = TESTTEST|ID = TESTTEST
Code:
1// JOB TSSLIST *** TSS INIT COMMANDS *** DATE 12/04/2010, CLOCK 11/36/05
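Since the full raw listing is not reproduced here, the following is only a sketch of a single awk filter for the requirements listed above: start at the first ID, skip blank lines, skip page headers, stop at the TSP0320I message, and join with "|". The header patterns (lines starting with "1COMPUTER ASSOCIATES" or containing "CA-POT RET/VS") are assumptions based on the one header quoted, the CRLF handling is a precaution rather than something the poster confirmed, and tss_to_pipe is a hypothetical name. The real report may need different patterns.

```shell
#!/bin/sh
# Hypothetical helper. The header and end-marker patterns are ASSUMPTIONS
# based on the single page header quoted in the post, not on the full report.
tss_to_pipe() {
    awk '
    { sub(/\r$/, "") }                                   # tolerate DOS (CRLF) line endings
    /TSP0320I LIST FUNCTION SUCCESSFUL/ { exit }         # end marker: ignore the rest
    /^1COMPUTER ASSOCIATES/ || /CA-POT RET\/VS/ { next } # assumed page-header lines
    /^ID/ { id = $0 }                                    # remember the current ID line
    id != "" && NF { print id "|" $0 }                   # non-blank lines after the first ID
    ' "$@"
}

printf 'junk\nID = TESTTEST\nFACT = 1\n\n1COMPUTER ASSOCIATES PAGE 2\n CA-POT RET/VS 2.0\nFACT = 2\nTSP0320I LIST FUNCTION SUCCESSFUL\nmore junk\n' | tss_to_pipe
```

Usage would be along the lines of tss_to_pipe rawfile > outfile.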
This should do the job, and also put each fact on a separate line. It does a bit more than you asked for, but I guess this is what I'd start with.
Code:
tr -s '\r\n' '\n\n' < infile |
sed -e 's|[\t\v\f ]*=[\t\v\f ]*|=|g; s|[\t\v\f ][\t\v\f ]\+\([^\t\v\f =]\+\([\t\v\f ][^\t\v\f =]\+\)*=\)|\n\1|g' |
sed -e 's|^[\t\v\f ]\+||; s|[\t\v\f ]\+$||; s|[\t\v\f ]\+)|)|g; s|[\t\v\f ][\t\v\f ]\+| |g' |
awk '/^ID=/ { id=$0 ; print id ; next } /=/ && length(id) { print id "|" $0 }' >outfile
The tr converts carriage returns to newlines and squeezes runs of newlines into one, which also drops empty lines.
The first sed removes whitespace around equals signs. Also, if there are multiple consecutive whitespace characters followed by some term (which may contain single, non-consecutive whitespace characters) and an equals sign, it splits the line at the consecutive whitespace. This makes sure each fact is on its own line.
The second sed removes leading and trailing whitespace and all whitespace before a closing parenthesis, and collapses runs of consecutive whitespace into a single space. (Because the first sed introduces new newlines, I find it easiest to flatten the data stream with a separate sed command. It also makes such long pipelines easier to develop.)
The awk part picks out the ID values (also printing each one on its own), and for any line containing an equals sign, prints the ID and the line. If you do not need the ID alone, just omit the first print.
If the input may contain pipes, I recommend prepending s/|/!/g; to the first sed script.
If you prefer whitespace around = and |, add | sed -e 's/\([|=]\)/ \1 /g' just before the >outfile.
To see which input lines are ignored by the above command, replace the end, starting at awk, with grep -v -e '=' -e '^[[:space:]]*$'
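To watch the pipeline split a line into facts, it can be wrapped in a function and fed a small made-up line; the function name split_facts and the sample data are illustrative only. The commands are the same as above minus the file redirections. Note that \t, \v and \f inside brackets, \+, and \n in a replacement are GNU sed features, which is the sed RHEL ships.

```shell
#!/bin/sh
# Hypothetical wrapper around the pipeline above (GNU sed required).
split_facts() {
    # CR -> LF, squeeze newlines (also drops empty lines)
    tr -s '\r\n' '\n\n' |
    # strip whitespace around "=", split at multi-space runs before "term="
    sed -e 's|[\t\v\f ]*=[\t\v\f ]*|=|g; s|[\t\v\f ][\t\v\f ]\+\([^\t\v\f =]\+\([\t\v\f ][^\t\v\f =]\+\)*=\)|\n\1|g' |
    # trim, drop space before ")", collapse remaining whitespace runs
    sed -e 's|^[\t\v\f ]\+||; s|[\t\v\f ]\+$||; s|[\t\v\f ]\+)|)|g; s|[\t\v\f ][\t\v\f ]\+| |g' |
    # print each ID alone, then ID|fact for every fact after it
    awk '/^ID=/ { id=$0 ; print id ; next } /=/ && length(id) { print id "|" $0 }'
}

printf 'junk line\nID = TESTTEST   NAME = FOO BAR   TYPE = USER\n' | split_facts
```

For this sample, the junk line has no = and is dropped, and the ID line becomes ID=TESTTEST on its own, followed by ID=TESTTEST|NAME=FOO BAR and ID=TESTTEST|TYPE=USER.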
Nominal,
I will try that. Thanks.