sed/awk help
how would i create a sed or awk command that would add a | to the end of a line if it wasn't already there?
i'm trying to import some files into mysql, but some of the lines do not end with a |, which is my field delimiter. any help would be great. thanks |
That one's quite easy:
Code:
sed 's/\([^|]\)$/\1|/' input.txt
'[|]' would mean "match a |", but the '^' inverts the match, so '[^|]' means "match anything EXCEPT a |".
'$' matches the end of a line.
'\1' is replaced by everything inside the \( \) brackets in the first half of the expression.
So... it means "find a line which ends with a character that isn't a |, and replace it with that character and a |" :) |
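Since the question also asked about awk, here is a rough awk equivalent of the sed one-liner above. It is only a sketch: the function name `add_trailing_pipe` is made up for illustration, and like the sed version it leaves empty lines alone.

```shell
# add_trailing_pipe is a hypothetical helper name, not something from the thread.
# It appends "|" to every non-empty line that does not already end in one.
add_trailing_pipe() {
  # /[^|]$/ matches lines whose last character is anything except a pipe;
  # the trailing "1" is the usual awk idiom for "print every line".
  awk '/[^|]$/ { $0 = $0 "|" } 1' "$@"
}

# Example: the first line gains a pipe, the second and the empty line are untouched.
printf 'a|b|c\na|b|c|\n\n' | add_trailing_pipe
```

It reads the files named as arguments, or standard input when none are given.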
hmm, i'm getting pipes where i shouldn't, but it may have to do with control characters.
what's the best way to view a document and see the control characters? if i use vi, i see some ^M, but i don't think it's showing the carriage return line feeds. edit: there are some spaces that need to be removed first; i want to find out what control characters they contain. |
You must have edited the file at some point with a DOS/Windows style editor. The ^M's are carriage-returns, and they are messing with the Linux style delimiters which are linefeeds alone. You should find a tool that converts DOS text files to Unix text files, and then try the sed script against the result.
Google says the following should work, and without having actually tried it, it looks about right:
Code:
tr -d '\r' < dosfile > unixfile |
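The two steps suggested so far (strip the carriage returns, then run the sed script) can be chained in one pipeline. A minimal sketch, assuming the file uses DOS-style CRLF line endings; `dos_to_piped` is a hypothetical name. Note that `tr -d '\r'` deletes every carriage return, so if the file uses a bare CR as its only line terminator this would join lines together.

```shell
# dos_to_piped is a hypothetical helper name, not from the thread.
# Assumption: lines end in CR LF. tr removes every CR, wherever it appears.
dos_to_piped() {
  tr -d '\r' < "$1" | sed 's/\([^|]\)$/\1|/'
}

# Build a small CRLF sample, convert it, then clean up.
sample=$(mktemp)
printf 'a|b|c\r\na|b|c|\r\n' > "$sample"
dos_to_piped "$sample"
rm -f "$sample"
```

Without the `tr` step, the last character on each line is the CR, which is not a |, so sed would append a pipe after it on every line.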
Have a look at
Code:
od -a <file>
Cheers, Tink |
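To make the od suggestion concrete, here is a small sketch run on a made-up sample line rather than the real file. It uses `od -c`, which prints control characters as backslash escapes, and `sed -n l`, which prints each line with escapes and marks the true end of line with a `$`. `show_ctrl` is a hypothetical name.

```shell
# show_ctrl is a hypothetical helper name, not from the thread.
# od -c dumps every byte; a carriage return shows up as \r, a linefeed as \n.
show_ctrl() {
  od -c "$@"
}

# A sample line with a trailing space followed by CR LF.
printf 'PID|1| \r\n' | show_ctrl

# Line-oriented alternative: the "l" command prints non-printing characters
# as escapes and puts "$" where the line really ends.
printf 'PID|1| \r\n' | sed -n l
```

Either view makes a trailing space or a stray carriage return visible before the final pipe is added.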
Code:
$ ruby -pe '$_.chomp!; $_ << "|" unless $_.end_with?("|"); $_ << "\n"' file |
I am a little curious why you would need a delimiter after the last entry?
Or are you planning on filling a field with NULL data? |
the records are actually multiple lines long. it's an HL7 file... here is an example:
MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4<cr>
PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F|||153 FERNWOOD DR.^ ^STATESVILLE^OH^35292||(206)3345232|(206)752-121||||AC555444444||67-A4335^OH^20030520<cr>
OBR|1|845439^GHH OE|1045813^GHH LAB|15545^GLUCOSE|||200202150730||||||||| 555-55-5555^PRIMARY^PATRICIA P^^^^MD^^|||||||||F||||||444-44-4444^HIPPOCRATES^HOWARD H^^^^MD<cr>
OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105|H|||F<cr>
my issue is that the line that starts with PID sometimes has a | at the end and sometimes not. i want to make sure they all do so i can count my fields correctly. |
So is it correct to say that only lines that do not have '<cr>' at the end should be checked to see whether they end in a pipe?
|
yes, i think so. although i'm not sure if this is going to work out the way i thought it would, because every line may not have the same number of fields.
i'll cross that bridge when i come to it though. |
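One way to check whether the PID lines really do differ in length is to tally how many pipes each one contains. A minimal sketch, assuming segments sit on separate lines and start with the segment name; `pid_pipe_tally` is a hypothetical name.

```shell
# pid_pipe_tally is a hypothetical helper name, not from the thread.
# With "|" as the field separator, a line with NF fields contains NF-1 pipes.
pid_pipe_tally() {
  awk -F'|' '/^PID/ { seen[NF - 1]++ }
             END    { for (n in seen) print n, seen[n] }' "$@" | sort -n
}

# Example: prints one "pipe-count  number-of-lines" pair per distinct count.
printf 'PID|a|b\nPID|a|b|\nOBX|x\n' | pid_pipe_tally
```

If the tally shows a single count, every PID line already has the same number of delimiters.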
ok, so the last one worked, so what i'm left with is this:
PID|1||0394580|0394580|Yogy Bear ||20070608|M|U||485 linux road, slackware, ny 11722|6316172045|
PID|1||31375|31375|Fozzy bear ||19890113|F|U||16 gentoo road, slackware, ny 11720 |
so i want to pick up the second line, and not the first. if i try something like this it doesn't work:
sed 's/\(PID|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|$\)/\1|/'
it winds up adding the | to both lines. how do i pick up the second line, but not the first? |
Not sure if I am following, but is this what you are after:
Code:
sed '/^PID/s/[^|]$/&|/' file |
Are you saying you want to append '|' to the end of the line until there are 13 of them in the line (ie. 13 fields)? 'cos that's the only explanation I can come to which is consistent with your latest post...
|
yes, i want to make sure that all of the PID lines have the same number of |. if a line only has 12, i want to add one to the end.
|
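The requirement as finally stated (every PID line should contain 13 pipes, padding at the end where it has fewer) can be sketched in awk by counting the pipes on each PID line and appending the difference. This is only an illustration: `pad_pid_pipes` is a hypothetical name, and the target of 13 is taken from the discussion above.

```shell
# pad_pid_pipes is a hypothetical helper name, not from the thread.
# Assumption: 13 pipes per PID line is the target, as discussed above.
pad_pid_pipes() {
  awk -v want=13 '
    /^PID/ {
      # gsub returns the number of replacements; replacing "|" with "|"
      # leaves the line unchanged and simply counts the pipes.
      n = gsub(/[|]/, "|")
      while (n < want) { $0 = $0 "|"; n++ }
    }
    1   # print every line, PID or not
  ' "$@"
}

# The two sample lines from the thread: the first already has 13 pipes,
# the second has 12 and gets one more appended.
printf '%s\n' \
  'PID|1||0394580|0394580|Yogy Bear ||20070608|M|U||485 linux road, slackware, ny 11722|6316172045|' \
  'PID|1||31375|31375|Fozzy bear ||19890113|F|U||16 gentoo road, slackware, ny 11720 |' \
  | pad_pid_pipes
```

Lines that already have 13 or more pipes pass through unchanged, which is what the regex attempt with repeated `.*|` could not do, because `.*` also matches pipes.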