LinuxQuestions.org

Eppo 04-07-2011 03:54 PM

sed/awk help
 
How would I create a sed or awk command that adds a | to the end of a line if it isn't already there?
I'm trying to import some files into MySQL, but some of the lines do not end with a |, which is my field delimiter.
Any help would be great.
Thanks

Snark1994 04-07-2011 04:23 PM

That one's quite easy:

Code:

sed 's/\([^|]\)$/\1|/' input.txt

's' means substitute

'[|]' would mean "match a |", but the '^' inverts the match, so '[^|]' means "match anything EXCEPT a |"

'$' matches the end of a line

'\(' and '\)' capture whatever they enclose, and '\1' in the replacement is replaced by what that first group matched (here, the last non-| character on the line).

So... it means "find a line which ends with a character that isn't a |, and replace it with that character and a |" :)
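The title asks for sed or awk, so here is the same rule in both forms: a minimal sketch wrapped in two shell functions whose names (`add_pipe_sed`, `add_pipe_awk`) are made up for illustration. The sed version uses `&` (the whole match) instead of a `\( \)` group, which has the same effect.

```shell
# Hypothetical helper names; both filters read stdin and write stdout.

# sed: if the last character is not a '|', re-emit it (&) followed by '|'.
# Empty lines have no last character, so they are left alone.
add_pipe_sed() {
    sed 's/[^|]$/&|/'
}

# awk: if the record does not already end in '|', append one, then print
# every record (the trailing "1" is an always-true pattern).
# Unlike the sed version, this also turns an empty line into "|".
add_pipe_awk() {
    awk '!/\|$/ { $0 = $0 "|" } 1'
}

# Usage: add_pipe_sed < input.txt > output.txt
```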

Eppo 04-07-2011 04:43 PM

Hmm, I'm getting pipes where I shouldn't, but it may have to do with control characters.
What's the best way to view a document and see the control characters? If I use vi, I see some ^M, but I don't think it's showing the carriage return/line feeds.


Edit: there are some spaces that need to be removed first; I want to find out what control characters they contain.

theNbomr 04-07-2011 05:22 PM

The ^M's are carriage returns (CR). The file was probably edited or generated at some point with a DOS/Windows-style tool, which ends each line with CR+LF, and the extra CR confuses Linux tools, which expect each line to end with a linefeed (LF) alone. You should find a tool that converts DOS text files to Unix text files, and then try the sed script against the result.
Google says the following should work, and although I haven't actually tried it, it looks about right:
Code:

tr -d '\r' < dosfile > unixfile
--- rod.
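A minimal sketch of why the carriage returns produce unwanted pipes, assuming the input really is CRLF-terminated: the last character of each line is then the invisible CR, which is not a |, so the sed from earlier in the thread appends a | after it even when the visible text already ends in one. Deleting the CRs with tr first avoids that. Both function names are hypothetical.

```shell
# Without CR removal, "a|\r" ends in '\r' (not '|'), so sed turns it into
# "a|\r|": an extra pipe appears after the invisible carriage return.
add_pipe_naive() {
    sed 's/[^|]$/&|/'
}

# Delete every carriage return first (same tr command as above), then
# append '|' only to lines whose real last character is not '|'.
fix_crlf_then_pipe() {
    tr -d '\r' | sed 's/[^|]$/&|/'
}

# Usage: fix_crlf_then_pipe < dosfile > unixfile
```

If a file uses a bare CR as its only line separator, deleting it would glue all the lines together; translating it with `tr '\r' '\n'` is the variant to try in that case.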

Tinkster 04-07-2011 07:08 PM

Quote:

Originally Posted by Eppo (Post 4317436)
Hmm, I'm getting pipes where I shouldn't, but it may have to do with control characters.
What's the best way to view a document and see the control characters? If I use vi, I see some ^M, but I don't think it's showing the carriage return/line feeds.


Edit: there are some spaces that need to be removed first; I want to find out what control characters they contain.

To see what other special characters may be hidden in there,
have a look at
Code:

od -a <file>

Cheers,
Tink
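Another way to make control characters visible without leaving sed: the POSIX `l` command prints each line in an unambiguous form, writing a carriage return as `\r` and a tab as `\t`, and marking the real end of the line with `$`. (GNU `cat -A` gives a similar view, showing CR as `^M`.) A minimal sketch; the function names are only for illustration.

```shell
# Print every input line with control characters spelled out:
# sed -n suppresses normal output, and 'l' writes the pattern space
# unambiguously (\r for CR, \t for tab, $ at the end of each line).
show_controls() {
    sed -n l
}

# Count lines that end in a carriage return (i.e. DOS-style CRLF lines).
count_crlf_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }'
}

# Usage: show_controls < file | less
```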

kurumi 04-07-2011 07:47 PM

Code:

$ ruby -pe '$_.chomp!; $_ << "|" unless $_.end_with?("|"); $_ << "\n"' file

grail 04-07-2011 08:29 PM

I am a little curious: why would you need a delimiter after the last entry?
Or are you planning on filling a field with NULL data?

Eppo 04-07-2011 09:38 PM

The records are actually several lines long. It's an HL7 file; here is an example:
MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4<cr>
PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F|||153 FERNWOOD DR.^
^STATESVILLE^OH^35292||(206)3345232|(206)752-121||||AC555444444||67-A4335^OH^20030520<cr>
OBR|1|845439^GHH OE|1045813^GHH LAB|15545^GLUCOSE|||200202150730|||||||||
555-55-5555^PRIMARY^PATRICIA P^^^^MD^^|||||||||F||||||444-44-4444^HIPPOCRATES^HOWARD H^^^^MD<cr>
OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105|H|||F<cr>

My issue is that the line starting with PID sometimes has a | at the end and sometimes doesn't. I want to make sure they all do, so I can count my fields correctly.
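Since the goal is to count fields, it may help to first check how many `|` characters each PID line actually has. A minimal sketch, assuming each PID segment sits on a single physical line (the wrapped PID segment in the example above would have to be rejoined first); `pipe_counts` is a made-up name.

```shell
# For every line starting with "PID|", print its line number and the
# number of '|' separators on it. With -F'|', a non-empty line with k
# pipes splits into k + 1 fields, so the pipe count is NF - 1.
pipe_counts() {
    awk -F'|' '/^PID\|/ { print NR ": " (NF - 1) }'
}

# Usage: pipe_counts < file.hl7
```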

grail 04-07-2011 09:51 PM

So is it correct to say that only lines that do not end with '<cr>' need to be checked for a trailing pipe?

Eppo 04-08-2011 08:06 AM

Yes, I think so, although I'm not sure this is going to work out the way I thought, because not every line may have the same number of fields.
I'll cross that bridge when I come to it, though.
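Under that reading the rule is: leave lines that end in a carriage return alone, and make sure every other non-empty line ends in `|`. A minimal awk sketch of that rule, assuming the `<cr>` in the example stands for a real carriage return byte (`\r`) at the end of the line; the function name is hypothetical.

```shell
# Lines ending in "\r" (the <cr> segment terminator) are printed unchanged.
# Any other non-empty line that does not already end in '|' gets one.
pipe_unterminated_lines() {
    awk '$0 != "" && !/\r$/ && !/\|$/ { $0 = $0 "|" } 1'
}

# Usage: pipe_unterminated_lines < file.hl7 > fixed.hl7
```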

Eppo 04-08-2011 11:22 AM

OK, so the last one worked. What I'm left with is this:
PID|1||0394580|0394580|Yogy Bear ||20070608|M|U||485 linux road, slackware, ny 11722|6316172045|
PID|1||31375|31375|Fozzy bear ||19890113|F|U||16 gentoo road, slackware, ny 11720 |

So I want to pick up the second line, but not the first. If I try something like this, it doesn't work:
Code:

sed 's/\(PID|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|$\)/\1|/'

It winds up adding the | to both lines.
How do I pick up the second line but not the first?

grail 04-08-2011 11:45 AM

Not sure if I am following, but is this what you are after:
Code:

sed '/^PID/s/[^|]$/&|/' file

Snark1994 04-08-2011 01:35 PM

Are you saying you want to append '|' to the end of the line until there are 13 of them in the line (i.e. 13 fields)? 'Cos that's the only explanation I can come up with that is consistent with your latest post...

Eppo 04-08-2011 02:24 PM

Yes, I want to make sure that all of the PID lines have the same number of |'s; if a line only has 12, I want to add one to the end.

MTK358 04-08-2011 02:41 PM

Quote:

Originally Posted by Eppo (Post 4318336)
Yes, I want to make sure that all of the PID lines have the same number of |'s; if a line only has 12, I want to add one to the end.

So basically what you want is for a 13th "|" character to be added to the end of each line that has only 12 of them?
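If that is the requirement, one way is to match PID lines that have exactly 12 pipes and append one more. A minimal sketch, assuming 13 is the target count and that carriage returns have already been stripped. In the sed version each `[^|]*|` group consumes exactly one pipe, and the final `[^|]*$` forbids any further pipe, so a line with 13 pipes cannot match. The awk version generalizes to "pad up to N pipes". Both function names are made up for illustration.

```shell
# sed: "PID|" (1 pipe) + 11 more "[^|]*|" groups = exactly 12 pipes,
# then no further pipe before end of line. Only such lines get '|' appended.
pad_pid_sed() {
    sed '/^PID|\([^|]*|\)\{11\}[^|]*$/s/$/|/'
}

# awk: count the pipes on each PID line and append '|' until there are
# "want" of them (default 13); lines that already have enough are unchanged.
pad_pid_awk() {
    awk -v want="${1:-13}" '
        /^PID\|/ {
            n = gsub(/\|/, "|")   # gsub returns the number of pipes replaced
            while (n < want + 0) { $0 = $0 "|"; n++ }
        }
        1'
}

# Usage: pad_pid_sed < file > fixed   or   pad_pid_awk 13 < file > fixed
```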

