04-07-2011, 03:54 PM
|
#1
|
Member
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77
Rep:
|
sed/awk help
how would i create a sed or awk command that would add a | to the end of a line if it wasn't already there?
i'm trying to import some files into mysql, but some of the lines do not end with a |, which is my field delimiter.
any help would be great.
thanks
|
|
|
04-07-2011, 04:23 PM
|
#2
|
Senior Member
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
|
That one's quite easy:
Code:
sed 's/\([^|]\)$/\1|/' input.txt
's' means replace
'[|]' would mean "match a |" but the '^' inverts the match, so '[^|]' means "match anything EXCEPT a |"
'$' matches the end of a line
'\1' is replaced by whatever the escaped parentheses '\(' and '\)' captured in the first half of the expression.
So... it means "find a line which ends with a character that isn't a |, and replace that character with itself followed by a |"
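As a small sketch of the explanation above, here is the same substitution wrapped in a function and run on sample input, plus an awk version since the question asked for either tool. The function names append_pipe and append_pipe_awk and the sample data are made up for illustration. Note that an empty line is left alone by both, because there is no final character to match.

```shell
# append_pipe: hypothetical helper name; reads stdin, writes stdout.
append_pipe() {
    # \([^|]\)$ captures a final character that is not a pipe;
    # \1| writes that character back, followed by a pipe.
    sed 's/\([^|]\)$/\1|/'
}

# awk equivalent: append a pipe unless the line is empty or already ends in one.
append_pipe_awk() {
    awk '{ if ($0 != "" && $0 !~ /[|]$/) $0 = $0 "|"; print }'
}

# The first line gains a pipe, the second is unchanged, the empty line stays empty.
printf 'a|b|c\nd|e|f|\n\n' | append_pipe
```

Both helpers write to standard output, so redirect to a new file instead of overwriting the input.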
Last edited by Snark1994; 04-07-2011 at 04:26 PM.
|
|
|
04-07-2011, 04:43 PM
|
#3
|
Member
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77
Original Poster
Rep:
|
hmm i'm getting pipes where i shouldn't, but it may have to do with control characters.
what's the best way to view a document and see the control characters? if i use vi, i see some ^M but i don't think it's showing the carriage return line feeds.
edit: there are some spaces that need to be removed first, i want to find out what control characters they contain.
|
|
|
04-07-2011, 05:22 PM
|
#4
|
LQ 5k Club
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
|
You must have edited the file at some point with a DOS/Windows style editor. The ^M's are carriage-returns, and they are messing with the Linux style delimiters which are linefeeds alone. You should find a tool that converts DOS text files to Unix text files, and then try the sed script against the result.
Google says the following should work, and without having actually tried it, it looks about right:
Code:
tr -d '\r' < dosfile > unixfile
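A rough sketch of how the two steps fit together, assuming the stray pipes come from carriage returns: with a \r at the end of every line, the last character is never a |, so the sed expression adds a pipe after the \r even on lines that already end in one. The helper name strip_cr and the sample data are invented for illustration; od -c is one standard way to make the control characters visible.

```shell
# strip_cr: hypothetical helper; deletes every carriage return from stdin.
strip_cr() {
    tr -d '\r'
}

# od -c prints each byte, so a carriage return shows up as \r and a line feed as \n.
printf 'PID|1|\r\n' | od -c

# Strip the carriage returns first, then append the missing pipes.
printf 'a|b\r\nc|d|\r\n' | strip_cr | sed 's/\([^|]\)$/\1|/'
```

Running od -c on the real file before and after the conversion is a quick check that only line feeds remain.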
--- rod.
Last edited by theNbomr; 04-07-2011 at 07:13 PM.
|
|
|
04-07-2011, 07:08 PM
|
#5
|
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
|
Quote:
Originally Posted by Eppo
hmm i'm getting pipes where i shouldn't, but it may have to do with control characters.
what's the best way to view a document and see the control characters? if i use vi, i see some ^M but i don't think it's showing the carriage return line feeds.
edit: there are some spaces that need to be removed first, i want to find out what control characters they contain.
|
To see what other special characters may be hidden in there,
have a look at
Cheers,
Tink
|
|
|
04-07-2011, 07:47 PM
|
#6
|
Member
Registered: Apr 2010
Posts: 228
Rep:
|
Code:
$ ruby -ne 'line = $_.chomp; line << "|" unless line.end_with?("|"); puts line' file
|
|
|
04-07-2011, 08:29 PM
|
#7
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
I am a little curious why you would need a delimiter after the last entry?
Or are you planning on filling a field with NULL data?
|
|
|
04-07-2011, 09:38 PM
|
#8
|
Member
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77
Original Poster
Rep:
|
the records actually span multiple lines. it's an HL7 file... here is an example:
MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4<cr>
PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F|||153 FERNWOOD DR.^
^STATESVILLE^OH^35292||(206)3345232|(206)752-121||||AC555444444||67-A4335^OH^20030520<cr>
OBR|1|845439^GHH OE|1045813^GHH LAB|15545^GLUCOSE|||200202150730|||||||||
555-55-5555^PRIMARY^PATRICIA P^^^^MD^^|||||||||F||||||444-44-4444^HIPPOCRATES^HOWARD H^^^^MD<cr>
OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105|H|||F<cr>
my issue is the line that starts with PID sometimes has a | at the end and sometimes not, i want to make sure they all do so i can count my fields correctly.
|
|
|
04-07-2011, 09:51 PM
|
#9
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
So is it correct to say that only lines that do not have '<cr>' at the end should be checked to see whether they end in a pipe?
|
|
|
04-08-2011, 08:06 AM
|
#10
|
Member
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77
Original Poster
Rep:
|
yes, i think so. although i'm not sure if this is going to work out the way i thought it would, because every line may not have the same number of fields.
i'll cross that bridge when i come to it though.
|
|
|
04-08-2011, 11:22 AM
|
#11
|
Member
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77
Original Poster
Rep:
|
ok, so the last one worked, so what i'm left with is this:
PID|1||0394580|0394580|Yogy Bear ||20070608|M|U||485 linux road, slackware, ny 11722|6316172045|
PID|1||31375|31375|Fozzy bear ||19890113|F|U||16 gentoo road, slackware, ny 11720 |
so i want to pick up the second line, and not the first. if i try something like this it doesn't work:
sed 's/\(PID|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|$\)/\1|/'
it winds up adding the | to both lines.
how do i pick up the second line, but not the first?
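A possible approach, offered as a sketch rather than a diagnosis of the actual file: in a regular expression '.' also matches '|', so each '.*|' can swallow several fields at once and the pattern cannot count pipes exactly. Writing each field as '[^|]*|' makes the count exact. The helper name add_13th_pipe is made up for illustration, and the count of 12 pipes is taken from the two sample lines in this post.

```shell
# add_13th_pipe: hypothetical helper; reads stdin, writes stdout.
add_13th_pipe() {
    # ^PID| is the first pipe; \([^|]*|\)\{11\} matches exactly 11 more
    # fields that each end in a pipe, so only lines with exactly 12 pipes
    # (the last one at end of line) match. & re-emits the line, then a pipe.
    sed 's/^PID|\([^|]*|\)\{11\}$/&|/'
}

printf '%s\n' \
  'PID|1||0394580|0394580|Yogy Bear ||20070608|M|U||485 linux road, slackware, ny 11722|6316172045|' \
  'PID|1||31375|31375|Fozzy bear ||19890113|F|U||16 gentoo road, slackware, ny 11720 |' |
  add_13th_pipe
```

With this pattern the first sample line (13 pipes) is left alone and the second (12 pipes) gets one more appended.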
|
|
|
04-08-2011, 11:45 AM
|
#12
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
Not sure if I am following, but is this what you are after:
Code:
sed '/^PID/s/[^|]$/&|/' file
|
|
|
04-08-2011, 01:35 PM
|
#13
|
Senior Member
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
|
Are you saying you want to append '|' to the end of the line until there are 13 of them in the line (ie. 13 fields)? 'cos that's the only explanation I can come to which is consistent with your latest post...
|
|
|
04-08-2011, 02:24 PM
|
#14
|
Member
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77
Original Poster
Rep:
|
yes, i want to make sure that all of the PID fields have the same amount of |, if a line only has 12 i want to add one to the end.
|
|
|
04-08-2011, 02:41 PM
|
#15
|
LQ 5k Club
Registered: Sep 2009
Posts: 6,443
|
Quote:
Originally Posted by Eppo
yes, i want to make sure that all of the PID fields have the same amount of |, if a line only has 12 i want to add one to the end.
|
So basically what you want is for a 13th "|" character to be added to the end of each line that has only 12 of them?
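If that reading is right, one way to do it is to let awk count the pipes instead of encoding the count in a regular expression. This is only a sketch under that assumption: the function name pad_pid_pipes and the variable name want are invented here, and the target of 13 comes from the numbers mentioned in this thread.

```shell
# pad_pid_pipes: hypothetical helper; reads stdin, writes stdout.
pad_pid_pipes() {
    # With -F'|' the number of pipes on a line is NF - 1.
    # Assigning to $0 makes awk re-split the line, so NF is updated
    # after each appended pipe and the loop stops at the target count.
    awk -F'|' -v want=13 '
        /^PID/ { while (NF - 1 < want) $0 = $0 "|" }
        { print }
    '
}

printf '%s\n' \
  'PID|1||31375|31375|Fozzy bear ||19890113|F|U||16 gentoo road, slackware, ny 11720 |' \
  'OBX|1|SN|1554-5^GLUCOSE' |
  pad_pid_pipes
```

Lines that do not start with PID pass through untouched, and PID lines that already have 13 or more pipes are not changed.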
|
|
|