LinuxQuestions.org
Old 04-07-2011, 03:54 PM   #1
Eppo
Member
 
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77

Rep: Reputation: 27
sed/awk help


How would I create a sed or awk command that adds a | to the end of a line if it isn't already there?
I'm trying to import some files into MySQL, but some of the lines do not end with a |, which is my field delimiter.
Any help would be great.
Thanks.
 
Old 04-07-2011, 04:23 PM   #2
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,630
Blog Entries: 3

Rep: Reputation: 345
That one's quite easy:

Code:
sed 's/\([^|]\)$/\1|/' input.txt
's' is the substitute command: s/pattern/replacement/

'[|]' would mean "match a |", but the leading '^' inverts the set, so '[^|]' means "match any single character EXCEPT a |"

'$' matches the end of a line

'\(' and '\)' capture whatever they enclose, and '\1' in the replacement puts that captured text back.

So... it means "find a line which ends with a character that isn't a |, and replace that character with itself followed by a |". Lines that already end in a | are left alone.
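
For anyone who wants to try this without touching real data, here is a minimal sketch on two made-up lines (the sample text and the add_pipe function name are my own, not from the original post). The commented variant is an alternative that skips lines already ending in | and appends to everything else, including empty lines, which the original expression leaves untouched because it needs at least one character to match.

```shell
# add_pipe: hypothetical wrapper around the sed command above.
add_pipe() {
  sed 's/\([^|]\)$/\1|/' "$@"
}

# Alternative (assumption: you also want empty lines to get a |):
#   sed '/|$/!s/$/|/' input.txt

# First line lacks the trailing |, second already has it.
printf 'a|b\nc|d|\n' | add_pipe
```

Both lines should come out ending in exactly one |.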

Last edited by Snark1994; 04-07-2011 at 04:26 PM.
 
Old 04-07-2011, 04:43 PM   #3
Eppo
Member
 
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77

Original Poster
Rep: Reputation: 27
Hmm, I'm getting pipes where I shouldn't, but it may have to do with control characters.
What's the best way to view a document and see the control characters? If I use vi, I see some ^M, but I don't think it's showing the carriage return/line feeds.


Edit: there are some spaces that need to be removed first; I want to find out what control characters they contain.
 
Old 04-07-2011, 05:22 PM   #4
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,390
Blog Entries: 2

Rep: Reputation: 900
You must have edited the file at some point with a DOS/Windows style editor. The ^M's are carriage returns, and they are interfering with the Linux-style line endings, which are linefeeds alone. You should find a tool that converts DOS text files to Unix text files, and then try the sed script against the result.
Google says the following should work, and without having actually tried it, it looks about right:
Code:
tr -d '\r' < dosfile > unixfile
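
The reason the stray pipes appear: on a DOS-style line, the last character before the linefeed is the carriage return, which is not a |, so the earlier sed expression appends a | after the ^M even when the visible text already ends in one. A minimal sketch that strips the carriage returns first and then runs the same sed (fix_dos_lines and the sample lines are made up for illustration):

```shell
# fix_dos_lines: hypothetical helper; reads stdin, writes stdout.
fix_dos_lines() {
  # Drop every carriage return, then append | where it is missing.
  tr -d '\r' | sed 's/\([^|]\)$/\1|/'
}

# Two DOS-style lines: one already ends in |, one does not.
printf 'a|b|\r\nc|d\r\n' | fix_dos_lines
```

Both lines should come out with Unix line endings and a single trailing |.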
--- rod.

Last edited by theNbomr; 04-07-2011 at 07:13 PM.
 
Old 04-07-2011, 07:08 PM   #5
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,950
Blog Entries: 11

Rep: Reputation: 860
Quote:
Originally Posted by Eppo
Hmm, I'm getting pipes where I shouldn't, but it may have to do with control characters.
What's the best way to view a document and see the control characters? If I use vi, I see some ^M, but I don't think it's showing the carriage return/line feeds.


Edit: there are some spaces that need to be removed first; I want to find out what control characters they contain.
To see what other special characters may be hidden in there,
have a look at
Code:
od -a <file>
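
As a quick illustration of what that output looks like (the sample line is invented), od -a prints each byte by name, so a space shows up as sp, a carriage return as cr and a linefeed as nl:

```shell
# One DOS-style line with a trailing space before the CR/LF.
printf 'a|b \r\n' | od -a
```

If the names are hard to read, od -c shows the same bytes as backslash escapes such as \r and \n.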

Cheers,
Tink
 
Old 04-07-2011, 07:47 PM   #6
kurumi
Member
 
Registered: Apr 2010
Posts: 223

Rep: Reputation: 45
Code:
$ ruby -pe '$_.chomp!; $_ << "|" unless $_.end_with?("|"); $_ << "\n"' file
 
Old 04-07-2011, 08:29 PM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,202

Rep: Reputation: 1796
I am a little curious why you would need a delimiter after the last entry?
Or are you planning on filling a field with NULL data?
 
Old 04-07-2011, 09:38 PM   #8
Eppo
Member
 
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77

Original Poster
Rep: Reputation: 27
The records are actually multiple lines long. It's an HL7 file... here is an example:
MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4<cr>
PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F|||153 FERNWOOD DR.^
^STATESVILLE^OH^35292||(206)3345232|(206)752-121||||AC555444444||67-A4335^OH^20030520<cr>
OBR|1|845439^GHH OE|1045813^GHH LAB|15545^GLUCOSE|||200202150730|||||||||
555-55-5555^PRIMARY^PATRICIA P^^^^MD^^|||||||||F||||||444-44-4444^HIPPOCRATES^HOWARD H^^^^MD<cr>
OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105|H|||F<cr>

My issue is that the line that starts with PID sometimes has a | at the end and sometimes not. I want to make sure they all do so I can count my fields correctly.
 
Old 04-07-2011, 09:51 PM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,202

Rep: Reputation: 1796
So then is it correct to say that only lines that do not have '<cr>' at the end should be checked to see whether they end in a pipe?
 
Old 04-08-2011, 08:06 AM   #10
Eppo
Member
 
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77

Original Poster
Rep: Reputation: 27
Yes, I think so, although I'm not sure this is going to work out the way I thought it would, because the lines may not all have the same number of fields.
I'll cross that bridge when I come to it, though.
 
Old 04-08-2011, 11:22 AM   #11
Eppo
Member
 
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77

Original Poster
Rep: Reputation: 27
OK, so the last one worked, and what I'm left with is this:
PID|1||0394580|0394580|Yogy Bear ||20070608|M|U||485 linux road, slackware, ny 11722|6316172045|
PID|1||31375|31375|Fozzy bear ||19890113|F|U||16 gentoo road, slackware, ny 11720 |

So I want to pick up the second line, and not the first. If I try something like this, it doesn't work:
sed 's/\(PID|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|.*|$\)/\1|/' winds up adding the | to both lines.
How do I pick up the second line, but not the first?
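
One thing to watch for with a pattern like that: .* is happy to match | characters as well, so the expression matches any PID line that contains at least that many pipes and ends in one, not only lines with an exact count. Counting the delimiters directly is less fragile. A rough sketch using awk on the two lines above (count_pipes is just an illustrative name):

```shell
# Print the number of | characters on each line: with | as the field
# separator, a line with N pipes splits into N + 1 fields.
count_pipes() {
  awk -F'|' '{ print NF - 1 }' "$@"
}

printf '%s\n' \
  'PID|1||0394580|0394580|Yogy Bear ||20070608|M|U||485 linux road, slackware, ny 11722|6316172045|' \
  'PID|1||31375|31375|Fozzy bear ||19890113|F|U||16 gentoo road, slackware, ny 11720 |' |
  count_pipes
```

On those two lines this reports 13 and 12.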
 
Old 04-08-2011, 11:45 AM   #12
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,202

Rep: Reputation: 1796
Not sure if I am following, but is this what you are after:
Code:
sed '/^PID/s/[^|]$/&|/' file
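
To show the address-plus-substitute idea on something concrete, here is a small sketch with invented lines (append_pid_pipe is a made-up name). The /^PID/ address limits the substitution to lines that start with PID, and & in the replacement stands for the matched text, so the last character is kept and a | is added after it:

```shell
# append_pid_pipe: hypothetical wrapper; only PID lines are touched.
append_pid_pipe() {
  sed '/^PID/s/[^|]$/&|/' "$@"
}

printf 'PID|a|b\nPID|c|d|\nOBX|1|x\n' | append_pid_pipe
```

Only the first line should change; the second already ends in | and the third is not a PID line.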
 
Old 04-08-2011, 01:35 PM   #13
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,630
Blog Entries: 3

Rep: Reputation: 345
Are you saying you want to append '|' to the end of the line until there are 13 of them in the line (i.e. 13 fields)? 'Cos that's the only explanation I can come up with which is consistent with your latest post...
 
Old 04-08-2011, 02:24 PM   #14
Eppo
Member
 
Registered: Feb 2007
Location: NY
Distribution: Arch, Ubuntu
Posts: 77

Original Poster
Rep: Reputation: 27
Yes, I want to make sure that all of the PID lines have the same number of |. If a line only has 12, I want to add one to the end.
 
Old 04-08-2011, 02:41 PM   #15
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 713
Quote:
Originally Posted by Eppo
Yes, I want to make sure that all of the PID lines have the same number of |. If a line only has 12, I want to add one to the end.
So basically what you want is for a 13th "|" character to be added to the end of each line that has only 12 of them?
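
If that is the goal, counting fields in awk is probably simpler than a long sed pattern. A minimal sketch, assuming PID lines should end up with 13 pipes (the number comes from this thread, not from the HL7 spec) and that every other segment is left alone; pad_pid, the want variable and padded.txt are names I made up:

```shell
# pad_pid: hypothetical helper that pads PID lines with trailing |
# until they contain `want` pipes. Other lines pass through as-is.
pad_pid() {
  awk -F'|' -v want=13 '
    $1 == "PID" {
      # NF - 1 is the number of | characters currently on the line.
      for (n = NF - 1; n < want; n++) $0 = $0 "|"
    }
    { print }
  ' "$@"
}

# Usage: pad_pid input.txt > padded.txt
```

Lines that already have 13 or more pipes pass through unchanged. If the file still has DOS line endings, strip the carriage returns first, otherwise the padding lands after the ^M.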
 
  

