LinuxQuestions.org
Old 08-04-2003, 09:32 AM   #1
jajanes
LQ Newbie
 
Registered: Jul 2002
Posts: 5

Rep: Reputation: 0
Parsing a tab delimited text file


I'm currently writing a bash shell script that needs to take a tab delimited text file and convert it into a MySQL importable file. I have no experience with gawk (which is what I'm assuming I'd use - if not, please don't hesitate to correct me) - but this is what I'm looking to do:

Original File (the <TAB> is just representative of an actual tab):

1123432<TAB>114 Oceanside Drive<TAB>3324|4432|4432|2234<TAB>11234.jpg


Converting to:

"1123432","114 Oceanside Drive","3324|4432|4432|2234","11234.jpg"


Any help would be GREATLY appreciated - kind of stuck on this one. I'm reading the gawk man page, but it's not really linking in without "sample code".

Thanks!
 
Old 08-04-2003, 10:13 AM   #2
kev82
Senior Member
 
Registered: Apr 2003
Location: Lancaster, England
Distribution: Debian Etch, OS X 10.4
Posts: 1,263

Rep: Reputation: 50
how about:

cat datafile | sed -e 's/^./"&/' -e 's/.$/&"/' -e 's/large space/\",\"/g' > outfile

The "large space" in the last sed expression stands for a literal tab, produced by pressing C-v and then Tab at the shell prompt. I'm sure it can be tidied up, as I'm not a sed expert.
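For anyone who has trouble typing the literal tab, here is a minimal sketch of the same idea that lets printf produce the tab instead. The function name quote_tsv is made up for illustration, and it assumes the fields themselves contain no double quotes.

```shell
# Hypothetical helper: quote_tsv reads tab-delimited lines on stdin
# and writes quoted, comma-separated lines on stdout.
quote_tsv() {
    # printf '\t' emits a real tab, so nothing has to be typed with C-v.
    tab=$(printf '\t')
    # Turn every tab into "," first, then add the opening and closing quotes.
    sed -e "s/${tab}/\",\"/g" -e 's/^/"/' -e 's/$/"/'
}

# Sample line from the first post, with real tabs supplied by printf.
printf '1123432\t114 Oceanside Drive\t3324|4432|4432|2234\t11234.jpg\n' | quote_tsv
```

With a real file this would be something like quote_tsv < datafile > outfile. Unlike the one-liner above, this version turns a blank line into "" rather than leaving it alone.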

Last edited by kev82; 08-04-2003 at 10:28 AM.
 
Old 08-04-2003, 10:18 AM   #3
Sliptwixt
LQ Newbie
 
Registered: Aug 2003
Posts: 8

Rep: Reputation: 0
hello -
try:

awk '{ print "\""$1"\",\""$2"\",\""$3"\",\""$4"\"" }' myfile.txt
 
Old 08-04-2003, 12:22 PM   #4
/bin/bash
Senior Member
 
Registered: Jul 2003
Location: Indiana
Distribution: Mandrake Slackware-current QNX4.25
Posts: 1,802

Rep: Reputation: 46
kev82, that just puts quotes around the whole thing:
"1123432 114 Oceanside Drive 3324|4432|4432|2234 11234.jpg"
Sliptwixt, that puts quotes around the first 4 fields:
"1123432","114","Oceanside","Drive"

Use [[:cntrl:]] with sed or grep to look for control characters. If <Tab> is the only control character then something like this would work:
sed -e 's/[[:cntrl:]]/\"/g'
That would replace the three <Tab>s with a ", but unfortunately I'm sure there are other control characters, like linefeeds and/or carriage returns.

I think Chr$(9) is the tab. So you need to make that the field delimiter.
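Making the tab the field delimiter could look roughly like this in awk. This is only a sketch: the function name tsv_to_csv is made up, it assumes an awk that accepts -F'\t' (gawk and mawk both do), and it does nothing about double quotes inside a field.

```shell
# Hypothetical helper: tsv_to_csv quotes every tab-separated field on each line.
# Reads the files named as arguments, or stdin if none are given.
tsv_to_csv() {
    awk -F'\t' '{
        line = ""
        for (i = 1; i <= NF; i++) {
            # Join fields with commas, wrapping each one in double quotes.
            line = line (i > 1 ? "," : "") "\"" $i "\""
        }
        print line
    }' "$@"
}

# Sample line from the first post, with real tabs supplied by printf.
printf '1123432\t114 Oceanside Drive\t3324|4432|4432|2234\t11234.jpg\n' | tsv_to_csv
```

Because the separator is a tab rather than any whitespace, "114 Oceanside Drive" stays together as one field, and the loop over NF means it does not depend on there being exactly four fields.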
 
Old 08-04-2003, 12:31 PM   #5
kev82
Senior Member
 
Registered: Apr 2003
Location: Lancaster, England
Distribution: Debian Etch, OS X 10.4
Posts: 1,263

Rep: Reputation: 50
Quote:
That just puts quotes around the whole thing.
It works fine for me. Are you sure you did C-v Tab to insert a literal tab character?
Quote:
that puts quotes around the first 4 fields.
This also works fine for me, although I agree it does rely on there being four fields.
 
Old 08-04-2003, 03:11 PM   #6
Sliptwixt
LQ Newbie
 
Registered: Aug 2003
Posts: 8

Rep: Reputation: 0
I assumed the format was consistent and this was a quick-and-dirty one-time thing. If it were something I'd have to revisit more than once, I'd opt to script it in Perl or something so I have a little more control over inconsistencies in the datafile and/or some kind of error reporting.
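The same kind of checking can be sketched without Perl, staying in shell and awk. This is an illustration only: the name check_tsv is made up, the expected count of 4 fields is an assumption taken from the sample record in the first post, and it relies on "/dev/stderr" being usable from awk (true for gawk and on Linux generally).

```shell
# Hypothetical helper: check_tsv converts lines that have exactly 4
# tab-separated fields and reports any other line on stderr.
check_tsv() {
    awk -F'\t' -v want=4 '
        NF != want {
            # Report the offending line number and remember the failure.
            printf "line %d: expected %d fields, found %d\n", NR, want, NF > "/dev/stderr"
            bad = 1
            next
        }
        { print "\"" $1 "\",\"" $2 "\",\"" $3 "\",\"" $4 "\"" }
        # Exit non-zero if any line was rejected, so a calling script can notice.
        END { exit (bad ? 1 : 0) }
    ' "$@"
}
```

Called as check_tsv datafile > outfile, the good lines go to outfile, the complaints go to the terminal, and the exit status tells the surrounding bash script whether anything was skipped.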

I hope you find the solution that works best for you.

Last edited by Sliptwixt; 08-04-2003 at 03:13 PM.
 
Old 08-04-2003, 07:19 PM   #7
jajanes
LQ Newbie
 
Registered: Jul 2002
Posts: 5

Original Poster
Rep: Reputation: 0
Thanks All ...........

I really appreciate all of your help!
 
Old 08-07-2003, 04:14 AM   #8
slapNUT
Member
 
Registered: Jun 2001
Location: Recycle Bin
Distribution: Linux & Everything else on VirtualBox
Posts: 144

Rep: Reputation: 15
hummm...

kev82
That one works for me.

Last edited by slapNUT; 08-07-2003 at 04:16 AM.
 
Old 08-07-2003, 06:45 PM   #9
sk8guitar
Member
 
Registered: Jul 2003
Location: DC
Distribution: mandrake 9.1
Posts: 415

Rep: Reputation: 30
Yeah, Perl works great for that stuff. That's what I use to delimit and format all my files to be parsed into SQL.
 
Old 08-08-2003, 10:34 AM   #10
/bin/bash
Senior Member
 
Registered: Jul 2003
Location: Indiana
Distribution: Mandrake Slackware-current QNX4.25
Posts: 1,802

Rep: Reputation: 46
Yeah, my bad.

That's what I get for posting in Windows 95. I was using the cygwin bash shell and I couldn't get the tab to work. I guess when I cut-n-pasted it into a script I kinda forgot the tab.
 
  


All times are GMT -5.