LinuxQuestions.org


jajanes 08-04-2003 09:32 AM

Parsing a tab delimited text file
 
I'm currently writing a bash shell script that needs to take a tab-delimited text file and convert it into a file that MySQL can import. I have no experience with gawk (which is what I'm assuming I'd use; if not, please don't hesitate to correct me), but this is what I'm looking to do:

Original file (<TAB> stands for an actual tab character):

1123432<TAB>114 Oceanside Drive<TAB>3324|4432|4432|2234<TAB>11234.jpg


Converting to:

"1123432","114 Oceanside Drive","3324|4432|4432|2234","11234.jpg"


Any help would be GREATLY appreciated - kind of stuck on this one. I'm reading the gawk man page, but it's not really sinking in without sample code.

Thanks!

kev82 08-04-2003 10:13 AM

how about:

cat datafile | sed -e 's/^./"&/' -e 's/.$/&"/' -e 's/large space/\",\"/g' > outfile

The large space in the last sed expression is a literal tab, produced by pressing Ctrl-V and then Tab at the shell prompt. I'm sure it can be tidied up, as I'm not a sed expert.
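
If typing a literal tab is awkward (or the tab gets lost when the command is pasted into a script), the tab can be generated with printf instead. The following is only a sketch of the same three substitutions; the function name `quote_tsv_line` and the variable `tab` are made up for illustration. One small difference from the command above: it anchors on the start and end of the line rather than on the first and last character, so an empty line comes out as `""`.

```shell
# Sketch only: quote_tsv_line and tab are illustrative names, not from the thread.
# printf '\t' emits a real tab, so nothing depends on typing Ctrl-V Tab.
tab=$(printf '\t')

quote_tsv_line() {
    # 1) opening quote at the start of the line
    # 2) closing quote at the end of the line
    # 3) every tab becomes ","
    sed -e 's/^/"/' -e 's/$/"/' -e "s/${tab}/\",\"/g"
}

# Demo with the sample record from the first post.
printf '1123432\t114 Oceanside Drive\t3324|4432|4432|2234\t11234.jpg\n' | quote_tsv_line
```

To convert a whole file: `quote_tsv_line < datafile > outfile`. In bash specifically, `$'\t'` is another way to write a tab without typing one.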

Sliptwixt 08-04-2003 10:18 AM

hello -
try:

awk '{ print "\""$1"\",\""$2"\",\""$3"\",\""$4"\"" }' myfile.txt

/bin/bash 08-04-2003 12:22 PM

Kev82,
That just puts quotes around the whole thing:
"1123432 114 Oceanside Drive 3324|4432|4432|2234 11234.jpg"
Sliptwixt, that puts quotes around the first 4 fields, and awk splits fields on spaces as well as tabs by default, so the address gets broken up:
"1123432","114","Oceanside","Drive"

Use [[:cntrl:]] with sed or grep to look for control characters. If <Tab> is the only control character then something like this would work:
sed -e's/[[:cntrl:]]/\"/g'
That would replace the three <Tab>s with a ", but unfortunately I'm sure there are other control characters in there, like linefeeds and/or carriage returns.

I think Chr$(9) (ASCII character 9) is the tab, so you need to make that the field delimiter.
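
Making the tab the field delimiter could look something like the sketch below. It sets awk's `FS` to a tab so that spaces inside a field (as in the street address) are left alone, and it loops over however many fields a line has instead of hard-coding four. The function name `tsv_to_quoted_csv` is hypothetical. Stripping a trailing carriage return is an assumption about where the file came from (for example, a Windows export); drop that line if it doesn't apply.

```shell
# Sketch only: tsv_to_quoted_csv is an illustrative name, not from the thread.
tsv_to_quoted_csv() {
    awk '
    BEGIN { FS = "\t" }              # split on tabs only, not on spaces
    {
        sub(/\r$/, "")               # assumption: drop a trailing CR if present
        line = ""; sep = ""
        for (i = 1; i <= NF; i++) {  # quote every field, however many there are
            line = line sep "\"" $i "\""
            sep = ","
        }
        print line
    }' "$@"
}

# Demo with the sample record from the first post.
printf '1123432\t114 Oceanside Drive\t3324|4432|4432|2234\t11234.jpg\n' | tsv_to_quoted_csv
# prints: "1123432","114 Oceanside Drive","3324|4432|4432|2234","11234.jpg"
```

With a file argument it would be called as `tsv_to_quoted_csv myfile.txt > outfile`.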

kev82 08-04-2003 12:31 PM

Quote:

That just puts quotes around the whole thing.

It works fine for me. Are you sure you pressed Ctrl-V and then Tab to insert a literal tab character?

Quote:

that puts quotes around the first 4 fields.

This also works fine for me, although I agree it does rely on there being four fields.

Sliptwixt 08-04-2003 03:11 PM

I assumed the format was consistent and this was a quick-and-dirty, one-time thing. If it were something I'd have to revisit more than once, I'd opt to script it in Perl or something similar, so I'd have a little more control over inconsistencies in the datafile and/or some kind of error reporting.

I hope you find the solution that works best for you.
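
For what it's worth, a basic version of that kind of checking can also be done without leaving awk. The sketch below is not the Perl approach mentioned above, just a rough equivalent under some assumptions: the function name `checked_tsv_to_csv` and the `TSV_FIELDS` variable are invented here, and the default of four fields is taken from the sample record. Lines with the wrong number of fields are reported on stderr with their line number and skipped, and the exit status is non-zero if any were found.

```shell
# Sketch only: checked_tsv_to_csv and TSV_FIELDS are illustrative names.
# TSV_FIELDS defaults to 4 because the sample record has four fields.
checked_tsv_to_csv() {
    awk -v expected="${TSV_FIELDS:-4}" '
    BEGIN { FS = "\t"; bad = 0 }
    NF != expected {
        # Report the inconsistent line on stderr and skip it.
        printf("line %d: expected %d fields, found %d\n", NR, expected, NF) > "/dev/stderr"
        bad++
        next
    }
    {
        line = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            line = line sep "\"" $i "\""
            sep = ","
        }
        print line
    }
    END { exit (bad > 0) }           # non-zero status if any line was rejected
    ' "$@"
}

# Demo with the sample record from the first post.
printf '1123432\t114 Oceanside Drive\t3324|4432|4432|2234\t11234.jpg\n' | checked_tsv_to_csv
```

A calling script could then do something like `checked_tsv_to_csv datafile > outfile || echo "some lines were skipped" >&2`.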

jajanes 08-04-2003 07:19 PM

Thanks All ...........
 
I really appreciate all of your help!

slapNUT 08-07-2003 04:14 AM

hummm...

kev82
That one works for me.

sk8guitar 08-07-2003 06:45 PM

Yeah, Perl works great for that stuff. That's what I use to delimit and format all my files before they get loaded into SQL.

/bin/bash 08-08-2003 10:34 AM

Yeah, my bad.

That's what I get for posting in Windows 95. I was using the cygwin bash shell and I couldn't get the tab to work. I guess when I cut-and-pasted it into a script I kinda forgot the tab. :scratch:


All times are GMT -5.