LinuxQuestions.org
Old 09-20-2010, 08:56 PM   #1
nilleso
Member
 
Registered: Nov 2004
Location: ON, CANADA
Distribution: ubuntu, RHAS, and other unmentionables
Posts: 372

Rep: Reputation: 31
Help with simple CSV parsing


Hello. I have a well-formatted source file that needs some simple parsing, but I'm not sure of the best/easiest method.

example source,
Quote:
1,22,333,44,5555,6,77777777777777777777,8,9999,00000000000000,111111
1,22,333,44,5555,6,77777777777777777777,8,9999,00000000000000,111111
1,22,333,44,55555,6,77777777777777777,8,9999,0000000000000000,1111111
1,22,333,44,555,6,777777777777777777,8,9999,00000000000000,111111
I'd like to print fields:
Quote:
$1,$2,$3,$4,$5,$7(but replace all characters after the first 8 with "X"),$9,$10(but only the first 5 characters),$11
really appreciate your help!!
thanks

Last edited by nilleso; 09-20-2010 at 08:57 PM.
 
Old 09-20-2010, 09:36 PM   #2
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,225

Rep: Reputation: 5320
It's easiest to script it in Python.

http://docs.python.org/library/csv.html
 
Old 09-20-2010, 09:48 PM   #3
nilleso
Member
 
Registered: Nov 2004
Location: ON, CANADA
Distribution: ubuntu, RHAS, and other unmentionables
Posts: 372

Original Poster
Rep: Reputation: 31
I wish I knew Python. Can you offer any example code to solve this scenario?
thanks
 
Old 09-21-2010, 08:41 PM   #4
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,225

Rep: Reputation: 5320
The link I posted includes example code.
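
For readers who have not used Python before, here is a minimal sketch of how the csv module could handle this task. It assumes Python 3 and rows with at least 11 fields; the names mask_row and convert are illustrative and do not come from the linked documentation.

```python
import csv
import io
import sys


def mask_row(row):
    """Return $1-$5, $7 masked after 8 chars, $9, $10 cut to 5 chars, $11."""
    f7 = row[6]
    # Keep the first 8 characters of $7 and replace the rest with "X".
    masked = f7[:8] + "X" * max(len(f7) - 8, 0)
    return row[0:5] + [masked, row[8], row[9][:5], row[10]]


def convert(src, dst):
    """Read CSV rows from src and write the reduced rows to dst."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(mask_row(row))


if __name__ == "__main__":
    # Hypothetical one-line sample standing in for the real data file.
    sample = io.StringIO(
        "1,22,333,44,5555,6,77777777777777777777,8,9999,00000000000000,111111\n"
    )
    convert(sample, sys.stdout)
```

To process a real file, open it with open("file", newline="") and pass the file object to convert in place of the sample.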
 
Old 09-21-2010, 09:16 PM   #5
kurumi
Member
 
Registered: Apr 2010
Posts: 228

Rep: Reputation: 53
Code:
$ ruby -F"," -ane '$F.delete_at(7);$F.delete_at(5);$F[5][8..-1]=$F[5][8..-1].gsub(/./,"X");$F[7]=$F[7][0,5];puts $F.join(",")' file
1,22,333,44,5555,77777777XXXXXXXXXXXX,9999,00000,111111
1,22,333,44,5555,77777777XXXXXXXXXXXX,9999,00000,111111
1,22,333,44,55555,77777777XXXXXXXXX,9999,00000,1111111
1,22,333,44,555,77777777XXXXXXXXXX,9999,00000,111111
 
Old 09-21-2010, 09:44 PM   #6
14moose
Member
 
Registered: May 2010
Posts: 83

Rep: Reputation: Disabled
Q: which scripting language(s) do you currently feel comfortable with?

Bash? Perl? Something else entirely?
 
Old 09-22-2010, 09:27 AM   #7
nilleso
Member
 
Registered: Nov 2004
Location: ON, CANADA
Distribution: ubuntu, RHAS, and other unmentionables
Posts: 372

Original Poster
Rep: Reputation: 31
@kurumi - thanks a lot. Unfortunately I do not have ruby installed on the target machine and do not have the option of installing. I will try to check out your example on an alternate machine though and consider moving the data files across. appreciate the example!!
 
Old 09-22-2010, 09:32 AM   #8
nilleso
Member
 
Registered: Nov 2004
Location: ON, CANADA
Distribution: ubuntu, RHAS, and other unmentionables
Posts: 372

Original Poster
Rep: Reputation: 31
Hi 14moose
I can hack together a bash/sed/awk/tr script, but I'm having difficulty with this scenario:
1- print entire comma delimited lines, while
2- reducing number of characters on some of the fields, and
3- replacing some of the characters in specific fields with x's

appreciate any suggestions or alternatives!! thanks
 
Old 09-22-2010, 10:52 AM   #9
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 360

Rep: Reputation: 170
Using GNU sed:
Code:
echo '
1,22,333,44,5555,6,77777777777777777777,8,9999,00000000000000,111111
1,22,333,44,5555,6,77777777777777777777,8,9999,00000000000000,111111
1,22,333,44,55555,6,77777777777777777,8,9999,0000000000000000,1111111
1,22,333,44,555,6,777777777777777777,8,9999,00000000000000,111111' |
sed -r 's/^(([^,]*,){5})[^,]*,([^,]*,)[^,]*,/\1\3/
s/^(([^,]*,){7}[^,]{5})[^,]*/\1/
:a s/^(([^,]*,){5}[^,]{8}X*)[^X,]/\1X/; ta'

1,22,333,44,5555,77777777XXXXXXXXXXXX,9999,00000,111111
1,22,333,44,5555,77777777XXXXXXXXXXXX,9999,00000,111111
1,22,333,44,55555,77777777XXXXXXXXX,9999,00000,1111111
1,22,333,44,555,77777777XXXXXXXXXX,9999,00000,111111
The sed lines:
1. Remove $6 and $8.
2. Shorten $10 (now $8) to 5 characters.
3. Replace all characters after the first 8 in $7 (now $6) with "X".
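
For anyone without GNU sed, the three steps listed above can be sketched with Python's re module. This is an illustrative translation, not Kenhelm's code; the function name transform is an assumption, and step 3 uses a replacement function in place of the sed loop.

```python
import re


def transform(line):
    # 1. Remove $6 and $8.
    line = re.sub(r"^((?:[^,]*,){5})[^,]*,([^,]*,)[^,]*,", r"\1\2", line)
    # 2. Shorten $10 (now the 8th field) to 5 characters.
    line = re.sub(r"^((?:[^,]*,){7}[^,]{5})[^,]*", r"\1", line)
    # 3. Replace all characters after the first 8 in $7 (now the 6th field) with "X".
    line = re.sub(
        r"^((?:[^,]*,){5}[^,]{8})([^,]*)",
        lambda m: m.group(1) + "X" * len(m.group(2)),
        line,
    )
    return line


if __name__ == "__main__":
    print(transform("1,22,333,44,555,6,777777777777777777,8,9999,00000000000000,111111"))
```

As with the sed version, a line whose fields are shorter than the patterns expect is passed through unchanged by the step that does not match.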
 
Old 09-22-2010, 06:48 PM   #10
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by nilleso
but I'm not sure of the best/easiest method.
If you have access to CSV parsers like those Python/Perl/Ruby provide, then try to use them. They can parse CSV files easily and take care of issues like embedded commas. If you don't have access to such languages and can't download anything, your best bet would be normal *nix tools like awk/gawk and shell programming (sed excluded, not because it can't do the job, but because you will end up with a messy chunk of unreadable regex).
Code:
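#!/bin/bash

# Print $1-$5, $7 with everything after its first 8 characters masked,
# $9, the first 5 characters of $10, and $11.
awk -F"," '{
 a=substr($7,9)          # tail of $7 after the first 8 characters
 gsub(/./,"X",a)         # mask the tail
 $7=substr($7,1,8) a     # awk strings are 1-indexed
 $10=substr($10,1,5)
 print $1,$2,$3,$4,$5,$7,$9,$10,$11
}' OFS="," file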
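
The point about embedded commas can be illustrated with a short Python sketch. The sample line is made up for the demonstration and does not come from the original data, which has no quoted fields.

```python
import csv
import io

# Hypothetical line with a quoted field that contains a comma.
line = '1,"Smith, John",42\n'

# Splitting on every comma breaks the quoted field in two.
naive = line.rstrip("\n").split(",")

# A CSV parser treats the quoted text as a single field.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The plain split yields four pieces, while csv.reader returns the three intended fields. Any tool that splits on a bare comma, including awk -F",", has the same limitation as the plain split.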

Last edited by ghostdog74; 09-22-2010 at 06:50 PM.
 
Old 09-22-2010, 11:23 PM   #11
nilleso
Member
 
Registered: Nov 2004
Location: ON, CANADA
Distribution: ubuntu, RHAS, and other unmentionables
Posts: 372

Original Poster
Rep: Reputation: 31
Thank you everyone for the great suggestions. ghostdog74's awk construct works well but I plan to check out the alternatives also.

Thanks again
cheers
 
  

