LinuxQuestions.org
Download your favorite Linux distribution at LQ ISO.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 07-30-2008, 05:34 AM   #1
lmedland
LQ Newbie
 
Registered: Jun 2008
Location: England
Posts: 21

Rep: Reputation: 15
Question CSV | Text manipulation


Hi,

This forum is awesome, has helped me accomplish some data manipulation things really quickly!

I have another question now, not sure if gawk can be used for this or not....

I need to convert an alpha code in a csv file to an alpha-numeric code based on the following criteria:-
Code:
If field starts with N then new code starts with 01
If field starts with C then new code starts with 02
If field starts with L then new code starts with 03

THEN

If the conditions above are true then take three characters after the first character of the original code and add this to the 01,02 or 03 above.

THEN

Add 01 on the end.

If the field starts with something other than N, C or L then leave the field intact.

An example:

NBUP would become 01BUP01
CBUP would become 02BUP01
NWAR would become 01WAR01
012322 would become 012322
The format of the original csv file would be as follows and would require an additional field for the new codes:-

Code:
ID, Amount, Old code
654, 45.00, NBUP
5432, 20.00, CBUP
42, 65.00, NWAR
442, 66.00, 012322
So the output when directed to a csv file, would look like

Code:
ID, Amount, Old code, New code
654, 45.00, NBUP, 01BUP01
5432, 20.00, CBUP, 02BUP01
42, 65.00, NWAR, 01WAR01
442, 66.00, 012322, 012322
Thanks so much for help. I have started to read about awk arrays but certainly not at the standard required to do the job above!
 
Old 07-30-2008, 05:54 AM   #2
burschik
Member
 
Registered: Jul 2008
Posts: 159

Rep: Reputation: 31
sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;'
 
Old 07-30-2008, 06:11 AM   #3
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
This is for GNU Awk (otherwise you should escape the new lines in the ternary operator):

Code:
awk>new.csv -F', *' 'BEGIN {
  n = split("N C L", t, OFS)
  while (++i <= n) tt[t[i]] = sprintf("%02d", i)
  c = "01"
  }
NR == 1 { $(NF + 1) = "New code"; print; next }
{ $(NF + 1) = $NF ~ /^[NCL].*/ ? 
    tt[substr($NF, 1, 1)] substr($NF, 2, 3) c : 
      $NF }
1'  OFS=', ' filename
 
Old 07-30-2008, 06:17 AM   #4
lmedland
LQ Newbie
 
Registered: Jun 2008
Location: England
Posts: 21

Original Poster
Rep: Reputation: 15
Question

Quote:
Originally Posted by burschik View Post
sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;'
Thanks. I have never used sed before, how do issue the command with the csv file.

I tried...
Code:
sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;' input.csv > output.csv
But it didn't work.

Thanks
 
Old 07-30-2008, 06:27 AM   #5
burschik
Member
 
Registered: Jul 2008
Posts: 159

Rep: Reputation: 31
It should work like that. What exactly does not work?
 
Old 07-30-2008, 06:41 AM   #6
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241Reputation: 241Reputation: 241
if you have Python, here's an alternative
Code:
for line in open("file"):
    line=line.strip() #strip new line
    code=line.split(",")[2].strip() #get the code
    if code.startswith("N"):
        newcode="01"+code[1:]+"01"
        print line+", "+newcode
    elif code.startswith("C"):
        newcode="02"+code[1:]+"01"
        print line+", "+newcode
    elif code.startswith("L"):
        newcode="03"+code[1:]+"01"
        print line+", "+newcode
    else:
        newcode=""
        print line
 
Old 07-30-2008, 08:11 AM   #7
lmedland
LQ Newbie
 
Registered: Jun 2008
Location: England
Posts: 21

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by burschik View Post
It should work like that. What exactly does not work?
Code:
sed: -e expression #1, char 91: extra characters after command
 
Old 07-30-2008, 09:12 AM   #8
burschik
Member
 
Registered: Jul 2008
Posts: 159

Rep: Reputation: 31
That is slightly surprising, since the expression is less than 91 characters long. Are you sure that your placement of quotation marks is correct?
 
Old 07-30-2008, 10:01 AM   #9
lmedland
LQ Newbie
 
Registered: Jun 2008
Location: England
Posts: 21

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by burschik View Post
That is slightly surprising, since the expression is less than 91 characters long. Are you sure that your placement of quotation marks is correct?
I ran the following..no error message and output file is generated but looks identical to the input file.

[HTML]sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;' '/home/ll/Desktop/Extracts/Result/Prime/in.csv' > out.csv[/HTML]

Not sure what you mean about the placement of quotation marks, I just copied and pasted your command.

Any ideas?

Thanks for helping...
 
Old 07-30-2008, 10:30 AM   #10
burschik
Member
 
Registered: Jul 2008
Posts: 159

Rep: Reputation: 31
No, sorry, I don't see what might be going wrong. It works for me (TM). But since the sed one-liner isn't exactly what you wanted anyhow, you would probably be better off using one of the other suggestions.
 
Old 07-30-2008, 11:19 AM   #11
lmedland
LQ Newbie
 
Registered: Jun 2008
Location: England
Posts: 21

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by burschik View Post
No, sorry, I don't see what might be going wrong. It works for me (TM). But since the sed one-liner isn't exactly what you wanted anyhow, you would probably be better off using one of the other suggestions.
I have no idea with Python, so will need to look into this.

Thanks for your help though.
 
Old 07-30-2008, 11:28 AM   #12
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241Reputation: 241Reputation: 241
sed is definitely not a suitable tool to do your stuff.
why didn't you try radoulov's awk solution? Doesn't it work?
If you want to try the Python piece, save the code as as script and on command prompt
Code:
# python script.py
 
Old 07-30-2008, 02:06 PM   #13
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 59
And yet another way:
Code:
$ cat data
ID, Amount, Old code
654, 45.00, NBUP
5432, 20.00, CBUP
42, 65.00, NWAR
442, 66.00, 012322

$ perl -lna -F'/,\s*/' -e 'BEGIN {$" = ", "; %C=(N =>"01",C=>"02",L=>"03")}; $F[3]=$F[0] eq "ID" ? "New code" : $F[2] =~ /^([NCL])(\w+)/ ? "$C{$1}${2}01" : $F[2]; print "@F"' data
ID, Amount, Old code, New code
654, 45.00, NBUP, 01BUP01
5432, 20.00, CBUP, 02BUP01
42, 65.00, NWAR, 01WAR01
442, 66.00, 012322, 012322
 
Old 07-30-2008, 11:18 PM   #14
burschik
Member
 
Registered: Jul 2008
Posts: 159

Rep: Reputation: 31
ghostdog74 wrote:

Quote:
sed is definitely not a suitable tool to do your stuff.
Would you please be so kind as to explain this pearl of wisdom.
 
Old 07-31-2008, 12:09 AM   #15
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 59
Sed can do just about anything, but is very cumbersome to use for larger tasks. It was superb for its time, and still has plenty of value. But who the heck wants to fight with a basic two register machine!

I think this was the point being made.
 
  


Reply

Tags
awk, condition, csv, gawk


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Easy string/text manipulation/indentation for restructured text brianmcgee Linux - Software 1 04-22-2008 08:27 PM
.csv file upoload, manipulation, download script help donv2 Programming 6 12-19-2007 03:20 PM
need help with text manipulation pcorajr Programming 12 12-15-2006 07:33 AM
More text manipulation ice_hockey Linux - General 2 05-28-2005 01:43 AM
convert CSV (TEXT) files to UTF-16 cccc Programming 1 07-01-2004 01:54 AM


All times are GMT -5. The time now is 12:05 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration