LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   CSV | Text manipulation (https://www.linuxquestions.org/questions/programming-9/csv-%7C-text-manipulation-659220/)

lmedland 07-30-2008 05:34 AM

CSV | Text manipulation
 
Hi,

This forum is awesome, has helped me accomplish some data manipulation things really quickly!

I have another question now, not sure if gawk can be used for this or not....

I need to convert an alpha code in a csv file to an alpha-numeric code based on the following criteria:-
Code:

If field starts with N then new code starts with 01
If field starts with C then new code starts with 02
If field starts with L then new code starts with 03

THEN

If the conditions above are true then take three characters after the first character of the original code and add this to the 01,02 or 03 above.

THEN

Add 01 on the end.

If the field starts with something other than N, C or L then leave the field intact.

An example:

NBUP would become 01BUP01
CBUP would become 02BUP01
NWAR would become 01WAR01
012322 would become 012322

The format of the original csv file would be as follows and would require an additional field for the new codes:-

Code:

ID, Amount, Old code
654, 45.00, NBUP
5432, 20.00, CBUP
42, 65.00, NWAR
442, 66.00, 012322

So the output when directed to a csv file, would look like

Code:

ID, Amount, Old code, New code
654, 45.00, NBUP, 01BUP01
5432, 20.00, CBUP, 02BUP01
42, 65.00, NWAR, 01WAR01
442, 66.00, 012322, 012322

Thanks so much for help. I have started to read about awk arrays but certainly not at the standard required to do the job above!

burschik 07-30-2008 05:54 AM

sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;'

radoulov 07-30-2008 06:11 AM

This is for GNU Awk (otherwise you should escape the new lines in the ternary operator):

Code:

awk>new.csv -F', *' 'BEGIN {
  n = split("N C L", t, OFS)
  while (++i <= n) tt[t[i]] = sprintf("%02d", i)
  c = "01"
  }
NR == 1 { $(NF + 1) = "New code"; print; next }
{ $(NF + 1) = $NF ~ /^[NCL].*/ ?
    tt[substr($NF, 1, 1)] substr($NF, 2, 3) c :
      $NF }
1'  OFS=', ' filename


lmedland 07-30-2008 06:17 AM

Quote:

Originally Posted by burschik (Post 3230785)
sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;'

Thanks. I have never used sed before, how do issue the command with the csv file.

I tried...
Code:

sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;' input.csv > output.csv
But it didn't work.

Thanks

burschik 07-30-2008 06:27 AM

It should work like that. What exactly does not work?

ghostdog74 07-30-2008 06:41 AM

if you have Python, here's an alternative
Code:

for line in open("file"):
    line=line.strip() #strip new line
    code=line.split(",")[2].strip() #get the code
    if code.startswith("N"):
        newcode="01"+code[1:]+"01"
        print line+", "+newcode
    elif code.startswith("C"):
        newcode="02"+code[1:]+"01"
        print line+", "+newcode
    elif code.startswith("L"):
        newcode="03"+code[1:]+"01"
        print line+", "+newcode
    else:
        newcode=""
        print line


lmedland 07-30-2008 08:11 AM

Quote:

Originally Posted by burschik (Post 3230816)
It should work like that. What exactly does not work?

Code:

sed: -e expression #1, char 91: extra characters after command

burschik 07-30-2008 09:12 AM

That is slightly surprising, since the expression is less than 91 characters long. Are you sure that your placement of quotation marks is correct?

lmedland 07-30-2008 10:01 AM

Quote:

Originally Posted by burschik (Post 3230984)
That is slightly surprising, since the expression is less than 91 characters long. Are you sure that your placement of quotation marks is correct?

I ran the following..no error message and output file is generated but looks identical to the input file.

[HTML]sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;' '/home/ll/Desktop/Extracts/Result/Prime/in.csv' > out.csv[/HTML]

Not sure what you mean about the placement of quotation marks, I just copied and pasted your command.

Any ideas?

Thanks for helping...

burschik 07-30-2008 10:30 AM

No, sorry, I don't see what might be going wrong. It works for me (TM). But since the sed one-liner isn't exactly what you wanted anyhow, you would probably be better off using one of the other suggestions.

lmedland 07-30-2008 11:19 AM

Quote:

Originally Posted by burschik (Post 3231074)
No, sorry, I don't see what might be going wrong. It works for me (TM). But since the sed one-liner isn't exactly what you wanted anyhow, you would probably be better off using one of the other suggestions.

I have no idea with Python, so will need to look into this.

Thanks for your help though.

ghostdog74 07-30-2008 11:28 AM

sed is definitely not a suitable tool to do your stuff.
why didn't you try radoulov's awk solution? Doesn't it work?
If you want to try the Python piece, save the code as as script and on command prompt
Code:

# python script.py

Mr. C. 07-30-2008 02:06 PM

And yet another way:
Code:

$ cat data
ID, Amount, Old code
654, 45.00, NBUP
5432, 20.00, CBUP
42, 65.00, NWAR
442, 66.00, 012322

$ perl -lna -F'/,\s*/' -e 'BEGIN {$" = ", "; %C=(N =>"01",C=>"02",L=>"03")}; $F[3]=$F[0] eq "ID" ? "New code" : $F[2] =~ /^([NCL])(\w+)/ ? "$C{$1}${2}01" : $F[2]; print "@F"' data
ID, Amount, Old code, New code
654, 45.00, NBUP, 01BUP01
5432, 20.00, CBUP, 02BUP01
42, 65.00, NWAR, 01WAR01
442, 66.00, 012322, 012322


burschik 07-30-2008 11:18 PM

ghostdog74 wrote:

Quote:

sed is definitely not a suitable tool to do your stuff.
Would you please be so kind as to explain this pearl of wisdom.

Mr. C. 07-31-2008 12:09 AM

Sed can do just about anything, but is very cumbersome to use for larger tasks. It was superb for its time, and still has plenty of value. But who the heck wants to fight with a basic two register machine!

I think this was the point being made.


All times are GMT -5. The time now is 06:52 AM.