07-30-2008, 05:34 AM
|
#1
|
|
LQ Newbie
Registered: Jun 2008
Location: England
Posts: 21
Rep:
|
CSV | Text manipulation
Hi,
This forum is awesome; it has helped me accomplish some data manipulation tasks really quickly!
I have another question now, and I am not sure whether gawk can be used for this or not.
I need to convert an alpha code in a CSV file to an alphanumeric code based on the following criteria:
Code:
If field starts with N then new code starts with 01
If field starts with C then new code starts with 02
If field starts with L then new code starts with 03
THEN
If one of the conditions above is true, take the three characters after the first character of the original code and append them to the 01, 02 or 03 above.
THEN
Add 01 on the end.
If the field starts with something other than N, C or L then leave the field intact.
An example:
NBUP would become 01BUP01
CBUP would become 02BUP01
NWAR would become 01WAR01
012322 would become 012322
The format of the original csv file would be as follows and would require an additional field for the new codes:-
Code:
ID, Amount, Old code
654, 45.00, NBUP
5432, 20.00, CBUP
42, 65.00, NWAR
442, 66.00, 012322
So the output when directed to a csv file, would look like
Code:
ID, Amount, Old code, New code
654, 45.00, NBUP, 01BUP01
5432, 20.00, CBUP, 02BUP01
42, 65.00, NWAR, 01WAR01
442, 66.00, 012322, 012322
Thanks so much for the help. I have started to read about awk arrays, but I am certainly not at the standard required to do the job above!
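The conversion rule described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the names `PREFIXES` and `convert_code` are made up here and do not come from the original post, and the "three characters" are taken to be the characters at positions two to four of the old code.

```python
# Hypothetical helper: the names PREFIXES and convert_code are
# illustrative and not part of the original question.
PREFIXES = {"N": "01", "C": "02", "L": "03"}


def convert_code(old_code):
    """Return the new code for old_code following the rules listed above."""
    prefix = PREFIXES.get(old_code[:1])
    if prefix is None:
        return old_code  # does not start with N, C or L: leave intact
    # new prefix + the three characters after the first one + trailing 01
    return prefix + old_code[1:4] + "01"


for old in ["NBUP", "CBUP", "NWAR", "012322"]:
    print(old, "->", convert_code(old))
```

The loop at the end simply runs the four examples from the question through the function.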
|
|
|
|
07-30-2008, 05:54 AM
|
#2
|
|
Member
Registered: Jul 2008
Posts: 159
Rep:
|
sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;'
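For readers more at home in Python, the three substitutions can be mirrored with `re.sub`. This is only a sketch of what the expressions match, and the names `RULES` and `sed_like` are made up for illustration. It shows that the substitution rewrites the old code where it stands rather than appending a fourth field, and that `[A-Z]+` picks up every following capital letter, not only three.

```python
import re

# The same three substitutions as the sed one-liner, applied in order.
# \g<1> keeps the group reference separate from the literal 01 that follows.
RULES = [
    (r", N([A-Z]+)", r", 01\g<1>01"),
    (r", C([A-Z]+)", r", 02\g<1>01"),
    (r", L([A-Z]+)", r", 03\g<1>01"),
]


def sed_like(line):
    """Apply each rule at most once per line, like s/// without the g flag."""
    for pattern, replacement in RULES:
        line = re.sub(pattern, replacement, line, count=1)
    return line


print(sed_like("654, 45.00, NBUP"))    # 654, 45.00, 01BUP01
print(sed_like("442, 66.00, 012322"))  # 442, 66.00, 012322
```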
|
|
|
|
07-30-2008, 06:11 AM
|
#3
|
|
Member
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212
Rep:
|
This is for GNU Awk (otherwise you should escape the new lines in the ternary operator):
Code:
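awk -F', *' 'BEGIN {
  # map each prefix letter to its two-digit code: N -> 01, C -> 02, L -> 03
  # (OFS is still the default single space here)
  n = split("N C L", t, OFS)
  while (++i <= n) tt[t[i]] = sprintf("%02d", i)
  c = "01"    # suffix appended to every converted code
}
NR == 1 { $(NF + 1) = "New code"; print; next }    # header row
{ $(NF + 1) = $NF ~ /^[NCL]/ ?
    tt[substr($NF, 1, 1)] substr($NF, 2, 3) c :
    $NF }
1' OFS=', ' filename > new.csv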
|
|
|
|
07-30-2008, 06:17 AM
|
#4
|
|
LQ Newbie
Registered: Jun 2008
Location: England
Posts: 21
Original Poster
Rep:
|
Quote:
Originally Posted by burschik
sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;'
|
Thanks. I have never used sed before. How do I issue the command with the csv file?
I tried...
Code:
sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;' input.csv > output.csv
But it didn't work.
Thanks
|
|
|
|
07-30-2008, 06:27 AM
|
#5
|
|
Member
Registered: Jul 2008
Posts: 159
Rep:
|
It should work like that. What exactly does not work?
|
|
|
|
07-30-2008, 06:41 AM
|
#6
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
If you have Python, here's an alternative:
Code:
for n, line in enumerate(open("file")):
    line = line.strip()  # strip new line
    code = line.split(",")[2].strip()  # get the code
    if n == 0:
        newcode = "New code"  # header row
    elif code.startswith("N"):
        newcode = "01" + code[1:4] + "01"
    elif code.startswith("C"):
        newcode = "02" + code[1:4] + "01"
    elif code.startswith("L"):
        newcode = "03" + code[1:4] + "01"
    else:
        newcode = code  # any other code is copied unchanged
    print(line + ", " + newcode)
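If the file is to be read as real CSV, the standard `csv` module can take care of the splitting. The sketch below assumes Python 3. The names `PREFIXES` and `add_new_code_column` and the in-memory sample are made up for illustration, and the output is joined with ", " by hand to match the layout shown in the first post.

```python
import csv
import io

PREFIXES = {"N": "01", "C": "02", "L": "03"}  # assumed lookup table


def add_new_code_column(lines):
    """Yield each row as text with a 'New code' field appended."""
    # skipinitialspace drops the blank that follows each comma
    reader = csv.reader(lines, skipinitialspace=True)
    for number, row in enumerate(reader):
        if not row:
            continue  # ignore empty lines
        old = row[-1]
        if number == 0:
            new = "New code"  # header row
        elif old[:1] in PREFIXES:
            new = PREFIXES[old[0]] + old[1:4] + "01"
        else:
            new = old  # anything else is copied unchanged
        yield ", ".join(row + [new])


sample = io.StringIO(
    "ID, Amount, Old code\n"
    "654, 45.00, NBUP\n"
    "442, 66.00, 012322\n"
)
for out_line in add_new_code_column(sample):
    print(out_line)
```

To work on a real file, an open file object (opened with newline="") can be passed in place of the sample.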
|
|
|
|
07-30-2008, 08:11 AM
|
#7
|
|
LQ Newbie
Registered: Jun 2008
Location: England
Posts: 21
Original Poster
Rep:
|
Quote:
Originally Posted by burschik
It should work like that. What exactly does not work?
|
Code:
sed: -e expression #1, char 91: extra characters after command
|
|
|
|
07-30-2008, 09:12 AM
|
#8
|
|
Member
Registered: Jul 2008
Posts: 159
Rep:
|
That is slightly surprising, since the expression is less than 91 characters long. Are you sure that your placement of quotation marks is correct?
|
|
|
|
07-30-2008, 10:01 AM
|
#9
|
|
LQ Newbie
Registered: Jun 2008
Location: England
Posts: 21
Original Poster
Rep:
|
Quote:
Originally Posted by burschik
That is slightly surprising, since the expression is less than 91 characters long. Are you sure that your placement of quotation marks is correct?
|
I ran the following. There was no error message and an output file was generated, but it looks identical to the input file.
Code:
sed 's/, N\([A-Z]\+\)/, 01\101/; s/, C\([A-Z]\+\)/, 02\101/; s/, L\([A-Z]\+\)/, 03\101/;' '/home/ll/Desktop/Extracts/Result/Prime/in.csv' > out.csv
Not sure what you mean about the placement of quotation marks, I just copied and pasted your command.
Any ideas?
Thanks for helping...
|
|
|
|
07-30-2008, 10:30 AM
|
#10
|
|
Member
Registered: Jul 2008
Posts: 159
Rep:
|
No, sorry, I don't see what might be going wrong. It works for me (TM). But since the sed one-liner isn't exactly what you wanted anyhow, you would probably be better off using one of the other suggestions.
|
|
|
|
07-30-2008, 11:19 AM
|
#11
|
|
LQ Newbie
Registered: Jun 2008
Location: England
Posts: 21
Original Poster
Rep:
|
Quote:
Originally Posted by burschik
No, sorry, I don't see what might be going wrong. It works for me (TM). But since the sed one-liner isn't exactly what you wanted anyhow, you would probably be better off using one of the other suggestions.
|
I have no experience with Python, so I will need to look into this.
Thanks for your help though.
|
|
|
|
07-30-2008, 11:28 AM
|
#12
|
|
Senior Member
Registered: Aug 2006
Posts: 2,695
|
sed is definitely not a suitable tool to do your stuff.
Why didn't you try radoulov's awk solution? Doesn't it work?
If you want to try the Python piece, save the code as a script and on command prompt
|
|
|
|
07-30-2008, 02:06 PM
|
#13
|
|
Senior Member
Registered: Jun 2008
Posts: 2,529
Rep:
|
And yet another way:
Code:
$ cat data
ID, Amount, Old code
654, 45.00, NBUP
5432, 20.00, CBUP
42, 65.00, NWAR
442, 66.00, 012322
$ perl -lna -F'/,\s*/' -e 'BEGIN {$" = ", "; %C=(N =>"01",C=>"02",L=>"03")}; $F[3]=$F[0] eq "ID" ? "New code" : $F[2] =~ /^([NCL])(\w+)/ ? "$C{$1}${2}01" : $F[2]; print "@F"' data
ID, Amount, Old code, New code
654, 45.00, NBUP, 01BUP01
5432, 20.00, CBUP, 02BUP01
42, 65.00, NWAR, 01WAR01
442, 66.00, 012322, 012322
|
|
|
|
07-30-2008, 11:18 PM
|
#14
|
|
Member
Registered: Jul 2008
Posts: 159
Rep:
|
ghostdog74 wrote:
Quote:
|
sed is definitely not a suitable tool to do your stuff.
|
Would you please be so kind as to explain this pearl of wisdom?
|
|
|
|
07-31-2008, 12:09 AM
|
#15
|
|
Senior Member
Registered: Jun 2008
Posts: 2,529
Rep:
|
Sed can do just about anything, but it is very cumbersome to use for larger tasks. It was superb for its time and still has plenty of value. But who the heck wants to fight with a basic two-register machine?
I think this was the point being made.
|
|
|
|