Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game. |
Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
|
|
07-31-2010, 11:00 AM
|
#1
|
LQ Newbie
Registered: Mar 2005
Posts: 15
Rep:
|
So, who want's to help me format a .csv? It'll be fun, promise!
well, maybe not. I have a 7mb .csv that is in a completely unusable format. Right now it looks like this:
Code:
AAXJ,07/21/2010,53.9900
AAXJ,07/22/2010,54.8200
AAXJ,07/23/2010,55.2900
AAXJ,07/26/2010,55.7200
AAXJ,07/27/2010,56.0000
AAXJ,07/28/2010,55.8300
ACWI,01/02/2009,32.2900
ACWI,01/05/2009,32.7100
ACWI,01/06/2009,33.0300
ACWI,01/07/2009,32.3200
ACWX,01/02/2009,30.5500
ACWX,01/05/2009,30.7800
ACWX,01/06/2009,31.1500
ACWX,01/07/2009,30.5700
ACWX,01/08/2009,30.4200
Not exactly useful. I need to know how to sort it by the leading value. So:
Code:
01/01/09, AAAA, AAAB, AAAC
01/03/09, 33.23,22.0, 18.75
I know it's probably not feasible to match up the dates, that's fine. Having both a date and a value column for each symbol would be almost as perfect.
and so forth
|
|
|
07-31-2010, 11:33 AM
|
#2
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,028
|
I think you are going to have to explain further as:
Quote:
Not exactly useful. I need to know how to sort it by the leading value.
|
Would appear to have nothing at all to do with the output:
Quote:
01/01/09, AAAA, AAAB, AAAC
01/03/09, 33.23,22.0, 18.75
|
|
|
|
07-31-2010, 11:40 AM
|
#3
|
LQ Newbie
Registered: Mar 2005
Posts: 15
Original Poster
Rep:
|
Quote:
Originally Posted by grail
I think you are going to have to explain further as:
Would appear to have nothing at all to do with the output:
|
you're right, that wasn't at all clear. I'm fairly tired right now.
Date, AAXJ, ACWI, ACWX, should be what the top line says.
|
|
|
07-31-2010, 01:44 PM
|
#4
|
LQ Veteran
Registered: Sep 2003
Posts: 10,532
|
Hi,
Your desired end-result is still a bit vague:
Quote:
01/01/09, AAXJ, ACWI, ACWX
01/03/09, 33.23,22.0, 18.75
|
I assume you mean:
Quote:
01/01/09, AAXJ, ACWI, ACWX
01/01/09, 33.23,22.0, 18.75
|
As in 2 lines with the same date. First line <date> data fields and second line corresponding date with number fields.
If that is the case and your example input from post #1 is relevant:
Code:
#!/bin/bash
inFile="infile"
sort -t, -k2,2 $inFile | \
awk -F, 'BEGIN {
prevDate = "X"
dataPart = ""
numbPart = ""
}
{
if ( prevDate != $2 ) {
if ( prevDate != "X" ) {
printf("%s,%s\n",prevDate,dataPart)
printf("%s,%s\n",prevDate,numbPart)
}
prevDate = $2
dataPart = ""
numbPart = ""
}
dataPart = dataPart $1","
numbPart = numbPart $3","
}'
Sample run:
Code:
$ ./csv.reformat.sh
01/02/2009,ACWI,ACWX,ACWY,
01/02/2009,32.2900,30.5500,35.2900,
01/05/2009,ACWI,ACWX,
01/05/2009,32.7100,30.7800,
01/06/2009,ACWI,ACWX,ACWY,
01/06/2009,33.0300,31.1500,21.1500,
01/07/2009,ACWI,ACWX,
01/07/2009,32.3200,30.5700,
01/08/2009,ACWX,
01/08/2009,30.4200,
07/21/2010,AAXJ,AAXY,
07/21/2010,53.9900,43.9900,
07/22/2010,AAXJ,
07/22/2010,54.8200,
07/23/2010,AAXJ,
07/23/2010,55.2900,
07/26/2010,AAXJ,
07/26/2010,55.7200,
07/27/2010,AAXJ,
07/27/2010,56.0000,
2 things about the output generated:
1) I removed the spaces after the comma. They are probably not needed (the output being a csv file too)
2) There is a trailing , after each line. Probably not a problem, but you easily remove them by changing the last line ( }' ) with this: }' | sed 's/,$//'
Hope this helps.
BTW: I enlarged the input file for better testing, that's why the results won't match the data given in post #1.
Last edited by druuna; 07-31-2010 at 01:48 PM.
Reason: Comment about input file
|
|
1 members found this post helpful.
|
08-09-2010, 10:14 AM
|
#5
|
LQ Newbie
Registered: Mar 2005
Posts: 15
Original Poster
Rep:
|
Thanks a bunch man, but this is not QUITE what I need. This is so close, and I think I should be able to figure this out on my own, but I'm just not able to. Here is what the final output has to look like.
Code:
Date,AAXJ,ACWI,ACWX
01/02/2009,,32.2900,30.5500
01/05/09,,32.7100,30.7800
etc,etc,etc,etc
Sorry for not all the confusion, and I hope this is clear. And thanks again for all the help.
|
|
|
08-09-2010, 11:32 AM
|
#6
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,028
|
You could try something like:
Code:
#!/usr/bin/awk -f
BEGIN{FS=","}
/AAXJ/{aaxj[$2]=$3}
/ACWI/{acwi[$2]=$3}
/ACWX/{acwx[$2]=$3}
{d[$2]++}
END{
printf "%7s %9s %7s %7s\n", "Date","AAXJ","ACWI","ACWX"
n = asorti(d,f)
for(;++x <= n;)
printf "%10s %07.4f %07.4f %07.4f\n",f[x],aaxj[f[x]],acwi[f[x]],acwx[f[x]]
}
Call the script with your file as argument.
|
|
1 members found this post helpful.
|
08-09-2010, 11:57 AM
|
#7
|
LQ Veteran
Registered: Sep 2003
Posts: 10,532
|
Hi,
I'm still willing to help you out, but you do need to be precise.....
You say this is the desired output, which looks weird to be honest:
Quote:
Date,AAXJ,ACWI,ACWX
01/02/2009,,32.2900,30.5500
01/05/09,,32.7100,30.7800
|
Shouldn't that be something like this:
Quote:
Date,AAXJ,ACWI,ACWX
01/02/2009,,32.2900,30.5500
01/05/2009,,32.7100,30.7800
|
Also; was my previous assumption correct? This:
Quote:
01/01/09, AAXJ, ACWI, ACWX
01/03/09, 33.23,22.0, 18.75
|
vs
Quote:
01/01/09, AAXJ, ACWI, ACWX
01/01/09, 33.23,22.0, 18.75
|
Assuming I'm right about the above mentioned points:
I do have a workable solution, if you are able to add an extra line to the 7 Mb csv file you have. This is needed otherwise the last line of input is not taken into account. I need a unique End-Of-File line.
I don't think it is too hard to add to your csv file: echo "ZZZZ,99/99/9999,99.9999" >> file.csv (Mind the >>, 2 of them, not 1!!!).
Code:
#!/bin/bash
inFile="csv.reformat.infile"
sort -t, -k2,2 $inFile | \
awk -F, 'BEGIN {
prevDate = "X"
dataArray[0] = ""
dataArray[1] = ""
dataArray[2] = ""
dataArray[3] = ""
print "Date,AAXJ,ACWI,ACWX"
}
{
if ( prevDate != $2 ) {
if ( prevDate != "X" ) {
printf("%s,%s,%s,%s\n",dataArray[0],dataArray[1],dataArray[2],dataArray[3])
}
prevDate = $2
dataArray[0] = ""
dataArray[1] = ""
dataArray[2] = ""
dataArray[3] = ""
}
dataArray[0] = $2
if ( $1 == "AAXJ") dataArray[1] = $3
if ( $1 == "ACWI") dataArray[2] = $3
if ( $1 == "ACWX") dataArray[3] = $3
}'
exit 0
Example run:
Code:
$ cat csv.reformat.infile
AAXJ,07/21/2010,53.9900
AAXJ,07/22/2010,54.8200
AAXJ,07/23/2010,55.2900
AAXJ,07/26/2010,55.7200
AAXJ,07/27/2010,56.0000
AAXJ,07/28/2010,55.8300
ACWI,01/02/2009,32.2900
ACWI,01/05/2009,32.7100
ACWI,01/06/2009,33.0300
ACWI,01/07/2009,32.3200
ACWX,01/02/2009,30.5500
ACWX,01/05/2009,30.7800
ACWX,01/06/2009,31.1500
ACWX,01/07/2009,30.5700
ACWX,01/08/2009,30.4200
AAXJ,12/02/2010,10.1000
ACWI,12/03/2010,20.2000
ACWX,12/04/2010,30.3000
ZZZZ,99/99/9999,99.9999
$ ./csv.reformat.sh
Date,AAXJ,ACWI,ACWX
01/02/2009,,32.2900,30.5500
01/05/2009,,32.7100,30.7800
01/06/2009,,33.0300,31.1500
01/07/2009,,32.3200,30.5700
01/08/2009,,,30.4200
07/21/2010,53.9900,,
07/22/2010,54.8200,,
07/23/2010,55.2900,,
07/26/2010,55.7200,,
07/27/2010,56.0000,,
07/28/2010,55.8300,,
12/02/2010,10.1000,,
12/03/2010,,20.2000,
12/04/2010,,,30.3000
The blue part I added to your original dataset for testing. The bold part is the extra line I was talking about earlier.
Hope this helps.
|
|
1 members found this post helpful.
|
08-09-2010, 12:01 PM
|
#8
|
LQ Veteran
Registered: Sep 2003
Posts: 10,532
|
@grail: Just noticed your post. Its a beauty!
|
|
|
08-09-2010, 06:08 PM
|
#9
|
LQ Newbie
Registered: Mar 2005
Posts: 15
Original Poster
Rep:
|
Thanks ya'll these are absolutely brilliant. I think I have this under control now.
|
|
|
All times are GMT -5. The time now is 06:51 PM.
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|