LinuxQuestions.org
Review your favorite Linux distribution.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 07-31-2010, 10:00 AM   #1
pinecone
LQ Newbie
 
Registered: Mar 2005
Posts: 15

Rep: Reputation: 0
So, who want's to help me format a .csv? It'll be fun, promise!


well, maybe not. I have a 7mb .csv that is in a completely unusable format. Right now it looks like this:

Code:
AAXJ,07/21/2010,53.9900
AAXJ,07/22/2010,54.8200
AAXJ,07/23/2010,55.2900
AAXJ,07/26/2010,55.7200
AAXJ,07/27/2010,56.0000
AAXJ,07/28/2010,55.8300
ACWI,01/02/2009,32.2900
ACWI,01/05/2009,32.7100
ACWI,01/06/2009,33.0300
ACWI,01/07/2009,32.3200
ACWX,01/02/2009,30.5500
ACWX,01/05/2009,30.7800
ACWX,01/06/2009,31.1500
ACWX,01/07/2009,30.5700
ACWX,01/08/2009,30.4200
Not exactly useful. I need to know how to sort it by the leading value. So:
Code:
01/01/09, AAAA, AAAB, AAAC
01/03/09, 33.23,22.0, 18.75
I know it's probably not feasible to match up the dates, that's fine. Having both a date and a value column for each symbol would be almost as perfect.

and so forth
 
Old 07-31-2010, 10:33 AM   #2
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,542

Rep: Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919
I think you are going to have to explain further as:
Quote:
Not exactly useful. I need to know how to sort it by the leading value.
Would appear to have nothing at all to do with the output:
Quote:
01/01/09, AAAA, AAAB, AAAC
01/03/09, 33.23,22.0, 18.75
 
Old 07-31-2010, 10:40 AM   #3
pinecone
LQ Newbie
 
Registered: Mar 2005
Posts: 15

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by grail View Post
I think you are going to have to explain further as:

Would appear to have nothing at all to do with the output:
you're right, that wasn't at all clear. I'm fairly tired right now.

Date, AAXJ, ACWI, ACWX, should be what the top line says.
 
Old 07-31-2010, 12:44 PM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374
Hi,

Your desired end-result is still a bit vague:
Quote:
01/01/09, AAXJ, ACWI, ACWX
01/03/09, 33.23,22.0, 18.75
I assume you mean:
Quote:
01/01/09, AAXJ, ACWI, ACWX
01/01/09, 33.23,22.0, 18.75
As in 2 lines with the same date. First line <date> data fields and second line corresponding date with number fields.

If that is the case and your example input from post #1 is relevant:
Code:
#!/bin/bash

inFile="infile"

sort -t, -k2,2 $inFile | \
awk -F, 'BEGIN {
  prevDate = "X"
  dataPart = ""
  numbPart = ""
}
{
  if ( prevDate != $2 ) {
    if ( prevDate != "X" ) {
      printf("%s,%s\n",prevDate,dataPart)
      printf("%s,%s\n",prevDate,numbPart)
    }
    prevDate = $2
    dataPart = ""
    numbPart = ""
  }
  dataPart = dataPart $1","
  numbPart = numbPart $3","
}'
Sample run:
Code:
$ ./csv.reformat.sh 
01/02/2009,ACWI,ACWX,ACWY,
01/02/2009,32.2900,30.5500,35.2900,
01/05/2009,ACWI,ACWX,
01/05/2009,32.7100,30.7800,
01/06/2009,ACWI,ACWX,ACWY,
01/06/2009,33.0300,31.1500,21.1500,
01/07/2009,ACWI,ACWX,
01/07/2009,32.3200,30.5700,
01/08/2009,ACWX,
01/08/2009,30.4200,
07/21/2010,AAXJ,AAXY,
07/21/2010,53.9900,43.9900,
07/22/2010,AAXJ,
07/22/2010,54.8200,
07/23/2010,AAXJ,
07/23/2010,55.2900,
07/26/2010,AAXJ,
07/26/2010,55.7200,
07/27/2010,AAXJ,
07/27/2010,56.0000,
2 things about the output generated:

1) I removed the spaces after the comma. They are probably not needed (the output being a csv file too)
2) There is a trailing , after each line. Probably not a problem, but you easily remove them by changing the last line ( }' ) with this: }' | sed 's/,$//'

Hope this helps.

BTW: I enlarged the input file for better testing, that's why the results won't match the data given in post #1.

Last edited by druuna; 07-31-2010 at 12:48 PM. Reason: Comment about input file
 
1 members found this post helpful.
Old 08-09-2010, 09:14 AM   #5
pinecone
LQ Newbie
 
Registered: Mar 2005
Posts: 15

Original Poster
Rep: Reputation: 0
Thanks a bunch man, but this is not QUITE what I need. This is so close, and I think I should be able to figure this out on my own, but I'm just not able to. Here is what the final output has to look like.
Code:
Date,AAXJ,ACWI,ACWX
01/02/2009,,32.2900,30.5500
01/05/09,,32.7100,30.7800
etc,etc,etc,etc
Sorry for not all the confusion, and I hope this is clear. And thanks again for all the help.
 
Old 08-09-2010, 10:32 AM   #6
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,542

Rep: Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919Reputation: 1919
You could try something like:
Code:
#!/usr/bin/awk -f

BEGIN{FS=","}

/AAXJ/{aaxj[$2]=$3}
/ACWI/{acwi[$2]=$3}
/ACWX/{acwx[$2]=$3}

{d[$2]++}

END{
	printf "%7s %9s %7s %7s\n", "Date","AAXJ","ACWI","ACWX"

	n = asorti(d,f)
	for(;++x <= n;)
		printf "%10s %07.4f %07.4f %07.4f\n",f[x],aaxj[f[x]],acwi[f[x]],acwx[f[x]]
}
Call the script with your file as argument.
 
1 members found this post helpful.
Old 08-09-2010, 10:57 AM   #7
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374
Hi,

I'm still willing to help you out, but you do need to be precise.....

You say this is the desired output, which looks weird to be honest:
Quote:
Date,AAXJ,ACWI,ACWX
01/02/2009,,32.2900,30.5500
01/05/09,,32.7100,30.7800
Shouldn't that be something like this:
Quote:
Date,AAXJ,ACWI,ACWX
01/02/2009,,32.2900,30.5500
01/05/2009,,32.7100,30.7800
Also; was my previous assumption correct? This:
Quote:
01/01/09, AAXJ, ACWI, ACWX
01/03/09, 33.23,22.0, 18.75
vs
Quote:
01/01/09, AAXJ, ACWI, ACWX
01/01/09, 33.23,22.0, 18.75
Assuming I'm right about the above mentioned points:

I do have a workable solution, if you are able to add an extra line to the 7 Mb csv file you have. This is needed otherwise the last line of input is not taken into account. I need a unique End-Of-File line.

I don't think it is too hard to add to your csv file: echo "ZZZZ,99/99/9999,99.9999" >> file.csv (Mind the >>, 2 of them, not 1!!!).
Code:
#!/bin/bash

inFile="csv.reformat.infile"

sort -t, -k2,2 $inFile | \
awk -F, 'BEGIN {
  prevDate = "X"
  dataArray[0] = ""
  dataArray[1] = ""
  dataArray[2] = ""
  dataArray[3] = ""

  print "Date,AAXJ,ACWI,ACWX"
}
{
  if ( prevDate != $2 ) {
    if ( prevDate != "X" ) {
       printf("%s,%s,%s,%s\n",dataArray[0],dataArray[1],dataArray[2],dataArray[3])
    }
    prevDate = $2
    dataArray[0] = ""
    dataArray[1] = ""
    dataArray[2] = ""
    dataArray[3] = ""
  }

  dataArray[0] = $2
  if ( $1 == "AAXJ") dataArray[1] = $3
  if ( $1 == "ACWI") dataArray[2] = $3
  if ( $1 == "ACWX") dataArray[3] = $3

}' 

exit 0
Example run:
Code:
 $ cat csv.reformat.infile
AAXJ,07/21/2010,53.9900
AAXJ,07/22/2010,54.8200
AAXJ,07/23/2010,55.2900
AAXJ,07/26/2010,55.7200
AAXJ,07/27/2010,56.0000
AAXJ,07/28/2010,55.8300
ACWI,01/02/2009,32.2900
ACWI,01/05/2009,32.7100
ACWI,01/06/2009,33.0300
ACWI,01/07/2009,32.3200
ACWX,01/02/2009,30.5500
ACWX,01/05/2009,30.7800
ACWX,01/06/2009,31.1500
ACWX,01/07/2009,30.5700
ACWX,01/08/2009,30.4200
AAXJ,12/02/2010,10.1000
ACWI,12/03/2010,20.2000
ACWX,12/04/2010,30.3000
ZZZZ,99/99/9999,99.9999

$ ./csv.reformat.sh
Date,AAXJ,ACWI,ACWX
01/02/2009,,32.2900,30.5500
01/05/2009,,32.7100,30.7800
01/06/2009,,33.0300,31.1500
01/07/2009,,32.3200,30.5700
01/08/2009,,,30.4200
07/21/2010,53.9900,,
07/22/2010,54.8200,,
07/23/2010,55.2900,,
07/26/2010,55.7200,,
07/27/2010,56.0000,,
07/28/2010,55.8300,,
12/02/2010,10.1000,,
12/03/2010,,20.2000,
12/04/2010,,,30.3000
The blue part I added to your original dataset for testing. The bold part is the extra line I was talking about earlier.

Hope this helps.
 
1 members found this post helpful.
Old 08-09-2010, 11:01 AM   #8
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374Reputation: 2374
@grail: Just noticed your post. Its a beauty!
 
Old 08-09-2010, 05:08 PM   #9
pinecone
LQ Newbie
 
Registered: Mar 2005
Posts: 15

Original Poster
Rep: Reputation: 0
Thanks ya'll these are absolutely brilliant. I think I have this under control now.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] howto: ls output to csv format limdel Linux - Newbie 12 09-19-2014 01:00 PM
How to import redhat users to ClearOS using CSV file format. alfaedh Linux - Server 1 01-31-2010 03:48 PM
Redirect output in .xls or csv format jeesun Programming 5 12-25-2009 05:56 PM
OpenOffice help , read csv-like text format ufmale Linux - Desktop 3 09-10-2008 01:26 PM
Promise SATAII TX4 and WD4000 HD - cannnot format Caoster007 Linux - Hardware 0 03-30-2006 04:26 AM


All times are GMT -5. The time now is 11:46 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration