LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Software (http://www.linuxquestions.org/questions/linux-software-2/)
-   -   Need shell script help with a CSV file. (http://www.linuxquestions.org/questions/linux-software-2/need-shell-script-help-with-a-csv-file-4175424921/)

lothario 08-31-2012 04:44 AM

Need shell script help with a CSV file.
 
I have spent a lot of time extracting and formatting a very large (CSV) data file from a real-time device.
Here is a sample:
Code:

"zada68e","mv47b2","pd85sq5","fneab7","b3936k68"
"c076a4cb7","d6d1367","1b5812d6","ce8e82","ie5637"
"011b124","bc339504","color RED","dym989","8a5d78"
On      20
Off    24
Total  44
"69ad2","d363ad0","f41328","766ge1b","a29f"
"a8da68e","47db2","fd85","feab7","c936868"
"10e3","bda1b","143da56","0f19472","495bcexje"
"c55f950","nm787x","color GREEN","25329","10a9f2b"
On      15
Off    14
Total  29

Where:
  • Each data row has 5 fields.
  • Each field has double quotes around it.
  • Fields are separated by a comma.
  • Fields have variable lengths.
  • The number of data rows is unknown.
  • Every group of data rows is followed by 3 lines. On, Off and Total.
I'm not a shell script expert.
Can you help me with a basic shell script that will make the data look like this:
Code:

"zada68e","mv47b2","pd85sq5","fneab7","b3936k68"
"c076a4cb7","d6d1367","1b5812d6","ce8e82","ie5637"
"011b124","bc339504","color RED","dym989","8a5d78"
On      20
Off    24
Total  44

Date__________________color RED___________________
<new page>


"69ad2","d363ad0","f41328","766ge1b","a29f"
"a8da68e","47db2","fd85","feab7","c936868"
"10e3","bda1b","143da56","0f19472","495bcexje"
"c55f950","nm787x","color GREEN","25329","10a9f2b"
On      15
Off    14
Total  29

Date________________color GREEN___________________
<new page>

Where:
  1. After the Total line, there is a blank line.
  2. The line above the On is copied but fields 1 and 2 are replaced by Date________________
  3. Field 3 is copied as-is.
  4. Fields 4 and 5 are replaced by "_" underscores.
  5. Finally, the new page control character(s) is inserted to ensure that nothing else prints on this page.
Thank you.

tronayne 08-31-2012 10:38 AM

Whenever I have to deal with stuff like this I do a "pre-edit" (with sed) then use awk to get what I need in a form I want.

I create a little sed file, something like "edit.sed" that replaces the commas with a vertical bar or a tab and gets rid of the double quotes. Something like this:
Code:

s/","/|/g    < globally replace "," with a vertical bar>
s/^"//g      < globally replace leading " with nothing>
s/"$//g      < globally replace trailing " with nothing>

and maybe a few other messy things -- this is just to get you started. The Good Thing about sed is that it's a streaming editor so every one of those directives are acted on a line-by-line basis rather than scan the entire file for each directive; that makes it quick.

One messy thing you can get rid of or alter is a blank line in your input:
Code:

s/  *$//g    < replace any trailing space with nothing >
/$^/d        < delete blank lines >

Or substitute
Code:

s/  *$//g
s/^$/Date/g

Just a hint that may help.

You run that with
Code:

sed -f edit.sed filename.csv > newfilename.csv
Why the vertical bar? Well, they don't appear in any language and they're convenient as a field separator (you could also use a tab character). If you already have tab character in your input file (not as field separators but as actual text), add something like
Code:

s/<tab>/<space>/g
to the first line ("<tab>" is an actual tab character and "<space>" is a, you know, space).

Or, if there are tabs, you could replace them with vertical bars (same as above).

When you use AWK, you simply set a field separator in the BEGIN {} section:
Code:

BEGIN {
    FS="|"
}

Or, if you decide to use tabs as field separators,
Code:

BEGIN {
    FS="\t"
}

Then you refer to your five fields as $1, $2, ..., $5.

AWK provides you programming capability; i.e., you can do what you need to do by comparing and matching strings then act on them.

Hope this helps some.

TB0ne 08-31-2012 12:40 PM

Quote:

Originally Posted by lothario (Post 4768946)
I have spent a lot of time extracting and formatting a very large (CSV) data file from a real-time device.

I'm not a shell script expert. Can you help me with a basic shell script that will make the data look like this:

Where:
  1. After the Total line, there is a blank line.
  2. The line above the On is copied but fields 1 and 2 are replaced by Date________________
  3. Field 3 is copied as-is.
  4. Fields 4 and 5 are replaced by "_" underscores.
  5. Finally, the new page control character(s) is inserted to ensure that nothing else prints on this page.
Thank you.

Not meaning to sound nasty, but this seems to be a recurring theme:
https://www.linuxquestions.org/quest...-field-750045/
https://www.linuxquestions.org/quest...t-file-751241/
https://www.linuxquestions.org/quest...attern-658139/
https://www.linuxquestions.org/quest...-files-644604/

Can you post what you've tried/done on your own first, before asking others to write your scripts for you?? Bash scripting tutorials are abundant, and applying what you've been given in the past should also be a good starting point:
http://tldp.org/LDP/abs/html/
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html


All times are GMT -5. The time now is 09:20 PM.