I have spent a lot of time extracting and formatting a very large (CSV) data file from a real-time device.
Here is a sample:
Code:
"zada68e","mv47b2","pd85sq5","fneab7","b3936k68"
"c076a4cb7","d6d1367","1b5812d6","ce8e82","ie5637"
"011b124","bc339504","color RED","dym989","8a5d78"
On 20
Off 24
Total 44
"69ad2","d363ad0","f41328","766ge1b","a29f"
"a8da68e","47db2","fd85","feab7","c936868"
"10e3","bda1b","143da56","0f19472","495bcexje"
"c55f950","nm787x","color GREEN","25329","10a9f2b"
On 15
Off 14
Total 29
Where:
Each data row has 5 fields.
Each field has double quotes around it.
Fields are separated by a comma.
Fields have variable lengths.
The number of data rows is unknown.
Every group of data rows is followed by 3 lines: On, Off and Total.
I'm not a shell script expert.
Can you help me with a basic shell script that will make the data look like this:
Code:
"zada68e","mv47b2","pd85sq5","fneab7","b3936k68"
"c076a4cb7","d6d1367","1b5812d6","ce8e82","ie5637"
"011b124","bc339504","color RED","dym989","8a5d78"
On 20
Off 24
Total 44
Date__________________color RED___________________
<new page>
"69ad2","d363ad0","f41328","766ge1b","a29f"
"a8da68e","47db2","fd85","feab7","c936868"
"10e3","bda1b","143da56","0f19472","495bcexje"
"c55f950","nm787x","color GREEN","25329","10a9f2b"
On 15
Off 14
Total 29
Date________________color GREEN___________________
<new page>
Where:
After the Total line, there is a blank line.
The line above the On is copied but fields 1 and 2 are replaced by Date________________
Field 3 is copied as-is.
Fields 4 and 5 are replaced by "_" underscores.
Finally, the new page control character(s) is inserted to ensure that nothing else prints on this page.
Whenever I have to deal with stuff like this I do a "pre-edit" (with sed) then use awk to get what I need in a form I want.
I create a little sed file, something like "edit.sed" that replaces the commas with a vertical bar or a tab and gets rid of the double quotes. Something like this:
Code:
# replace every "," between fields with a vertical bar
s/","/|/g
# remove the leading double quote
s/^"//
# remove the trailing double quote
s/"$//
and maybe a few other messy things -- this is just to get you started. The Good Thing about sed is that it's a stream editor: it applies every directive to each line as the line is read, rather than scanning the entire file once per directive, and that makes it quick.
One messy thing you can get rid of or alter is a blank line in your input:
Code:
# strip any trailing spaces
s/ *$//
# delete blank lines
/^$/d
Or substitute
Code:
s/ *$//g
s/^$/Date/g
Just a hint that may help.
You run that with
Code:
sed -f edit.sed filename.csv > newfilename.csv
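To see what the pre-edit does before pointing it at the real file, here is a minimal sketch that passes the same three directives inline with -e instead of reading them from edit.sed. The function name preedit is only an illustration, not anything from the original script.

```shell
# Hypothetical helper: the three edit.sed directives given inline.
preedit() {
    sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//'
}

# Feed it one data row and one summary line from the sample.
printf '%s\n' '"011b124","bc339504","color RED","dym989","8a5d78"' 'On 20' | preedit
# prints:
# 011b124|bc339504|color RED|dym989|8a5d78
# On 20
```

The On/Off/Total lines have no quotes or commas, so they pass through untouched.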
Why the vertical bar? Well, it almost never shows up in ordinary text, so it's convenient as a field separator (you could also use a tab character). If you already have tab characters in your input file (not as field separators but as actual text), add something like
Code:
s/<tab>/<space>/g
as the first directive in edit.sed ("<tab>" is an actual tab character and "<space>" is a, you know, space).
Or, if there are tabs, you could replace them with vertical bars (same as above).
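Typing a literal tab into a sed command can be awkward, so one possible way around it is to let printf produce the tab and hand it to sed through a shell variable. This is a sketch under that assumption; the function name detab is made up for the example.

```shell
# Hypothetical helper: replace every tab in the input with a single space.
detab() {
    tab=$(printf '\t')    # holds one literal tab character
    sed "s/$tab/ /g"
}

printf 'color\tRED\n' | detab
# prints: color RED
```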
When you use AWK, you simply set a field separator in the BEGIN {} section:
Code:
BEGIN {
FS="|"
}
Or, if you decide to use tabs as field separators,
Code:
BEGIN {
FS="\t"
}
Then you refer to your five fields as $1, $2, ..., $5.
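As a small illustration of the field references, the sketch below assumes the input has already been through the sed pre-edit (vertical bars, no quotes) and prints only the third field of each five-field row. The function name field3 is hypothetical.

```shell
# Hypothetical helper: print field 3 of every row that has exactly 5 fields.
field3() {
    awk 'BEGIN { FS = "|" } NF == 5 { print $3 }'
}

printf '%s\n' '011b124|bc339504|color RED|dym989|8a5d78' 'On 20' | field3
# prints: color RED
```

The NF == 5 test skips the On, Off and Total lines, which contain no vertical bar and so count as a single field.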
AWK gives you programming capability; i.e., you can compare and match strings and then act on the matches.
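Putting that to work on the layout asked for above, here is one rough sketch, not a finished script. It reads the original quoted file directly (no sed pre-edit), remembers the most recent data row, and after each Total line prints a blank line, the Date line built from field 3, and a form feed as the new-page character. The function name add_footer and the lengths of the underscore runs are assumptions; adjust them to suit your printout.

```shell
# Hypothetical helper: append the Date/colour footer and a form feed
# after every Total line.
add_footer() {
    awk '
        # A data row starts with a double quote; remember the latest one.
        /^"/ { last = $0 }

        # Pass every input line through unchanged.
        { print }

        # After Total: blank line, footer with field 3, then a form feed.
        /^Total / {
            split(last, f, "\",\"")
            printf "\nDate________________%s___________________\n\f\n", f[3]
        }
    '
}
```

You would run it as add_footer < filename.csv > newfilename.csv. Splitting on the three characters "," leaves field 3 without any quotes, because only the first and last fields keep their outer quote.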
Hope this helps some.
Last edited by tronayne; 08-31-2012 at 09:45 AM.
Reason: Fumble finger
Can you post what you've tried/done on your own first, before asking others to write your scripts for you?? Bash scripting tutorials are abundant, and applying what you've been given in the past should also be a good starting point: http://tldp.org/LDP/abs/html/ http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html