LinuxQuestions.org
Visit Jeremy's Blog.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - General
User Name
Password
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Notices


Reply
  Search this Thread
Old 03-08-2008, 03:09 PM   #1
Thaidog
Member
 
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637

Rep: Reputation: 32
Question Need a shell script to turn a lined file into a comma seperated file


I've got a file with about 400 entries that I need turned into a csv file.

Right now each line has one entry... how do I output that to a comma separated list?
 
Old 03-08-2008, 04:40 PM   #2
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682Reputation: 682Reputation: 682Reputation: 682Reputation: 682Reputation: 682
A short sample of a few lines of input and what you want for output would be useful. You are describing a file with 400 records and only one field in a record. Unless the input has a record on multiple lines, there is no use for a comma as a record separator.

I prefer using the TAB charactor as the field separator. This allows other whitespace in the field values. If you want some fields to have double quotes around them you should indicate that as well. In other words, do you literally want comma separated values, or do you want it to be the same as a csv file produced or read by a certain program. I a field is of a text type, and that text could contain a comma, double quotes are used to surround that fields value in the csv file.

Last edited by jschiwal; 03-08-2008 at 05:38 PM.
 
Old 03-08-2008, 06:39 PM   #3
Thaidog
Member
 
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637

Original Poster
Rep: Reputation: 32
Quote:
Originally Posted by jschiwal View Post
A short sample of a few lines of input and what you want for output would be useful. You are describing a file with 400 records and only one field in a record. Unless the input has a record on multiple lines, there is no use for a comma as a record separator.

I prefer using the TAB charactor as the field separator. This allows other whitespace in the field values. If you want some fields to have double quotes around them you should indicate that as well. In other words, do you literally want comma separated values, or do you want it to be the same as a csv file produced or read by a certain program. I a field is of a text type, and that text could contain a comma, double quotes are used to surround that fields value in the csv file.
This is strictly csv in it's simplest form - for instance:

The first few lines of the file:

11
33
34

Those lines need to be converted to:

11,33,34

Nothing fancy just a new format is needed. (and that format *has* to be like the format shown above)

Last edited by Thaidog; 03-08-2008 at 06:41 PM.
 
Old 03-08-2008, 07:05 PM   #4
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682Reputation: 682Reputation: 682Reputation: 682Reputation: 682Reputation: 682
You need to show more. How are the records separated in the input? Show two sample records.

If they are like
11
33
34

12
20
14

Then you could use:
Code:
awk 'BEGIN { RS=""; FS="\n"; OFS=","} { $1=$1; print $0 }' inputfile >outputfile.csv
If the input is like,
11
33
34
12
20
14

And you want 3 fields per record, then a read loop might be better. (despite my signature!)
while read r1 r2 r3; do
printf "%d,%d,%d\n" $r1 $r2 $r3
done <inputfile >outputfile.csv

If there aren't blank lines between records, and you want to replace all newlines with commas, you could use the tr command:
tr '\n' ',' <inputfile >outputfile.csv

Last edited by jschiwal; 03-08-2008 at 07:08 PM.
 
Old 03-08-2008, 07:28 PM   #5
Thaidog
Member
 
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637

Original Poster
Rep: Reputation: 32
Quote:
Originally Posted by jschiwal View Post
You need to show more. How are the records separated in the input? Show two sample records.

If they are like
11
33
34

12
20
14

Then you could use:
Code:
awk 'BEGIN { RS=""; FS="\n"; OFS=","} { $1=$1; print $0 }' inputfile >outputfile.csv
If the input is like,
11
33
34
12
20
14

And you want 3 fields per record, then a read loop might be better. (despite my signature!)
while read r1 r2 r3; do
printf "%d,%d,%d\n" $r1 $r2 $r3
done <inputfile >outputfile.csv

If there aren't blank lines between records, and you want to replace all newlines with commas, you could use the tr command:
tr '\n' ',' <inputfile >outputfile.csv
Ok. It's really simple. Imagine instead of 400 records I had only three in a text tile written by notepad.exe in a file called errors.txt:

11
33
34

If that was the entire file I would have to organize it like this:

11,33,34

Obviously, if I had only 3 records I would just stick a few commas in the middle and be done with it... but I expect a whole lot more of these files
 
Old 03-08-2008, 08:08 PM   #6
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682Reputation: 682Reputation: 682Reputation: 682Reputation: 682Reputation: 682
I went through 3 possible solutions depending on the input format of the file. If you have a file with 400 lines of numbers and want all of the numbers on a single line separated by commas, then the third method I gave using the "tr" command would work. Otherwise you need to provide information on how individual records are separated. The first solution is for records where an empty line delineates records and would work for records with differing numbers of fields. The second one is when the input doesn't have record delineation but you want to group the input into 3 field long records. From your last post, it sounds like you want to convert a file with line separated records, each one field long, into a single record with comma separated fields. The solution using "tr" is the one that will do that.

Normally csv files consist of a number of records, with one record per line. The fields of a record are separated with by commas. It is customary however to surround text with double or single quotes, as in
100, "John Smith", "100 Excelsior Drive", 56103
105, "John Doe","1234 West Benning", 51243

Also if the input is from windows, you need to convert it to a Unix/Linux text file as well. ( you mentioned notepad ). The "dos2unix" command will to that.
Or you could cheat with: tr -s <inputfile '\r\n' ',' >outputfile
That will convert \r\n to ,, and then squeeze the extra comma to a single one.

You may also need to remove blank lines from the input to prevent extra commas in any case.
tr -s <inputfile '\r\n' ',' | sed -i '^$d' >outputfile

Be sure to check the input and output with "head" and "tail". What happens if there is an extra line in the beginning of the input or an extra or missing one at the end of a file? Sometimes massaging the input a bit before proceeding with the main problem helps. Then the input will match an easier to use pattern.

At work I'll download the inventory list from a device. I'll first delete from the beginning of the file to a line with the word 'OK' and then delete from the first blank line to the end of the file. First I'm getting rid of the parts of the file that aren't a part of the inventory list, and then I extract what I need.

Your example contains lines of numbers. What happens if a line contains alpha characters? Would it ruin the results if that line were deleted from result? Is the data already ruined? Do you strip off any none numeric characters?

When working with regular expression patterns, it is necessary to be precise in defining the text patterns. This prevents false matches or misses.

Last edited by jschiwal; 03-08-2008 at 08:14 PM.
 
Old 03-08-2008, 08:16 PM   #7
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928Reputation: 928Reputation: 928Reputation: 928Reputation: 928Reputation: 928Reputation: 928Reputation: 928
Quote:
Ok. It's really simple. Imagine instead of 400 records I had only three in a text tile written by notepad.exe in a file called errors.txt:
Hence his question(s). Or do you really want all 400 lines/words
on one line separated by commas?
Code:
awk 'NR ==1{printf "%s", $1}NR>1{printf ",%s", $1}END{printf "\n"}'

Cheers,
Tink
 
Old 03-08-2008, 08:19 PM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,286

Rep: Reputation: 4165Reputation: 4165Reputation: 4165Reputation: 4165Reputation: 4165Reputation: 4165Reputation: 4165Reputation: 4165Reputation: 4165Reputation: 4165Reputation: 4165
Nice answer @jschiwal. Don't like tr for this though - what about that last newline ???. "Corner cases" are always a pain.
Can't see why the awk solution offered isn't acceptable personally.

Last edited by syg00; 03-08-2008 at 08:21 PM.
 
Old 03-08-2008, 08:19 PM   #9
Thaidog
Member
 
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637

Original Poster
Rep: Reputation: 32
Quote:
Originally Posted by Tinkster View Post
Hence his question(s). Or do you really want all 400 lines/words
on one line separated by commas?
Code:
awk 'NR ==1{printf "%s", $1}NR>1{printf ",%s", $1}END{printf "\n"}'

Cheers,
Tink
Lol actually I do want them on the same line!

Thanks fellas the tr solution works perfectly!
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Need a script to remove last comma in a file jgombos Programming 15 01-14-2008 02:30 PM
Need a script to search and replace text in file using shell script unixlearner Programming 14 06-21-2007 11:37 PM
How to turn a file to a one liner thru sh script? jillann Linux - Newbie 15 04-18-2007 10:32 AM
Reading comma-separated data from file MeLassen Programming 4 04-04-2004 03:41 PM
comma delimited file cdragon Programming 5 06-21-2002 08:55 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - General

All times are GMT -5. The time now is 03:50 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration