03-08-2008, 03:09 PM | #1
Thaidog
Member
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637
Need a shell script to turn a lined file into a comma separated file
I've got a file with about 400 entries that I need turned into a csv file.
Right now each line has one entry... how do I output that to a comma separated list?
03-08-2008, 04:40 PM | #2
jschiwal
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
A short sample of a few lines of input and what you want for output would be useful. You are describing a file with 400 records and only one field in a record. Unless the input has a record on multiple lines, there is no use for a comma as a record separator.
I prefer using the TAB character as the field separator. This allows other whitespace in the field values. If you want some fields to have double quotes around them, you should indicate that as well. In other words, do you literally want comma-separated values, or do you want it to be the same as a csv file produced or read by a certain program? If a field is of a text type, and that text could contain a comma, double quotes are used to surround that field's value in the csv file.
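For illustration only, a rough sketch of producing that kind of quoting from TAB-separated input with awk (the input.tsv file name and the two-column id/name layout are made up here):
Code:
# hypothetical: turn "id<TAB>name" lines into csv, quoting the name
# field because free text may itself contain commas
awk -F'\t' '{ printf "%s,\"%s\"\n", $1, $2 }' input.tsv > output.csv
A field value that itself contains a double quote would additionally need that quote doubled, which is why it matters which program will read the csv.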
Last edited by jschiwal; 03-08-2008 at 05:38 PM.
03-08-2008, 06:39 PM | #3
Thaidog (Original Poster)
Member
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637
Quote:
Originally Posted by jschiwal
A short sample of a few lines of input and what you want for output would be useful. You are describing a file with 400 records and only one field in a record. Unless the input has a record on multiple lines, there is no use for a comma as a record separator.
I prefer using the TAB character as the field separator. This allows other whitespace in the field values. If you want some fields to have double quotes around them, you should indicate that as well. In other words, do you literally want comma-separated values, or do you want it to be the same as a csv file produced or read by a certain program? If a field is of a text type, and that text could contain a comma, double quotes are used to surround that field's value in the csv file.
This is strictly csv in its simplest form - for instance:
The first few lines of the file:
11
33
34
Those lines need to be converted to:
11,33,34
Nothing fancy, just a new format is needed (and that format *has* to be like the format shown above).
Last edited by Thaidog; 03-08-2008 at 06:41 PM.
03-08-2008, 07:05 PM | #4
jschiwal
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
You need to show more. How are the records separated in the input? Show two sample records.
If they are like
11
33
34
12
20
14
Then you could use:
Code:
awk 'BEGIN { RS=""; FS="\n"; OFS=","} { $1=$1; print $0 }' inputfile >outputfile.csv
If the input is like:
11
33
34
12
20
14
and you want 3 fields per record, then a read loop might be better (despite my signature!):
Code:
while read r1 && read r2 && read r3; do
    printf "%d,%d,%d\n" "$r1" "$r2" "$r3"
done <inputfile >outputfile.csv
If there aren't blank lines between records, and you want to replace all newlines with commas, you could use the tr command:
Code:
tr '\n' ',' <inputfile >outputfile.csv
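One caveat (a small sketch, assuming GNU sed and an input file that ends with a newline): tr also turns the file's final newline into a comma, so the output ends with a stray comma and no newline. Something like this would tidy that up:
Code:
tr '\n' ',' <inputfile | sed 's/,$//' >outputfile.csv   # join lines, drop the trailing comma
echo >>outputfile.csv                                   # put back a final newline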
Last edited by jschiwal; 03-08-2008 at 07:08 PM.
03-08-2008, 07:28 PM | #5
Thaidog (Original Poster)
Member
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637
Quote:
Originally Posted by jschiwal
You need to show more. How are the records separated in the input? Show two sample records.
If they are like
11
33
34
12
20
14
Then you could use:
Code:
awk 'BEGIN { RS=""; FS="\n"; OFS=","} { $1=$1; print $0 }' inputfile >outputfile.csv
If the input is like:
11
33
34
12
20
14
and you want 3 fields per record, then a read loop might be better (despite my signature!):
Code:
while read r1 && read r2 && read r3; do
    printf "%d,%d,%d\n" "$r1" "$r2" "$r3"
done <inputfile >outputfile.csv
If there aren't blank lines between records, and you want to replace all newlines with commas, you could use the tr command:
Code:
tr '\n' ',' <inputfile >outputfile.csv
Ok. It's really simple. Imagine instead of 400 records I had only three in a text file written by notepad.exe in a file called errors.txt:
11
33
34
If that was the entire file I would have to organize it like this:
11,33,34
Obviously, if I had only 3 records I would just stick a few commas in the middle and be done with it... but I expect a whole lot more of these files.
03-08-2008, 08:08 PM | #6
jschiwal
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
I went through 3 possible solutions depending on the input format of the file. If you have a file with 400 lines of numbers and want all of the numbers on a single line separated by commas, then the third method I gave using the "tr" command would work. Otherwise you need to provide information on how individual records are separated. The first solution is for records where an empty line delineates records and would work for records with differing numbers of fields. The second one is when the input doesn't have record delineation but you want to group the input into 3 field long records. From your last post, it sounds like you want to convert a file with line separated records, each one field long, into a single record with comma separated fields. The solution using "tr" is the one that will do that.
Normally csv files consist of a number of records, with one record per line. The fields of a record are separated by commas. It is customary, however, to surround text fields with double or single quotes, as in:
100, "John Smith", "100 Excelsior Drive", 56103
105, "John Doe","1234 West Benning", 51243
Also, if the input is from Windows, you need to convert it to a Unix/Linux text file as well (you mentioned Notepad). The "dos2unix" command will do that.
Or you could cheat with:
Code:
tr -s '\r\n' ',' <inputfile >outputfile
That will convert each \r\n to ",," and then squeeze the extra comma down to a single one.
You may also need to remove blank lines from the input to prevent extra commas in any case:
Code:
sed '/^$/d' <inputfile | tr -s '\r\n' ',' >outputfile
Be sure to check the input and output with "head" and "tail". What happens if there is an extra line at the beginning of the input, or an extra or missing one at the end of the file? Sometimes massaging the input a bit before proceeding with the main problem helps; then the input will match an easier-to-use pattern.
At work I'll download the inventory list from a device. I'll first delete from the beginning of the file to a line with the word 'OK' and then delete from the first blank line to the end of the file. First I'm getting rid of the parts of the file that aren't a part of the inventory list, and then I extract what I need.
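For what it's worth, a rough sketch of that kind of trimming with sed (the 'OK' marker and the file names here are just placeholders):
Code:
# delete from the first line through the line containing OK,
# then delete from the first blank line to the end of the file
sed -e '1,/OK/d' -e '/^$/,$d' device-dump.txt > inventory.txt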
Your example contains lines of numbers. What happens if a line contains alpha characters? Would it ruin the results if that line were deleted from the result? Is the data already ruined? Do you strip off any non-numeric characters?
When working with regular expression patterns, it is necessary to be precise in defining the text patterns. This prevents false matches or misses.
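As an example of that kind of precision, a sketch that keeps only purely numeric lines before joining them (paste isn't mentioned above, but it joins lines without leaving a trailing comma):
Code:
# keep lines made up entirely of digits, then join them with commas
grep -E '^[0-9]+$' inputfile | paste -sd, - > outputfile.csv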
Last edited by jschiwal; 03-08-2008 at 08:14 PM.
03-08-2008, 08:16 PM | #7
Tinkster
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Quote:
Ok. It's really simple. Imagine instead of 400 records I had only three in a text file written by notepad.exe in a file called errors.txt:
Hence his question(s). Or do you really want all 400 lines/words
on one line separated by commas?
Code:
awk 'NR ==1{printf "%s", $1}NR>1{printf ",%s", $1}END{printf "\n"}'
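For example, run against the errors.txt file mentioned above (the errors.csv output name is just an example):
Code:
awk 'NR ==1{printf "%s", $1}NR>1{printf ",%s", $1}END{printf "\n"}' errors.txt > errors.csv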
Cheers,
Tink
03-08-2008, 08:19 PM | #8
syg00
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,286
Nice answer @jschiwal. Don't like tr for this though - what about that last newline??? "Corner cases" are always a pain.
Can't see why the awk solution offered isn't acceptable personally.
Last edited by syg00; 03-08-2008 at 08:21 PM.
03-08-2008, 08:19 PM | #9
Thaidog (Original Poster)
Member
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637
Quote:
Originally Posted by Tinkster
Hence his question(s). Or do you really want all 400 lines/words
on one line separated by commas?
Code:
awk 'NR ==1{printf "%s", $1}NR>1{printf ",%s", $1}END{printf "\n"}'
Cheers,
Tink
Lol actually I do want them on the same line!
Thanks fellas, the tr solution works perfectly!