03-08-2008, 03:09 PM | #1
Thaidog
Member
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637
Need a shell script to turn a lined file into a comma separated file
I've got a file with about 400 entries that I need turned into a csv file.
Right now each line has one entry... how do I output that to a comma separated list?
03-08-2008, 04:40 PM | #2
jschiwal
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
A short sample of a few lines of input and what you want for output would be useful. You are describing a file with 400 records and only one field in a record. Unless the input has a record on multiple lines, there is no use for a comma as a record separator.
I prefer using the TAB character as the field separator. This allows other whitespace in the field values. If you want some fields to have double quotes around them, you should indicate that as well. In other words, do you literally want comma-separated values, or do you want it to be the same as a csv file produced or read by a certain program? If a field is of a text type, and that text could contain a comma, double quotes are used to surround that field's value in the csv file.
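For illustration only, a rough sketch of producing that kind of quoting from TAB-separated input with awk (the input.tsv file name and the two-column id/name layout are made up here):
Code:
# hypothetical: turn "id<TAB>name" lines into csv, quoting the name
# field because free text may itself contain commas
awk -F'\t' '{ printf "%s,\"%s\"\n", $1, $2 }' input.tsv > output.csv
A field value that itself contains a double quote would additionally need that quote doubled, which is why it matters which program will read the csv.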
Last edited by jschiwal; 03-08-2008 at 05:38 PM.
03-08-2008, 06:39 PM | #3
Thaidog (Original Poster)
Member
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637
Quote:
Originally Posted by jschiwal
A short sample of a few lines of input and what you want for output would be useful. You are describing a file with 400 records and only one field in a record. Unless the input has a record on multiple lines, there is no use for a comma as a record separator.
I prefer using the TAB character as the field separator. This allows other whitespace in the field values. If you want some fields to have double quotes around them, you should indicate that as well. In other words, do you literally want comma-separated values, or do you want it to be the same as a csv file produced or read by a certain program? If a field is of a text type, and that text could contain a comma, double quotes are used to surround that field's value in the csv file.
This is strictly csv in its simplest form - for instance:
The first few lines of the file:
11
33
34
Those lines need to be converted to:
11,33,34
Nothing fancy, just a new format is needed (and that format *has* to be like the format shown above).
Last edited by Thaidog; 03-08-2008 at 06:41 PM.
03-08-2008, 07:05 PM | #4
jschiwal
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
You need to show more. How are the records separated in the input? Show two sample records.
If they are like
11
33
34
12
20
14
Then you could use:
Code:
awk 'BEGIN { RS=""; FS="\n"; OFS=","} { $1=$1; print $0 }' inputfile >outputfile.csv
If the input is like:
11
33
34
12
20
14
and you want 3 fields per record, then a read loop might be better (despite my signature!):
Code:
while read r1 && read r2 && read r3; do
    printf "%d,%d,%d\n" "$r1" "$r2" "$r3"
done <inputfile >outputfile.csv
If there aren't blank lines between records, and you want to replace all newlines with commas, you could use the tr command:
Code:
tr '\n' ',' <inputfile >outputfile.csv
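One caveat (a small sketch, assuming GNU sed and an input file that ends with a newline): tr also turns the file's final newline into a comma, so the output ends with a stray comma and no newline. Something like this would tidy that up:
Code:
tr '\n' ',' <inputfile | sed 's/,$//' >outputfile.csv   # join lines, drop the trailing comma
echo >>outputfile.csv                                   # put back a final newline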
Last edited by jschiwal; 03-08-2008 at 07:08 PM.
03-08-2008, 07:28 PM | #5
Thaidog (Original Poster)
Member
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637
Quote:
Originally Posted by jschiwal
You need to show more. How are the records separated in the input? Show two sample records.
If they are like
11
33
34
12
20
14
Then you could use:
Code:
awk 'BEGIN { RS=""; FS="\n"; OFS=","} { $1=$1; print $0 }' inputfile >outputfile.csv
If the input is like:
11
33
34
12
20
14
and you want 3 fields per record, then a read loop might be better (despite my signature!):
Code:
while read r1 && read r2 && read r3; do
    printf "%d,%d,%d\n" "$r1" "$r2" "$r3"
done <inputfile >outputfile.csv
If there aren't blank lines between records, and you want to replace all newlines with commas, you could use the tr command:
Code:
tr '\n' ',' <inputfile >outputfile.csv
Ok. It's really simple. Imagine instead of 400 records I had only three in a text file written by notepad.exe in a file called errors.txt:
11
33
34
If that was the entire file I would have to organize it like this:
11,33,34
Obviously, if I had only 3 records I would just stick a few commas in the middle and be done with it... but I expect a whole lot more of these files.
03-08-2008, 08:08 PM | #6
jschiwal
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
I went through 3 possible solutions depending on the input format of the file. If you have a file with 400 lines of numbers and want all of the numbers on a single line separated by commas, then the third method I gave using the "tr" command would work. Otherwise you need to provide information on how individual records are separated. The first solution is for records where an empty line delineates records and would work for records with differing numbers of fields. The second one is when the input doesn't have record delineation but you want to group the input into 3 field long records. From your last post, it sounds like you want to convert a file with line separated records, each one field long, into a single record with comma separated fields. The solution using "tr" is the one that will do that.
Normally csv files consist of a number of records, with one record per line. The fields of a record are separated by commas. It is customary, however, to surround text fields with double or single quotes, as in:
100, "John Smith", "100 Excelsior Drive", 56103
105, "John Doe","1234 West Benning", 51243
Also, if the input is from Windows, you need to convert it to a Unix/Linux text file as well (you mentioned Notepad). The "dos2unix" command will do that.
Or you could cheat with:
Code:
tr -s '\r\n' ',' <inputfile >outputfile
That will convert each \r\n to ",," and then squeeze the extra comma down to a single one.
You may also need to remove blank lines from the input to prevent extra commas in any case:
Code:
sed '/^$/d' <inputfile | tr -s '\r\n' ',' >outputfile
Be sure to check the input and output with "head" and "tail". What happens if there is an extra line at the beginning of the input, or an extra or missing one at the end of the file? Sometimes massaging the input a bit before proceeding with the main problem helps; then the input will match an easier-to-use pattern.
At work I'll download the inventory list from a device. I'll first delete from the beginning of the file to a line with the word 'OK' and then delete from the first blank line to the end of the file. First I'm getting rid of the parts of the file that aren't a part of the inventory list, and then I extract what I need.
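For what it's worth, a rough sketch of that kind of trimming with sed (the 'OK' marker and the file names here are just placeholders):
Code:
# delete from the first line through the line containing OK,
# then delete from the first blank line to the end of the file
sed -e '1,/OK/d' -e '/^$/,$d' device-dump.txt > inventory.txt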
Your example contains lines of numbers. What happens if a line contains alpha characters? Would it ruin the results if that line were deleted from the result? Is the data already ruined? Do you strip off any non-numeric characters?
When working with regular expression patterns, it is necessary to be precise in defining the text patterns. This prevents false matches or misses.
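As an example of that kind of precision, a sketch that keeps only purely numeric lines before joining them (paste isn't mentioned above, but it joins lines without leaving a trailing comma):
Code:
# keep lines made up entirely of digits, then join them with commas
grep -E '^[0-9]+$' inputfile | paste -sd, - > outputfile.csv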
Last edited by jschiwal; 03-08-2008 at 08:14 PM.
03-08-2008, 08:16 PM | #7
Tinkster
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Quote:
Ok. It's really simple. Imagine instead of 400 records I had only three in a text file written by notepad.exe in a file called errors.txt:
Hence his question(s). Or do you really want all 400 lines/words
on one line separated by commas?
Code:
awk 'NR ==1{printf "%s", $1}NR>1{printf ",%s", $1}END{printf "\n"}'
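For example, run against the errors.txt file mentioned above (the errors.csv output name is just an example):
Code:
awk 'NR ==1{printf "%s", $1}NR>1{printf ",%s", $1}END{printf "\n"}' errors.txt > errors.csv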
Cheers,
Tink
03-08-2008, 08:19 PM | #8
syg00
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,286
Nice answer @jschiwal. Don't like tr for this though - what about that last newline??? "Corner cases" are always a pain.
Can't see why the awk solution offered isn't acceptable personally.
Last edited by syg00; 03-08-2008 at 08:21 PM.
03-08-2008, 08:19 PM | #9
Thaidog (Original Poster)
Member
Registered: Sep 2002
Location: Hilton Head, SC
Distribution: Gentoo
Posts: 637
Quote:
Originally Posted by Tinkster
Hence his question(s). Or do you really want all 400 lines/words
on one line separated by commas?
Code:
awk 'NR ==1{printf "%s", $1}NR>1{printf ",%s", $1}END{printf "\n"}'
Cheers,
Tink
Lol actually I do want them on the same line!
Thanks fellas, the tr solution works perfectly!