LinuxQuestions.org


richmur 09-01-2008 10:59 AM

Create 1 csv file from multiple txt files
 
Hi,

I am trying to create a single csv file, using multiple .txt files as input. The output format should be: Column1 - name of .txt file, Column2 - contents of txt file.

Thanks a lot!

raconteur 09-01-2008 02:40 PM

I'm assuming the contents of the text files are rather small.

If that is the case, then a simple approach would be:
Code:

for file in <list>; do
  contents=`cat $file`
  echo "$file,$contents" >> csvfile
done

where <list> is a list of the text files.

That list could be the result of an 'ls' command, or the contents of yet another file containing a list of file names, or an explicitly entered list or pattern.

If your text files contain multiple lines, then you may need to translate them into single lines if you want only two columns in your csv file. You can do that on the fly in the above loop if desired.
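
A minimal sketch of doing that translation inside the loop. It assumes the files match *.txt in the current directory and that one space per newline is an acceptable replacement; the function name txt_to_csv is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one "name,contents" row per .txt file in the
# current directory, with newlines inside each file folded into spaces.
txt_to_csv() {
  for file in *.txt; do
    [ -f "$file" ] || continue                    # pattern matched nothing
    contents=$(cat "$file")                       # $(...) drops trailing newlines
    flat=$(printf '%s' "$contents" | tr '\n' ' ') # remaining newlines -> spaces
    printf '%s,%s\n' "$file" "$flat"
  done
}
```

Run it as txt_to_csv > csvfile from the directory holding the files. Leading spaces are kept, and a blank line inside a file shows up as two consecutive spaces.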

richmur 09-02-2008 02:31 AM

Thanks raconteur, yes you are correct, the contents of the text files are small, up to 4 or 5 lines max, but may contain leading spaces and content separated by blank lines. However, when I tried to run your code (having first created a file called list with ls > list), the output I got seemed to be just the contents of the file called list:

list,xyz
abc
def
ghi

where abc, xyz etc are the names of the files and the contents of list

chrism01 09-02-2008 02:59 AM

You need an extra 'cat' in the first line eg if your input list of files is in_list.txt, then

Code:

for file in `cat in_list.txt`
do
  contents=`cat $file`
  echo "$file,$contents" >> csvfile
done
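
Note that `cat in_list.txt` inside backticks is split on any whitespace, so a file name containing a space would be broken into two words. A sketch that reads the list one line at a time instead, assuming one file name per line; csv_from_list is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: $1 is a file holding one file name per line
# (for example in_list.txt). Prints one "name,contents" row per entry.
csv_from_list() {
  while IFS= read -r file; do
    [ -f "$file" ] || continue        # skip blank or stale entries
    contents=$(cat "$file")           # quoted, so spaces in the name survive
    printf '%s,%s\n' "$file" "$contents"
  done < "$1"
}
```

Usage would be csv_from_list in_list.txt >> csvfile. Multi-line files still produce multi-line rows here; this only changes how the list is read.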


jschiwal 09-02-2008 03:07 AM

You may want to add quotes around the second field. This is usually done for text fields in csv files. CSV files don't normally have fields with newlines in them. Also, it begs the question how embedded quotes are handled. I tried an experiment, exporting a two record csv file from oocalc that contained quotes in the second column.
Code:

"file1","This is a test.  How will it export a csv file that contains “embedded quotes?”  I will also open up the file and insert newline characters if need be."
"file2","Line1 Line2 Line3 Line4"
"File 3","This is the last line. "

I did add newlines to the csv file and when I reloaded them into oocalc, they were stripped out.

A csv file with a multiline field would probably not be very portable. You might consider using an xml format instead.
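
On the embedded-quotes question, one common convention (the one described in RFC 4180) is to wrap the field in double quotes and double any quote character inside it. A rough sketch, assuming that convention is what the consuming program expects; csv_quote is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: print $1 as a quoted CSV field.
csv_quote() {
  # Double every embedded double quote, then wrap the result in quotes.
  printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}
```

A row could then be written with printf '%s,%s\n' "$(csv_quote "$file")" "$(csv_quote "$contents")". Whether a given spreadsheet keeps newlines inside such a field is a separate matter, as the experiment above suggests.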

If the purpose is to create these files from a script, this is often done in bash using HERE documents. A HERE document can even contain bash variables that are expanded before the file is written.

Code:

#!/usr/bin/env bash
MAX=300
cat >afile1.conf <<EOF
[general]
max = $MAX
min = 100
EOF

cat >anotherFile.conf <<EOF
This is the second file.
Second line of second file.
I need some sleep because this is the extent of
my imagination.
EOF


richmur 09-02-2008 03:45 AM

Quote:

Originally Posted by chrism01 (Post 3266971)
You need an extra 'cat' in the first line eg if your input list of files is in_list.txt, then

Code:

for file in `cat in_list.txt`
do
  contents=`cat $file`
  echo "$file,$contents" >> csvfile
done


Thanks Chrism01, that works fine if there is just a single line of text in the txt file. However, I think the blank lines in the text files are causing me some problems: when a file has more than one line, the extra lines spill into the first column of the output csv file. My problem is that there are hundreds of text files, and I will have to recreate them from the csv file (using awk -F, '{ print $2 > $1 }' myfile.csv), so I need the first column to contain only the names of the txt files.
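
One thing to watch with the reverse step: -F, splits on every comma, so $2 stops at the first comma inside the contents. A sketch that splits on the first comma only, assuming one row per file and file names that contain no commas; csv_to_txt is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: $1 is a csv of "name,contents" rows.
# Recreates each named file from everything after the first comma.
csv_to_txt() {
  awk '{
    i = index($0, ",")              # position of the first comma only
    if (i == 0) next                # no comma: not a name,contents row
    name = substr($0, 1, i - 1)
    print substr($0, i + 1) > name  # write the rest of the row to the file
    close(name)                     # avoid running out of open files
  }' "$1"
}
```

Usage would be csv_to_txt myfile.csv. Because each file is closed after its row is written, a name that appears on two rows ends up with only the last row's contents.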

chrism01 09-02-2008 06:28 PM

If you are going for a generalised solution (as alluded to by jschiwal and yourself here), use a more powerful language, e.g. Perl.
If the input files can have newlines, blank lines, extra quotes (or not) etc, that's my recommendation.
Bash would get very messy.
I'm assuming you'd want to convert newlines inside (input) files to spaces or something in the output file?

Tinkster 09-02-2008 09:17 PM

Or you could find a very unlikely record separator and (ab-)use awk ...
Code:

awk 'BEGIN{RS="123@^~456"}{ printf "%s,", FILENAME; for (i=1; i<=NF; i++){printf "%s ", $i}; printf "\n" }' *txt > list.csv



Cheers,
Tink

richmur 09-03-2008 03:10 AM

Quote:

Originally Posted by chrism01 (Post 3267700)
If you are going for a generalised solution (as alluded to by jschiwal and yourself here), use a more powerful language, e.g. Perl.
If the input files can have newlines, blank lines, extra quotes (or not) etc, that's my recommendation.
Bash would get very messy.
I'm assuming you'd want to convert newlines inside (input) files to spaces or something in the output file?

Hmm.... I'm thinking my best bet would be to just remove newlines from the input files (using perl -pi -e 's/\n//g' *.txt or something like that), then your

Code:

for file in `cat in_list.txt`
do
  contents=`cat $file`
  echo "$file,$contents" >> csvfile
done

will hopefully do the trick - cheers

richmur 09-03-2008 04:21 AM

Quote:

Originally Posted by Tinkster (Post 3267826)
Or you could find a very unlikely record separator and (ab-)use awk ...
Code:

awk 'BEGIN{RS="123@^~456"}{ printf "%s,", FILENAME; for (i=1; i<=NF; i++){printf "%s ", $i}; printf "\n" }' *txt > list.csv



Cheers,
Tink

Cheers Tink, it might not be the prettiest looking line of awk written ;-) but it will do the trick too - once I removed the newlines which were causing the text to spread across different cells.

Thanks a lot guys for the suggestions

-richmur!

Tinkster 09-03-2008 01:28 PM

Quote:

Originally Posted by richmur (Post 3268104)
Cheers Tink, it might not be the prettiest looking line of awk written ;-) but it will do the trick too - once I removed the newlines which were causing the text to spread across different cells.

Thanks a lot guys for the suggestions

-richmur!

My bad ... I forgot to put something in the BEGIN section
that I was actually using in my test-runs ...


Code:

awk 'BEGIN{RS="\x04";FS="\n"}{ printf "%s,", FILENAME; for (i=1; i<=NF; i++){printf "%s ", $i}; printf "\n" }' *txt > list.csv

In fact, the whole NF loop with the printf's wouldn't
make any sense w/o the missing FS="\n". And you could
of course go and choose a less ugly RS, for me a "\x04"
worked just fine, but I don't know the data you're
dealing with .... ;}


Cheers,
Tink
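
A variation on the same idea, as a sketch only: it keeps the RS/FS trick (each file read as one record, each line as one field) but skips blank lines and avoids the trailing space. It assumes the EOT character never occurs in the data, writes the separator in octal as "\004" rather than "\x04", and uses the made-up name flatten_txt.

```shell
#!/bin/sh
# Hypothetical helper: print "name,line1 line2 ..." for each file given.
flatten_txt() {
  awk 'BEGIN { RS = "\004"; FS = "\n" }   # whole file = one record, one line = one field
       {
         line = ""
         for (i = 1; i <= NF; i++) {
           if ($i == "") continue         # skip blank lines and the empty last field
           line = (line == "" ? $i : line " " $i)
         }
         print FILENAME "," line
       }' "$@"
}
```

Usage would be flatten_txt *txt > list.csv. Dropping blank lines is a choice made here for the sketch; if they carry meaning in the data, the original loop keeps them as extra spaces.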


All times are GMT -5.