Old 09-01-2008, 11:59 AM   #1
richmur
LQ Newbie
 
Registered: Aug 2005
Distribution: Suse 10.0
Posts: 7

Rep: Reputation: 0
Create 1 csv file from multiple txt files


Hi,

I am trying to create a single csv file, using multiple .txt files as input. The output format should be: Column1 - name of .txt file, Column2 - contents of txt file.

Thanks a lot!
 
Old 09-01-2008, 03:40 PM   #2
raconteur
Member
 
Registered: Dec 2007
Location: Slightly left of center
Distribution: slackware
Posts: 276
Blog Entries: 2

Rep: Reputation: 44
I'm assuming the contents of the text files are rather small.

If that is the case, then a simple approach would be:
Code:
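# <list> is a placeholder: substitute file names or a glob such as *.txt
for file in <list>; do
  contents=$(cat "$file")    # quote "$file" so names with spaces survive
  echo "$file,$contents" >> csvfile
done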
where <list> is a list of the text files.

That list could be the result of an 'ls' command, or the contents of yet another file containing a list of file names, or an explicitly entered list or pattern.

If your text files contain multiple lines, then you may need to translate them into single lines if you want only two columns in your csv file. You can do that on the fly in the above loop if desired.
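
To illustrate the "on the fly" translation mentioned above, here is a minimal sketch, not a definitive solution. It assumes the inputs match *.txt inside one directory, and the function name make_csv is made up for this example. It uses tr to turn each newline into a space before the row is written.

```shell
#!/usr/bin/env bash
# Sketch: build a two-column CSV, flattening each text file to one line.
# Assumptions: inputs match *.txt in the given directory; make_csv is a
# hypothetical name, not something from the posts above.
make_csv() {
  local dir="$1" out="$2" file contents
  : > "$out"                              # start with an empty output file
  for file in "$dir"/*.txt; do
    [ -e "$file" ] || continue            # skip if the glob matched nothing
    # tr turns every newline into a space so the record stays on one row
    contents=$(tr '\n' ' ' < "$file")
    contents=${contents% }                # drop one trailing space left by the final newline
    printf '%s,%s\n' "$(basename "$file")" "$contents" >> "$out"
  done
}
```

Called as make_csv . csvfile, it writes one row per text file in the current directory. Blank lines inside a file show up as doubled spaces, and commas in the contents are not escaped, so this only suits simple data.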
 
Old 09-02-2008, 03:31 AM   #3
richmur
LQ Newbie
 
Registered: Aug 2005
Distribution: Suse 10.0
Posts: 7

Original Poster
Rep: Reputation: 0
Thanks raconteur. Yes, you are correct: the contents of the text files are small, 4 or 5 lines at most, but they may contain leading spaces and content separated by blank lines. However, when I tried to run your code (after first creating a file called list with ls > list), the output I got seemed to be just the contents of the file called list:

list,xyz
abc
def
ghi

where abc, xyz etc are the names of the files and the contents of list

Last edited by richmur; 09-02-2008 at 04:03 AM.
 
Old 09-02-2008, 03:59 AM   #4
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.6, Centos 5.10
Posts: 16,324

Rep: Reputation: 2041
You need an extra 'cat' in the first line, e.g. if your input list of files is in_list.txt, then

Code:
# Word splitting on the list means file names must not contain whitespace
for file in $(cat in_list.txt)
do
  contents=$(cat "$file")
  echo "$file,$contents" >> csvfile
done
 
Old 09-02-2008, 04:07 AM   #5
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 655
You may want to add quotes around the second field. This is usually done for text fields in csv files. CSV files don't normally have fields with newlines in them. Also, it raises the question of how embedded quotes are handled. I tried an experiment, exporting a two record csv file from oocalc that contained quotes in the second column.
Code:
"file1","This is a test.  How will it export a csv file that contains “embedded quotes?”  I will also open up the file and insert newline characters if need be."
"file2","Line1 Line2 Line3 Line4"
"File 3","This is the last line. "
I did add newlines to the csv file, and when I reloaded it into oocalc, they were stripped out.

A csv file with a multiline field would probably not be very portable. You might consider using an xml format instead.
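
On the embedded-quotes question, one common convention (the one described in RFC 4180) is to wrap the field in double quotes and double any double quote inside it. A rough sketch follows. The helper name csv_quote is made up for illustration, and whether a given spreadsheet accepts the result, especially with newlines in a field, is something to test rather than assume.

```shell
#!/usr/bin/env bash
# csv_quote: hypothetical helper that quotes one CSV field.
# It doubles every embedded double quote and wraps the result in quotes.
csv_quote() {
  local escaped
  # sed replaces each " with ""; command substitution strips trailing newlines
  escaped=$(printf '%s' "$1" | sed 's/"/""/g')
  printf '"%s"' "$escaped"
}
```

A row could then be written with something like printf '%s,%s\n' "$(csv_quote "$file")" "$(csv_quote "$contents")".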

If the purpose is to create these files from a script, this is often done in bash using HERE documents. A HERE document can even contain bash variables that are expanded before the file is written.

Code:
#!/usr/bin/env bash
MAX=300
cat >afile1.conf <<EOF
[general]
max = $MAX
min = 100
EOF

cat >anotherFile.conf <<EOF
This is the second file.
Second line of second file.
I need some sleep because this is the extent of 
my imagination.
EOF
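
A small aside on the expansion behaviour described above: if the delimiter is quoted, the shell leaves the body alone. The sketch below only contrasts the two forms; the variable names expanded and literal are chosen just for this example.

```shell
#!/usr/bin/env bash
MAX=300

# Unquoted delimiter: $MAX is expanded before the text is passed to cat
expanded=$(cat <<EOF
max = $MAX
EOF
)

# Quoted delimiter: the body is taken literally, so $MAX stays as written
literal=$(cat <<'EOF'
max = $MAX
EOF
)

printf '%s\n%s\n' "$expanded" "$literal"
```

The quoted form is handy when the generated file itself needs to contain dollar signs.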

Last edited by jschiwal; 09-02-2008 at 04:20 AM.
 
Old 09-02-2008, 04:45 AM   #6
richmur
LQ Newbie
 
Registered: Aug 2005
Distribution: Suse 10.0
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by chrism01 View Post
You need an extra 'cat' in the first line eg if your input list of files is in_list.txt, then

Code:
# Word splitting on the list means file names must not contain whitespace
for file in $(cat in_list.txt)
do
  contents=$(cat "$file")
  echo "$file,$contents" >> csvfile
done
Thanks Chrism01, that works fine if there is just a single line of text in the txt file. However, I think the blank lines in the text files are causing me some problems: the output csv file has text from the extra lines in the first column if there is more than one line. My problem is that there are hundreds of text files, and I will have to recreate them from the csv file (using awk -F, '{ print $2 > $1 }' myfile.csv), so I need the first column to contain only the names of the txt files.
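
One thing to watch with the awk -F, approach above is that $2 stops at the next comma, so any content containing a comma would be cut short. A possible variant, sketched under the assumptions that each record is on one line, file names contain no commas, and no CSV quoting is used, splits on the first comma only. The name split_csv is hypothetical.

```shell
#!/usr/bin/env bash
# split_csv: hypothetical helper; recreates one file per CSV row.
# Assumes one record per line, file names without commas, no quoting.
split_csv() {
  local csv="$1" outdir="$2"
  awk -v dir="$outdir" '{
    i = index($0, ",")              # position of the first comma
    if (i == 0) next                # skip rows without a separator
    out = dir "/" substr($0, 1, i - 1)
    print substr($0, i + 1) > out   # everything after the first comma
    close(out)                      # keep the number of open files small
  }' "$csv"
}
```

For example, split_csv myfile.csv restored would write each row's contents into a file under the (already existing) directory restored.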
 
Old 09-02-2008, 07:28 PM   #7
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.6, Centos 5.10
Posts: 16,324

Rep: Reputation: 2041
If you are going for a generalised solution (as alluded to by jschiwal and yourself here), use a more powerful language, e.g. Perl.
If the input files can have newlines, blank lines, extra quotes (or not) etc., that's my recommendation.
Bash would get very messy.
I'm assuming you'd want to convert newlines inside the (input) files to spaces or something in the output file?
 
Old 09-02-2008, 10:17 PM   #8
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,005
Blog Entries: 11

Rep: Reputation: 903
Or you could find a very unlikely record separator and (ab-)use awk ...
Code:
awk 'BEGIN{RS="123@^~456"}{ printf "%s,", FILENAME; for (i=1; i<=NF; i++){printf "%s ", $i}; printf "\n" }' *txt > list.csv



Cheers,
Tink
 
Old 09-03-2008, 04:10 AM   #9
richmur
LQ Newbie
 
Registered: Aug 2005
Distribution: Suse 10.0
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by chrism01 View Post
If you are going for a generalised solution (as alluded to by jschiwal and yourself here), use a more powerful language, e.g. Perl.
If the input files can have newlines, blank lines, extra quotes (or not) etc., that's my recommendation.
Bash would get very messy.
I'm assuming you'd want to convert newlines inside the (input) files to spaces or something in the output file?
Hmm.... I'm thinking my best bet would be to just remove newlines from the input files (using perl -pi -e 's/\n//g' *.txt or something like that), then your

Code:
for file in $(cat in_list.txt)
do
  contents=$(cat "$file")
  echo "$file,$contents" >> csvfile
done

will hopefully do the trick - cheers
 
Old 09-03-2008, 05:21 AM   #10
richmur
LQ Newbie
 
Registered: Aug 2005
Distribution: Suse 10.0
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by Tinkster View Post
Or you could find a very unlikely record separator and (ab-)use awk ...
Code:
awk 'BEGIN{RS="123@^~456"}{ printf "%s,", FILENAME; for (i=1; i<=NF; i++){printf "%s ", $i}; printf "\n" }' *txt > list.csv



Cheers,
Tink
Cheers Tink, it might not be the prettiest looking line of awk written ;-) but it will do the trick too - once I removed the newlines which were causing the text to spread across different cells.

Thanks a lot guys for the suggestions

-richmur!
 
Old 09-03-2008, 02:28 PM   #11
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,005
Blog Entries: 11

Rep: Reputation: 903
Quote:
Originally Posted by richmur View Post
Cheers Tink, it might not be the prettiest looking line of awk written ;-) but it will do the trick too - once I removed the newlines which were causing the text to spread across different cells.

Thanks a lot guys for the suggestions

-richmur!
My bad ... I forgot to put something in the BEGIN section
that I was actually using in my test runs ...


Code:
awk 'BEGIN{RS="\x04";FS="\n"}{ printf "%s,", FILENAME; for (i=1; i<=NF; i++){printf "%s ", $i}; printf "\n" }' *txt > list.csv
In fact, the whole NF loop with the printf's wouldn't
make any sense w/o the missing FS="\n". And you could
of course go and choose a less ugly RS, for me a "\x04"
worked just fine, but I don't know the data you're
dealing with .... ;}
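
For anyone who wants to try the idea on sample data, here is a sketch of a variant wrapped in a shell function. The name flatten_txt is made up, the octal escape "\004" is the same byte as "\x04" (used here on the assumption that octal escapes are the more portable spelling across awk implementations), and, unlike the one-liner above, it skips empty lines so the row has no doubled or trailing spaces.

```shell
#!/usr/bin/env bash
# flatten_txt: hypothetical wrapper around the RS/FS trick above.
# Each input file is read as a single record (assuming no file contains
# the byte 0x04), split into lines, and printed as "name,line1 line2 ...".
flatten_txt() {
  awk 'BEGIN { RS = "\004"; FS = "\n" }
  {
    line = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "") continue        # skip blank lines and the empty last field
      line = (line == "" ? $i : line " " $i)
    }
    print FILENAME "," line
  }' "$@"
}
```

Usage would be along the lines of flatten_txt *.txt > list.csv, with the same caveat as before about commas in the data.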


Cheers,
Tink
 
  


All times are GMT -5.