.csv file upoload, manipulation, download script help
ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
.csv file upoload, manipulation, download script help
Hi, looking for any helpful pointers or straight out code. I'm a basic hack at programming/scripting but can generally work with examples well to reach the final desired result. I've been googling for hours but have not found a solution that is close yet.
I'm trying to help a friend that has a repeating task of exporting data from her palm pilot (to a .csv file) and going through some manipulation to support printing out mailing labels of the data. The work can be done through a manual spreadsheet manipulation (concatenating some fields), but let's just say that computer skills are not her best strength and I want to get out of the 11pm phone call business.
What I'd like to do is host a script on my shared hosting linux box that would allow her to upload a .csv file, have it automatically concatenate the few fields required, and then provide a download of the resulting .csv file for importing into her label application.
"Last","First","Street","Town","ST","Zip","Name","city st zip"
"Bush","George","1600 Pennsylvania Avenue NW","Washington","DC","20500","George Bush","Washington, DC 20500"
Resources available on my host include perl 4.x and 5, php, bash, etc.
I'm sure some Perl fan can come up w/ a different solution, but it seems to me 1 line of sed could do the job:
Code:
sed -r 's+^"([^"]*)","([^"]*)","([^"]*)","([^"]*)","([^"]*)","([^"]*)"+&,"\2 \1","\4 \5 \6"+' old.csv >new.csv
or, if you're daring:
Code:
sed -ri 's+^"([^"]*)","([^"]*)","([^"]*)","([^"]*)","([^"]*)","([^"]*)"+&,"\2 \1","\4 \5 \6"+' old.csv
Notes
You will need to clean up the header line -- it will come out:
"Last","First","Street","Town","ST","Zip","First Last","Town ST Zip"
I have opted to use '+' as my separator instead of the "standard" '/'.
RTM sed.
I don't think awk will work here because your field separator effectively is '","', not a single character.
I am counting on there being quotes around every field.
I think this is safe because it looks like the exporting program is playing safe & adding them to every field.
Last edited by archtoad6; 11-08-2007 at 07:34 AM.
Reason: reformat
Thanks! The difference in the header line I will just fix on the label template side (tell it to look for these new field names instead of the prior ones).
Will try this in a little bit and report back on success.
Resources available on my host include perl 4.x and 5, php, bash, etc.
I'm gonna take the "etc" to mean that python is also available, then I'm gonna use that as an excuse to write python code .
Code:
import sys
import csv
def mapper(data):
"Process and modify lines"
#add fields to first line
for line in data:
line += ['name', 'city st zip']
yield line
break
#process rest of lines
for line in data:
name = ' '.join(line[1::-1])
address = '%s, %s %s' % tuple(line[3:6])
line += [name, address]
yield line
#check for correct number of arguments and parse them
num_args = len(sys.argv)
if not num_args in (2, 3):
sys.exit('Usage: %s cvsfile [outfile]' %sys.argv[0])
outfile = sys.stdout
if num_args == 3:
outfile = open(sys.argv[2], 'w')
#setup the reader/writer handlers
reader = csv.reader(open(sys.argv[1]))
writer = csv.writer(outfile, quoting=csv.QUOTE_ALL)
#process and write the data
writer.writerows(mapper(reader))
Usage:
Code:
parse.py csvfile [outfile]
The outfile is optional, if it's not there it will print to stdout.
This script is probably overkill for what you need, but it allows ease of modification if you ever need to add more fields. It also uses the csv module in Python, so it's less error prone and will handle csv that are formatted differently (not all csv's use quotes and such).
EDIT: Do NOT use the same file for input and output, It'll delete your data (since it's reading it line by line)
Last edited by angrybanana; 11-08-2007 at 10:54 PM.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.