LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 02-01-2006, 07:32 AM   #1
me_the_apprentice
LQ Newbie
 
Registered: Feb 2006
Posts: 4

Rep: Reputation: 0
Parsing XLS or CSV in Perl - what and what not


Hi Everyone!!
You might be aware of "use SpreadSheet::ParseExcel" thing in Perl. I am not pro on that/Perl but want to understand this.. what we shall prefer -- a CSV file or a plain XLS file for reading the parsing the data. What are the pros and cons of using CSV's over XLS; where we shall prefer them over XLS and where not.

Thanks for your comments/thoughts in advance...
 
Old 02-01-2006, 02:37 PM   #2
Michael_S
Member
 
Registered: Oct 2004
Location: Pennsylvania, USA
Distribution: Debian
Posts: 78

Rep: Reputation: 23
Can you be more specific as to what you're trying to do?

I haven't used any software that parses XLS files (except for Excel itself, naturally).

Comma-Separated Value files are easy to use and manipulate, but obviously they lack some things that are built into the XLS file format. The company I work for has databases for standalone applications at sites, and we extract data from each table as .csv files, zip the files, encrypt the zip archive, FTP it to the main office, and run an application to unencrypt, extract, and integrate the data into the main database.

Our application uses Java just because all of the other software we use is Java. But each step in the process would be easy to do with Perl.
 
Old 02-01-2006, 11:23 PM   #3
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,311

Rep: Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040
Concur with Michael_S; are these spreadsheets basically data or do you want to be able to handle spreadsheets with lots of formulae/macros etc?
Most tools eg DBs have an 'import as CSV' option for data, so I'd go with that if possible.
The other issue is that different versions of Excel may need to be parsed differently for complex spreradsheets, as MS changes things (ie similar to the .doc 'std' changes gradually)
 
Old 02-02-2006, 11:10 PM   #4
me_the_apprentice
LQ Newbie
 
Registered: Feb 2006
Posts: 4

Original Poster
Rep: Reputation: 0
Ok!
So here's the deal; I am a Build and Release Admin, the developer put-into it varied information like CVS version of code file, location, domain it belongs to and DB schema the code needs to get released to etc. I dont need Macro's and all (I didnt faced any issues as of now with different version of MSExcel). But recently, an issue came up -- a developer typed in CVS version as 1.10 and it got translated to "1.1".

So wanted to understand whether these kind of "automatic" manipulations occur in CSV's as well. Quite obviously, I am thinking of three approaches:
#1. I shall make my code better to take care of such small issues; (though I am not very comfortable as an error can come my way then some other wil rise up on later date)
#2. Find out if there are some filters that can be put on format of information that gets fed into these spreadsheets; but again, how to ensure that it gets read as is and no translation happens on top of it :-)
#3. I switch to CVS from XLS -- but not sure about what will be the impact on my code -- will I still be able to use "use SpreadExcel::ParseExcel". Moreover, what will be the impact on overall performance of my code. If it doesnt matter much; then what are the things to watch out for when working with CSV's. If you have some URL or some doc on this; then plz plz share it.

Again, much thanks for your thoughts!!
- The Apprentice
 
Old 02-03-2006, 11:49 AM   #5
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,298

Rep: Reputation: 175Reputation: 175
Quote:
developer typed in CVS version as 1.10 and it got translated to "1.1".
surely a case of the wrong format for a column?

Quote:
whether these kind of "automatic" manipulations occur in CSV's as well
depends on the translator I'd say.
In my experience excel is absolutely *crap* with CSV.
Check the data carefully.
It's really brain dead with commas even inside quotes.
 
Old 02-03-2006, 04:09 PM   #6
Michael_S
Member
 
Registered: Oct 2004
Location: Pennsylvania, USA
Distribution: Debian
Posts: 78

Rep: Reputation: 23
Do you mean CSV as in Comma Seperated Value, or CVS, as in Concurrent Versioning System? Build and release tracking sounds like CVS, not CSV.

If you are talking about CSV, then I'm not sure I can help. I think the way data is exported or imported is pretty dependent upon the program you use.
 
Old 02-04-2006, 01:07 AM   #7
me_the_apprentice
LQ Newbie
 
Registered: Feb 2006
Posts: 4

Original Poster
Rep: Reputation: 0
Michael, you gave it a thought, that itself honoured my question. Sorry for the typo; I meant Comma Seperated Files only (also, we are using CVS tool just like any other development project for maintaining the document/code repositories).

Bigearsbilly, it would be really good for this community (and me) if you can list down things that one shall be careful about (from testing perspective) when migrating to CSV's from XLS.

Thanks!!
- The Apprentice
 
Old 02-07-2006, 11:36 PM   #8
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,311

Rep: Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040Reputation: 2040
If you'd like a DB style approach, try this: http://search.cpan.org/~jzucker/DBD-...dle/DBD/CSV.pm
or
http://search.cpan.org/~tmtm/Text-CS.../CSV/Simple.pm, & see the note about Text::CSV_XS module on this page.
If you are the Build & Release admin, I'd suggest storing the orig input plus actions taken & results in a simple MySQL DB, so you can track for your & your mgr's benefit,
unless you think you are dealing with a trivial amt of work.
You could use this to automate builds perhaps... & track dependencies...
 
Old 02-09-2006, 04:50 AM   #9
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,298

Rep: Reputation: 175Reputation: 175
Well, I find excel to be pathetic in how it translates CSVs. For a format it is supposed
to understand. In particular if you have a comma in a field, if it is in quotes it
should be ignored. But my excel is so thick you end up with a new column regardless.
V. irritating. So I end up stripping the commas out.
 
Old 02-09-2006, 09:06 AM   #10
Michael_S
Member
 
Registered: Oct 2004
Location: Pennsylvania, USA
Distribution: Debian
Posts: 78

Rep: Reputation: 23
Quote:
Originally Posted by bigearsbilly
Well, I find excel to be pathetic in how it translates CSVs. For a format it is supposed
to understand. In particular if you have a comma in a field, if it is in quotes it
should be ignored. But my excel is so thick you end up with a new column regardless.
V. irritating. So I end up stripping the commas out.
Can you escape the commas with a backslash or some other character?
 
Old 02-09-2006, 09:32 AM   #11
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,298

Rep: Reputation: 175Reputation: 175
I dunno. Wasn't really a priority so I didn't go further.

I think Open Office was okay with it though if i remember.
 
Old 02-24-2006, 09:11 AM   #12
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,298

Rep: Reputation: 175Reputation: 175
Well. If you are still interested.
Have been parsing a spreadsheet today.
My excel also does XML output and something called IDF.

But, I found the best results where to save->as "tab delimited text" then
you almost certainly won't have a delimiter tab in the data itself.
(tab switches to next cell usually so you can't do it normally)
then you can easily split the data.

Of course for a simple sheet this works.
 
Old 02-24-2006, 12:45 PM   #13
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,143

Rep: Reputation: 638Reputation: 638Reputation: 638Reputation: 638Reputation: 638Reputation: 638
my experience is with dumping data to a csv and parsing it with shell script or 'c'.

csv is better because it is portable on any os. (because it is straight ascii text as opposed to proprietary bin format)

i think i heard a cammand line xls2csv program but not sure how/ if it works.

there are probably ways to escape ',' and '/' in the script but i found it much easier never to use those charecters in excel.

sorry billy, i'm not a perl fan as you are -you seem to post a perl solution every post (j/k - just kidding) but at least i read them.

for csv awk will be your friend.

also excel tends to auto format cells with preceding 0's or numbers not stored in $$$. format so i format everything as text.

good luck.
 
Old 02-28-2006, 04:44 AM   #14
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,298

Rep: Reputation: 175Reputation: 175
it all boils down to "know your data, love your data" (maybe not love)

perl moi?
I used to hate perl, now I love it. (Same with vi)
After a few months with it I absolutely adore it.

Try it, persevere, it's absolutely fab.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
URL parsing program in perl ludeKing Programming 3 09-30-2010 04:58 PM
Parsing HTML using Perl smaida Programming 2 05-29-2004 02:20 PM
parsing a logfile in perl ludeKing Programming 2 04-30-2004 04:20 AM
Perl Parsing filename The Jesus Programming 4 10-03-2003 04:21 PM
perl DBD::CSV pk21 Programming 0 09-26-2003 05:37 AM


All times are GMT -5. The time now is 08:18 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration