Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game. |
Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
|
 |
02-01-2006, 06:32 AM
|
#1
|
LQ Newbie
Registered: Feb 2006
Posts: 4
Rep:
|
Parsing XLS or CSV in Perl - what and what not
Hi Everyone!!
You might be aware of "use SpreadSheet::ParseExcel" thing in Perl. I am not pro on that/Perl but want to understand this.. what we shall prefer -- a CSV file or a plain XLS file for reading the parsing the data. What are the pros and cons of using CSV's over XLS; where we shall prefer them over XLS and where not.
Thanks for your comments/thoughts in advance...
|
|
|
02-01-2006, 01:37 PM
|
#2
|
Member
Registered: Oct 2004
Location: Pennsylvania, USA
Distribution: Debian
Posts: 87
Rep:
|
Can you be more specific as to what you're trying to do?
I haven't used any software that parses XLS files (except for Excel itself, naturally).
Comma-Separated Value files are easy to use and manipulate, but obviously they lack some things that are built into the XLS file format. The company I work for has databases for standalone applications at sites, and we extract data from each table as .csv files, zip the files, encrypt the zip archive, FTP it to the main office, and run an application to unencrypt, extract, and integrate the data into the main database.
Our application uses Java just because all of the other software we use is Java. But each step in the process would be easy to do with Perl.
|
|
|
02-01-2006, 10:23 PM
|
#3
|
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.x
Posts: 18,434
|
Concur with Michael_S; are these spreadsheets basically data or do you want to be able to handle spreadsheets with lots of formulae/macros etc?
Most tools eg DBs have an 'import as CSV' option for data, so I'd go with that if possible.
The other issue is that different versions of Excel may need to be parsed differently for complex spreradsheets, as MS changes things (ie similar to the .doc 'std' changes gradually)
|
|
|
02-02-2006, 10:10 PM
|
#4
|
LQ Newbie
Registered: Feb 2006
Posts: 4
Original Poster
Rep:
|
Ok!
So here's the deal; I am a Build and Release Admin, the developer put-into it varied information like CVS version of code file, location, domain it belongs to and DB schema the code needs to get released to etc. I dont need Macro's and all (I didnt faced any issues as of now with different version of MSExcel). But recently, an issue came up -- a developer typed in CVS version as 1.10 and it got translated to "1.1".
So wanted to understand whether these kind of "automatic" manipulations occur in CSV's as well. Quite obviously, I am thinking of three approaches:
#1. I shall make my code better to take care of such small issues; (though I am not very comfortable as an error can come my way then some other wil rise up on later date)
#2. Find out if there are some filters that can be put on format of information that gets fed into these spreadsheets; but again, how to ensure that it gets read as is and no translation happens on top of it :-)
#3. I switch to CVS from XLS -- but not sure about what will be the impact on my code -- will I still be able to use "use SpreadExcel::ParseExcel". Moreover, what will be the impact on overall performance of my code. If it doesnt matter much; then what are the things to watch out for when working with CSV's. If you have some URL or some doc on this; then plz plz share it.
Again, much thanks for your thoughts!!
- The Apprentice
|
|
|
02-03-2006, 10:49 AM
|
#5
|
Senior Member
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,516
|
Quote:
developer typed in CVS version as 1.10 and it got translated to "1.1".
|
surely a case of the wrong format for a column?
Quote:
whether these kind of "automatic" manipulations occur in CSV's as well
|
depends on the translator I'd say.
In my experience excel is absolutely *crap* with CSV.
Check the data carefully.
It's really brain dead with commas even inside quotes.
|
|
|
02-03-2006, 03:09 PM
|
#6
|
Member
Registered: Oct 2004
Location: Pennsylvania, USA
Distribution: Debian
Posts: 87
Rep:
|
Do you mean CSV as in Comma Seperated Value, or CVS, as in Concurrent Versioning System? Build and release tracking sounds like CVS, not CSV.
If you are talking about CSV, then I'm not sure I can help. I think the way data is exported or imported is pretty dependent upon the program you use.
|
|
|
02-04-2006, 12:07 AM
|
#7
|
LQ Newbie
Registered: Feb 2006
Posts: 4
Original Poster
Rep:
|
Michael, you gave it a thought, that itself honoured my question. Sorry for the typo; I meant Comma Seperated Files only (also, we are using CVS tool just like any other development project for maintaining the document/code repositories).
Bigearsbilly, it would be really good for this community (and me) if you can list down things that one shall be careful about (from testing perspective) when migrating to CSV's from XLS.
Thanks!!
- The Apprentice
|
|
|
02-07-2006, 10:36 PM
|
#8
|
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.x
Posts: 18,434
|
If you'd like a DB style approach, try this: http://search.cpan.org/~jzucker/DBD-...dle/DBD/CSV.pm
or
http://search.cpan.org/~tmtm/Text-CS.../CSV/Simple.pm, & see the note about Text::CSV_XS module on this page.
If you are the Build & Release admin, I'd suggest storing the orig input plus actions taken & results in a simple MySQL DB, so you can track for your & your mgr's benefit,
unless you think you are dealing with a trivial amt of work.
You could use this to automate builds perhaps... & track dependencies...
|
|
|
02-09-2006, 03:50 AM
|
#9
|
Senior Member
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,516
|
Well, I find excel to be pathetic in how it translates CSVs. For a format it is supposed
to understand. In particular if you have a comma in a field, if it is in quotes it
should be ignored. But my excel is so thick you end up with a new column regardless.
V. irritating. So I end up stripping the commas out.
|
|
|
02-09-2006, 08:06 AM
|
#10
|
Member
Registered: Oct 2004
Location: Pennsylvania, USA
Distribution: Debian
Posts: 87
Rep:
|
Quote:
Originally Posted by bigearsbilly
Well, I find excel to be pathetic in how it translates CSVs. For a format it is supposed
to understand. In particular if you have a comma in a field, if it is in quotes it
should be ignored. But my excel is so thick you end up with a new column regardless.
V. irritating. So I end up stripping the commas out.
|
Can you escape the commas with a backslash or some other character?
|
|
|
02-09-2006, 08:32 AM
|
#11
|
Senior Member
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,516
|
I dunno. Wasn't really a priority so I didn't go further.
I think Open Office was okay with it though if i remember.
|
|
|
02-24-2006, 08:11 AM
|
#12
|
Senior Member
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,516
|
Well. If you are still interested.
Have been parsing a spreadsheet today.
My excel also does XML output and something called IDF.
But, I found the best results where to save->as "tab delimited text" then
you almost certainly won't have a delimiter tab in the data itself.
(tab switches to next cell usually so you can't do it normally)
then you can easily split the data.
Of course for a simple sheet this works.
|
|
|
02-24-2006, 11:45 AM
|
#13
|
LQ Guru
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,326
|
my experience is with dumping data to a csv and parsing it with shell script or 'c'.
csv is better because it is portable on any os. (because it is straight ascii text as opposed to proprietary bin format)
i think i heard a cammand line xls2csv program but not sure how/ if it works.
there are probably ways to escape ',' and '/' in the script but i found it much easier never to use those charecters in excel.
sorry billy, i'm not a perl fan as you are -you seem to post a perl solution every post (j/k - just kidding) but at least i read them.
for csv awk will be your friend.
also excel tends to auto format cells with preceding 0's or numbers not stored in $$$.¢¢ format so i format everything as text.
good luck.
|
|
|
02-28-2006, 03:44 AM
|
#14
|
Senior Member
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,516
|
it all boils down to "know your data, love your data" (maybe not love)
perl moi?
I used to hate perl, now I love it. (Same with vi)
After a few months with it I absolutely adore it.
Try it, persevere, it's absolutely fab.
|
|
|
All times are GMT -5. The time now is 04:26 PM.
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|