ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
Hi Everyone!!
You might be aware of "use SpreadSheet::ParseExcel" thing in Perl. I am not pro on that/Perl but want to understand this.. what we shall prefer -- a CSV file or a plain XLS file for reading the parsing the data. What are the pros and cons of using CSV's over XLS; where we shall prefer them over XLS and where not.
Can you be more specific as to what you're trying to do?
I haven't used any software that parses XLS files (except for Excel itself, naturally).
Comma-Separated Value files are easy to use and manipulate, but obviously they lack some things that are built into the XLS file format. The company I work for has databases for standalone applications at sites, and we extract data from each table as .csv files, zip the files, encrypt the zip archive, FTP it to the main office, and run an application to unencrypt, extract, and integrate the data into the main database.
Our application uses Java just because all of the other software we use is Java. But each step in the process would be easy to do with Perl.
Concur with Michael_S; are these spreadsheets basically data or do you want to be able to handle spreadsheets with lots of formulae/macros etc?
Most tools eg DBs have an 'import as CSV' option for data, so I'd go with that if possible.
The other issue is that different versions of Excel may need to be parsed differently for complex spreradsheets, as MS changes things (ie similar to the .doc 'std' changes gradually)
Ok!
So here's the deal; I am a Build and Release Admin, the developer put-into it varied information like CVS version of code file, location, domain it belongs to and DB schema the code needs to get released to etc. I dont need Macro's and all (I didnt faced any issues as of now with different version of MSExcel). But recently, an issue came up -- a developer typed in CVS version as 1.10 and it got translated to "1.1".
So wanted to understand whether these kind of "automatic" manipulations occur in CSV's as well. Quite obviously, I am thinking of three approaches:
#1. I shall make my code better to take care of such small issues; (though I am not very comfortable as an error can come my way then some other wil rise up on later date)
#2. Find out if there are some filters that can be put on format of information that gets fed into these spreadsheets; but again, how to ensure that it gets read as is and no translation happens on top of it :-)
#3. I switch to CVS from XLS -- but not sure about what will be the impact on my code -- will I still be able to use "use SpreadExcel::ParseExcel". Moreover, what will be the impact on overall performance of my code. If it doesnt matter much; then what are the things to watch out for when working with CSV's. If you have some URL or some doc on this; then plz plz share it.
Again, much thanks for your thoughts!!
- The Apprentice
developer typed in CVS version as 1.10 and it got translated to "1.1".
surely a case of the wrong format for a column?
Quote:
whether these kind of "automatic" manipulations occur in CSV's as well
depends on the translator I'd say.
In my experience excel is absolutely *crap* with CSV.
Check the data carefully.
It's really brain dead with commas even inside quotes.
Michael, you gave it a thought, that itself honoured my question. Sorry for the typo; I meant Comma Seperated Files only (also, we are using CVS tool just like any other development project for maintaining the document/code repositories).
Bigearsbilly, it would be really good for this community (and me) if you can list down things that one shall be careful about (from testing perspective) when migrating to CSV's from XLS.
If you'd like a DB style approach, try this: http://search.cpan.org/~jzucker/DBD-...dle/DBD/CSV.pm
or http://search.cpan.org/~tmtm/Text-CS.../CSV/Simple.pm, & see the note about Text::CSV_XS module on this page.
If you are the Build & Release admin, I'd suggest storing the orig input plus actions taken & results in a simple MySQL DB, so you can track for your & your mgr's benefit,
unless you think you are dealing with a trivial amt of work.
You could use this to automate builds perhaps... & track dependencies...
Well, I find excel to be pathetic in how it translates CSVs. For a format it is supposed
to understand. In particular if you have a comma in a field, if it is in quotes it
should be ignored. But my excel is so thick you end up with a new column regardless.
V. irritating. So I end up stripping the commas out.
Well, I find excel to be pathetic in how it translates CSVs. For a format it is supposed
to understand. In particular if you have a comma in a field, if it is in quotes it
should be ignored. But my excel is so thick you end up with a new column regardless.
V. irritating. So I end up stripping the commas out.
Can you escape the commas with a backslash or some other character?
Well. If you are still interested.
Have been parsing a spreadsheet today.
My excel also does XML output and something called IDF.
But, I found the best results where to save->as "tab delimited text" then
you almost certainly won't have a delimiter tab in the data itself.
(tab switches to next cell usually so you can't do it normally)
then you can easily split the data.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.