LinuxQuestions.org
Download your favorite Linux distribution at LQ ISO.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices



Reply
 
Search this Thread
Old 03-26-2008, 02:02 PM   #1
waelaltaqi
Member
 
Registered: Sep 2005
Location: USA, TN
Distribution: CentOS & Ubuntu for Desktop
Posts: 454

Rep: Reputation: 31
xls2csv delimiter


i found a great tool xls2csv to convert MS excel sheets to csv files which I'm using to insert into a database.

I'm having a slight problem with one excel sheet which includes commas in the fields. Because of this, the CSV file is damaged and is not reliable to be inserted into a database. So the solution is to change the delimiter in xls2csv. According to man:
Code:
xls2csv(1)                                                          xls2csv(1)

NAME
       xls2csv  -  reads MS-Excel file and puts its content as comma-separated
       data on standard output

SYNOPSIS
       xls2csv [-xlV] [-f  format ] [-b  string ] [-s  charset ] [-d   charset
       ] [-q  number ] [-c  char] files

DESCRIPTION
       xls2csv reads MS-Excel spreadsheet and dumps its content as comma-sepa‚
       rated values to stdout. Numbers are printed without delimiters, strings
       are enclosed in the double quotes. Double-quotes inside string are dou‚
       bled.

OPTIONS
       -x      print unknown Unicode chars as \xNNNN, rather than as  question
               marks

       -l      list known charsets and exit successfully

       -cchar  cell separator char. By default - comma.
 Manual page xls2csv(1) line 1
so i should be able to use the "-cchar" option to change the delimiter. in addition "-c" option is for the destination CSV file. Please look below:
Code:
NG:~# xls2csv

xls2csv - Recode a spreadsheet's charset and save as CSV.

usage: xls2csv -x spreadsheet.xls [-w worksheet] [-b charset] [-c csvfile.csv] [-a charset] [-qshvW]

-x  : filename of the source spreadsheet
-b  : the character set the source spreadsheet is in (before)
-c  : the filename to save the generated csv file as
-a  : the character set the csv file should be converted to (after)
-q  : quiet mode
-s  : print a list of supported character sets
-h  : this help message
-v  : get version information
-W  : list worksheets in the spreadsheet specified by -x
-w  : specify the worksheet name to convert (defaults to the first worksheet)

example: xls2csv -x "spreadsheet.xls" -b WINDOWS-1252 -c "csvfile.csv" -a UTF-8

More detailed help is in "perldoc xls2csv"
Now here is the problem:

Code:
NG:~/downloads# xls2csv -x "test.xls" -w "Sheet1" -c "test.csv" -cchar "~"
Now reading "test.xls" as UTF-8.
Converting the "Sheet1" worksheet.
The spreadsheet has been converted to UTF-8 and saved as "char".
so it named the output char rather than test.csv .
I'm not sure what i can do next to make it work. is there is a way to change the default delimiter for xls2char rather trying the above?
Any input will be highly regarded.
 
Old 03-26-2008, 05:33 PM   #2
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,396
Blog Entries: 2

Rep: Reputation: 903Reputation: 903Reputation: 903Reputation: 903Reputation: 903Reputation: 903Reputation: 903Reputation: 903
Quote:
Originally Posted by waelaltaqi View Post
i
Code:
NG:~/downloads# xls2csv -x "test.xls" -w "Sheet1" -c "test.csv" -cchar "~"
Now reading "test.xls" as UTF-8.
Converting the "Sheet1" worksheet.
The spreadsheet has been converted to UTF-8 and saved as "char".
so it named the output char rather than test.csv .
I'm not sure what i can do next to make it work. is there is a way to change the default delimiter for xls2char rather trying the above?
Any input will be highly regarded.
The argument '-cchar' would seem to have caused it to do what it did.
--- rod.
 
Old 03-26-2008, 06:28 PM   #3
michaelk
Moderator
 
Registered: Aug 2002
Posts: 12,165

Rep: Reputation: 784Reputation: 784Reputation: 784Reputation: 784Reputation: 784Reputation: 784Reputation: 784
Doing a quick search there appears to be different options between the different versions. One version uses -c for a delimiter character and the other uses -c for the output filename.

http://www.linuxmanpages.com/man1/xls2csv.1.php
vs
http://search.cpan.org/~ken/xls2csv-1.06/script/xls2csv

The first link shows -cchar which IMHO means that -c is the option and not -cchar. Your version shows -c is the output file name. Not sure what your options are at this point. Might be able to use sed to fix the data.
 
Old 03-27-2008, 10:26 AM   #4
waelaltaqi
Member
 
Registered: Sep 2005
Location: USA, TN
Distribution: CentOS & Ubuntu for Desktop
Posts: 454

Original Poster
Rep: Reputation: 31
thanks michaelk for pointing me on the right direction. fixing the problem with sed could be done if was to conver one sheet. But it's going to be very time consuming if I wanted to examine 5 excel sheets a day and figure what a sed command that will do for each one. The excel sheets are different will be received daily and must be converted to CSV then inserted in SQL.

i searched all over the place for xls2csv 0.93.3but i couldn't find it. CPAN doesn't have it and i don't know of a site that have old perl modules. Any ideas or should i just search for another utility?

Last edited by waelaltaqi; 03-27-2008 at 10:29 AM.
 
Old 03-27-2008, 02:35 PM   #5
michaelk
Moderator
 
Registered: Aug 2002
Posts: 12,165

Rep: Reputation: 784Reputation: 784Reputation: 784Reputation: 784Reputation: 784Reputation: 784Reputation: 784
Might just want to look for another utility. I found this:
http://wizard.ae.krakow.pl/~jb/xls2txt/
 
Old 03-27-2008, 11:05 PM   #6
waelaltaqi
Member
 
Registered: Sep 2005
Location: USA, TN
Distribution: CentOS & Ubuntu for Desktop
Posts: 454

Original Poster
Rep: Reputation: 31
works

I found this guy somewhere and he works beautifully. The utility requires Spreadsheet::ParseExcel.

Code:
#!/usr/bin/perl 
use strict;
use warnings;

#  Declarations
#===========================================================
use Spreadsheet::ParseExcel;

#  Program proper
#===========================================================

my $file_name = shift || usage();

parse_excel($file_name);

exit 0;

#  Subroutines
#===========================================================
sub parse_excel {

    my %arg = ( record_sep => "\n",
                field_sep  => "\t",
                xls        => undef );

    if ( @_ == 1 ) {
        $arg{xls} = shift;
    } elsif ( not @_ % 2 ) {
        %arg = ( %arg, @_ );
    }

    -e $arg{xls} or 
        die "Must provide valid XLS file! $arg{xls}, $!\n";

    # create a ParseExcel object
    my $excel_obj = Spreadsheet::ParseExcel->new();

    my $workbook = $excel_obj->Parse($arg{xls});

    # make sure we're in business
    die "Workbook did not return worksheets!\n"
        unless ref $workbook->{Worksheet} eq 'ARRAY';

    # we need to get each worksheet from the workbook in turn

    for my $worksheet ( @{$workbook->{Worksheet}} ) {

    # {Cells}[row][col]
    # empty worksheets have undef for MaxCol and MaxRow
        my $last_col = $worksheet->{MaxCol} || 0;
        my $last_row = $worksheet->{MaxRow} || 0;

        for my $row ( 0 .. $last_row ) {

            for my $col ( 0 .. $last_col ) {

                my $cell = $worksheet->{Cells}[$row][$col];

                print ref $cell ?
                    $cell->Value : '';

                print $arg{field_sep} unless $col == $last_col;
            }
            print $arg{record_sep}; # record ends
        }
        print "\n"; # worksheet ends
    }
}
#===========================================================
sub usage {

    my ( $tool ) = $0 =~ m,([^\/]+$),;

    print <<HERE;
------------------------------------------------------------
USAGE:

   $tool EXCEL_FILE.xls [field_delim] [record_delim]

Takes an Excel file, parses it into plain text delimited 
fields and rows, and sends the results to STDOUT. The default 
field and record delimiters are "\\t" and "\\n" if neither 
is given.
------------------------------------------------------------
HERE
    exit 0;
}
#===========================================================
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
change for-loop delimiter character? galle Programming 6 08-19-2009 11:15 AM
Divide up lines with string delimiter elmu Programming 3 10-07-2005 09:48 AM
c++ get() delimiter ashirazi Programming 3 08-06-2004 06:26 AM
sed with delimiter, bourne uribo Programming 17 04-23-2003 02:42 AM
tab delimiter codename000 Programming 3 04-04-2003 11:18 AM


All times are GMT -5. The time now is 07:30 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration