LinuxQuestions.org
Help answer threads with 0 replies.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 03-26-2008, 01:02 PM   #1
waelaltaqi
Member
 
Registered: Sep 2005
Location: USA, TN
Distribution: CentOS & Ubuntu for Desktop
Posts: 454

Rep: Reputation: 31
xls2csv delimiter


i found a great tool xls2csv to convert MS excel sheets to csv files which I'm using to insert into a database.

I'm having a slight problem with one excel sheet which includes commas in the fields. Because of this, the CSV file is damaged and is not reliable to be inserted into a database. So the solution is to change the delimiter in xls2csv. According to man:
Code:
xls2csv(1)                                                          xls2csv(1)

NAME
       xls2csv  -  reads MS-Excel file and puts its content as comma-separated
       data on standard output

SYNOPSIS
       xls2csv [-xlV] [-f  format ] [-b  string ] [-s  charset ] [-d   charset
       ] [-q  number ] [-c  char] files

DESCRIPTION
       xls2csv reads MS-Excel spreadsheet and dumps its content as comma-sepaâ
       rated values to stdout. Numbers are printed without delimiters, strings
       are enclosed in the double quotes. Double-quotes inside string are douâ
       bled.

OPTIONS
       -x      print unknown Unicode chars as \xNNNN, rather than as  question
               marks

       -l      list known charsets and exit successfully

       -cchar  cell separator char. By default - comma.
 Manual page xls2csv(1) line 1
so i should be able to use the "-cchar" option to change the delimiter. in addition "-c" option is for the destination CSV file. Please look below:
Code:
NG:~# xls2csv

xls2csv - Recode a spreadsheet's charset and save as CSV.

usage: xls2csv -x spreadsheet.xls [-w worksheet] [-b charset] [-c csvfile.csv] [-a charset] [-qshvW]

-x  : filename of the source spreadsheet
-b  : the character set the source spreadsheet is in (before)
-c  : the filename to save the generated csv file as
-a  : the character set the csv file should be converted to (after)
-q  : quiet mode
-s  : print a list of supported character sets
-h  : this help message
-v  : get version information
-W  : list worksheets in the spreadsheet specified by -x
-w  : specify the worksheet name to convert (defaults to the first worksheet)

example: xls2csv -x "spreadsheet.xls" -b WINDOWS-1252 -c "csvfile.csv" -a UTF-8

More detailed help is in "perldoc xls2csv"
Now here is the problem:

Code:
NG:~/downloads# xls2csv -x "test.xls" -w "Sheet1" -c "test.csv" -cchar "~"
Now reading "test.xls" as UTF-8.
Converting the "Sheet1" worksheet.
The spreadsheet has been converted to UTF-8 and saved as "char".
so it named the output char rather than test.csv .
I'm not sure what i can do next to make it work. is there is a way to change the default delimiter for xls2char rather trying the above?
Any input will be highly regarded.
 
Old 03-26-2008, 04:33 PM   #2
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908
Quote:
Originally Posted by waelaltaqi View Post
i
Code:
NG:~/downloads# xls2csv -x "test.xls" -w "Sheet1" -c "test.csv" -cchar "~"
Now reading "test.xls" as UTF-8.
Converting the "Sheet1" worksheet.
The spreadsheet has been converted to UTF-8 and saved as "char".
so it named the output char rather than test.csv .
I'm not sure what i can do next to make it work. is there is a way to change the default delimiter for xls2char rather trying the above?
Any input will be highly regarded.
The argument '-cchar' would seem to have caused it to do what it did.
--- rod.
 
Old 03-26-2008, 05:28 PM   #3
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,702

Rep: Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896
Doing a quick search there appears to be different options between the different versions. One version uses -c for a delimiter character and the other uses -c for the output filename.

http://www.linuxmanpages.com/man1/xls2csv.1.php
vs
http://search.cpan.org/~ken/xls2csv-1.06/script/xls2csv

The first link shows -cchar which IMHO means that -c is the option and not -cchar. Your version shows -c is the output file name. Not sure what your options are at this point. Might be able to use sed to fix the data.
 
Old 03-27-2008, 09:26 AM   #4
waelaltaqi
Member
 
Registered: Sep 2005
Location: USA, TN
Distribution: CentOS & Ubuntu for Desktop
Posts: 454

Original Poster
Rep: Reputation: 31
thanks michaelk for pointing me on the right direction. fixing the problem with sed could be done if was to conver one sheet. But it's going to be very time consuming if I wanted to examine 5 excel sheets a day and figure what a sed command that will do for each one. The excel sheets are different will be received daily and must be converted to CSV then inserted in SQL.

i searched all over the place for xls2csv 0.93.3but i couldn't find it. CPAN doesn't have it and i don't know of a site that have old perl modules. Any ideas or should i just search for another utility?

Last edited by waelaltaqi; 03-27-2008 at 09:29 AM.
 
Old 03-27-2008, 01:35 PM   #5
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,702

Rep: Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896Reputation: 5896
Might just want to look for another utility. I found this:
http://wizard.ae.krakow.pl/~jb/xls2txt/
 
Old 03-27-2008, 10:05 PM   #6
waelaltaqi
Member
 
Registered: Sep 2005
Location: USA, TN
Distribution: CentOS & Ubuntu for Desktop
Posts: 454

Original Poster
Rep: Reputation: 31
works

I found this guy somewhere and he works beautifully. The utility requires Spreadsheet::ParseExcel.

Code:
#!/usr/bin/perl 
use strict;
use warnings;

#  Declarations
#===========================================================
use Spreadsheet::ParseExcel;

#  Program proper
#===========================================================

my $file_name = shift || usage();

parse_excel($file_name);

exit 0;

#  Subroutines
#===========================================================
sub parse_excel {

    my %arg = ( record_sep => "\n",
                field_sep  => "\t",
                xls        => undef );

    if ( @_ == 1 ) {
        $arg{xls} = shift;
    } elsif ( not @_ % 2 ) {
        %arg = ( %arg, @_ );
    }

    -e $arg{xls} or 
        die "Must provide valid XLS file! $arg{xls}, $!\n";

    # create a ParseExcel object
    my $excel_obj = Spreadsheet::ParseExcel->new();

    my $workbook = $excel_obj->Parse($arg{xls});

    # make sure we're in business
    die "Workbook did not return worksheets!\n"
        unless ref $workbook->{Worksheet} eq 'ARRAY';

    # we need to get each worksheet from the workbook in turn

    for my $worksheet ( @{$workbook->{Worksheet}} ) {

    # {Cells}[row][col]
    # empty worksheets have undef for MaxCol and MaxRow
        my $last_col = $worksheet->{MaxCol} || 0;
        my $last_row = $worksheet->{MaxRow} || 0;

        for my $row ( 0 .. $last_row ) {

            for my $col ( 0 .. $last_col ) {

                my $cell = $worksheet->{Cells}[$row][$col];

                print ref $cell ?
                    $cell->Value : '';

                print $arg{field_sep} unless $col == $last_col;
            }
            print $arg{record_sep}; # record ends
        }
        print "\n"; # worksheet ends
    }
}
#===========================================================
sub usage {

    my ( $tool ) = $0 =~ m,([^\/]+$),;

    print <<HERE;
------------------------------------------------------------
USAGE:

   $tool EXCEL_FILE.xls [field_delim] [record_delim]

Takes an Excel file, parses it into plain text delimited 
fields and rows, and sends the results to STDOUT. The default 
field and record delimiters are "\\t" and "\\n" if neither 
is given.
------------------------------------------------------------
HERE
    exit 0;
}
#===========================================================
 
Old 08-04-2017, 05:12 AM   #7
stefvienna
LQ Newbie
 
Registered: Aug 2017
Posts: 1

Rep: Reputation: Disabled
Hi,

not sure whether you're still looking for an answer, but I ran ingto the very same issue - and was able to solve it.
I've pulled down version 0.7.3 from GitHub and here you are able to specify the delimiter via the "-d <delim>" Option, e.g. to set the delimiter to semicolon you would use:
xlsx2csv --ignoreempty --skipemptycolumns --sheet 7 -d ";" Mappings.xlsx Mappings.csv

Hope that helps!
Cheers
 
Old 08-05-2017, 10:48 PM   #8
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524

Rep: Reputation: 1015Reputation: 1015Reputation: 1015Reputation: 1015Reputation: 1015Reputation: 1015Reputation: 1015Reputation: 1015
I thought maybe Excel would work, because it has an export function. The other way to do it is to export to text of fixed field length, and then specify the size of each field for the import.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
change for-loop delimiter character? galle Programming 6 08-19-2009 10:15 AM
Divide up lines with string delimiter elmu Programming 3 10-07-2005 08:48 AM
c++ get() delimiter ashirazi Programming 3 08-06-2004 05:26 AM
sed with delimiter, bourne uribo Programming 17 04-23-2003 01:42 AM
tab delimiter codename000 Programming 3 04-04-2003 10:18 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie

All times are GMT -5. The time now is 05:11 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration