LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   xls2csv delimiter (https://www.linuxquestions.org/questions/linux-newbie-8/xls2csv-delimiter-630827/)

waelaltaqi 03-26-2008 01:02 PM

xls2csv delimiter
 
i found a great tool xls2csv to convert MS excel sheets to csv files which I'm using to insert into a database.

I'm having a slight problem with one excel sheet which includes commas in the fields. Because of this, the CSV file is damaged and is not reliable to be inserted into a database. So the solution is to change the delimiter in xls2csv. According to man:
Code:

xls2csv(1)                                                          xls2csv(1)

NAME
      xls2csv  -  reads MS-Excel file and puts its content as comma-separated
      data on standard output

SYNOPSIS
      xls2csv [-xlV] [-f  format ] [-b  string ] [-s  charset ] [-d  charset
      ] [-q  number ] [-c  char] files

DESCRIPTION
      xls2csv reads MS-Excel spreadsheet and dumps its content as comma-sepaâ
      rated values to stdout. Numbers are printed without delimiters, strings
      are enclosed in the double quotes. Double-quotes inside string are douâ
      bled.

OPTIONS
      -x      print unknown Unicode chars as \xNNNN, rather than as  question
              marks

      -l      list known charsets and exit successfully

      -cchar  cell separator char. By default - comma.
 Manual page xls2csv(1) line 1

so i should be able to use the "-cchar" option to change the delimiter. in addition "-c" option is for the destination CSV file. Please look below:
Code:

NG:~# xls2csv

xls2csv - Recode a spreadsheet's charset and save as CSV.

usage: xls2csv -x spreadsheet.xls [-w worksheet] [-b charset] [-c csvfile.csv] [-a charset] [-qshvW]

-x  : filename of the source spreadsheet
-b  : the character set the source spreadsheet is in (before)
-c  : the filename to save the generated csv file as
-a  : the character set the csv file should be converted to (after)
-q  : quiet mode
-s  : print a list of supported character sets
-h  : this help message
-v  : get version information
-W  : list worksheets in the spreadsheet specified by -x
-w  : specify the worksheet name to convert (defaults to the first worksheet)

example: xls2csv -x "spreadsheet.xls" -b WINDOWS-1252 -c "csvfile.csv" -a UTF-8

More detailed help is in "perldoc xls2csv"

Now here is the problem:

Code:

NG:~/downloads# xls2csv -x "test.xls" -w "Sheet1" -c "test.csv" -cchar "~"
Now reading "test.xls" as UTF-8.
Converting the "Sheet1" worksheet.
The spreadsheet has been converted to UTF-8 and saved as "char".

so it named the output char rather than test.csv .
I'm not sure what i can do next to make it work. is there is a way to change the default delimiter for xls2char rather trying the above?
Any input will be highly regarded.

theNbomr 03-26-2008 04:33 PM

Quote:

Originally Posted by waelaltaqi (Post 3101262)
i
Code:

NG:~/downloads# xls2csv -x "test.xls" -w "Sheet1" -c "test.csv" -cchar "~"
Now reading "test.xls" as UTF-8.
Converting the "Sheet1" worksheet.
The spreadsheet has been converted to UTF-8 and saved as "char".

so it named the output char rather than test.csv .
I'm not sure what i can do next to make it work. is there is a way to change the default delimiter for xls2char rather trying the above?
Any input will be highly regarded.

The argument '-cchar' would seem to have caused it to do what it did.
--- rod.

michaelk 03-26-2008 05:28 PM

Doing a quick search there appears to be different options between the different versions. One version uses -c for a delimiter character and the other uses -c for the output filename.

http://www.linuxmanpages.com/man1/xls2csv.1.php
vs
http://search.cpan.org/~ken/xls2csv-1.06/script/xls2csv

The first link shows -cchar which IMHO means that -c is the option and not -cchar. Your version shows -c is the output file name. Not sure what your options are at this point. Might be able to use sed to fix the data.

waelaltaqi 03-27-2008 09:26 AM

thanks michaelk for pointing me on the right direction. fixing the problem with sed could be done if was to conver one sheet. But it's going to be very time consuming if I wanted to examine 5 excel sheets a day and figure what a sed command that will do for each one. The excel sheets are different will be received daily and must be converted to CSV then inserted in SQL.

i searched all over the place for xls2csv 0.93.3but i couldn't find it. CPAN doesn't have it and i don't know of a site that have old perl modules. Any ideas or should i just search for another utility?

michaelk 03-27-2008 01:35 PM

Might just want to look for another utility. I found this:
http://wizard.ae.krakow.pl/~jb/xls2txt/

waelaltaqi 03-27-2008 10:05 PM

works
 
I found this guy somewhere and he works beautifully. The utility requires Spreadsheet::ParseExcel.

Code:

#!/usr/bin/perl
use strict;
use warnings;

#  Declarations
#===========================================================
use Spreadsheet::ParseExcel;

#  Program proper
#===========================================================

my $file_name = shift || usage();

parse_excel($file_name);

exit 0;

#  Subroutines
#===========================================================
sub parse_excel {

    my %arg = ( record_sep => "\n",
                field_sep  => "\t",
                xls        => undef );

    if ( @_ == 1 ) {
        $arg{xls} = shift;
    } elsif ( not @_ % 2 ) {
        %arg = ( %arg, @_ );
    }

    -e $arg{xls} or
        die "Must provide valid XLS file! $arg{xls}, $!\n";

    # create a ParseExcel object
    my $excel_obj = Spreadsheet::ParseExcel->new();

    my $workbook = $excel_obj->Parse($arg{xls});

    # make sure we're in business
    die "Workbook did not return worksheets!\n"
        unless ref $workbook->{Worksheet} eq 'ARRAY';

    # we need to get each worksheet from the workbook in turn

    for my $worksheet ( @{$workbook->{Worksheet}} ) {

    # {Cells}[row][col]
    # empty worksheets have undef for MaxCol and MaxRow
        my $last_col = $worksheet->{MaxCol} || 0;
        my $last_row = $worksheet->{MaxRow} || 0;

        for my $row ( 0 .. $last_row ) {

            for my $col ( 0 .. $last_col ) {

                my $cell = $worksheet->{Cells}[$row][$col];

                print ref $cell ?
                    $cell->Value : '';

                print $arg{field_sep} unless $col == $last_col;
            }
            print $arg{record_sep}; # record ends
        }
        print "\n"; # worksheet ends
    }
}
#===========================================================
sub usage {

    my ( $tool ) = $0 =~ m,([^\/]+$),;

    print <<HERE;
------------------------------------------------------------
USAGE:

  $tool EXCEL_FILE.xls [field_delim] [record_delim]

Takes an Excel file, parses it into plain text delimited
fields and rows, and sends the results to STDOUT. The default
field and record delimiters are "\\t" and "\\n" if neither
is given.
------------------------------------------------------------
HERE
    exit 0;
}
#===========================================================


stefvienna 08-04-2017 05:12 AM

Hi,

not sure whether you're still looking for an answer, but I ran ingto the very same issue - and was able to solve it.
I've pulled down version 0.7.3 from GitHub and here you are able to specify the delimiter via the "-d <delim>" Option, e.g. to set the delimiter to semicolon you would use:
xlsx2csv --ignoreempty --skipemptycolumns --sheet 7 -d ";" Mappings.xlsx Mappings.csv

Hope that helps!
Cheers

AwesomeMachine 08-05-2017 10:48 PM

I thought maybe Excel would work, because it has an export function. The other way to do it is to export to text of fixed field length, and then specify the size of each field for the import.


All times are GMT -5. The time now is 03:41 PM.