04-21-2010, 05:28 AM
|
#1
|
|
Member
Registered: May 2008
Distribution: SuSe
Posts: 50
Rep:
|
Need more efficient script PDF417 parsing
I wrote this
Code:
cat $1 | sed -e 's/[ ][ ]*/,/g' | sed 's/,/,\n/g' | sed -e '$s/.....$//' - >> processing
cat processing | awk 'NR==1' | sed 's/^........//' >> processed
cat processing | awk 'NR==2,NR==100' | sed 's/^....//' >> processed
mv -i processed $1.csv
to process a particularly formatted PDF417 file.
It is formatted like this
$CENTAUR<code_length_8><batch_length_20><expiry_length_8><quantity_length_4>$
So a typical record looks like this
<code_length_8><batch_length_20><expiry_length_8><quantity_length_4>
(Note: there are no spaces between records or fields. A field that is present fills its full width; a missing field is replaced by whitespace of the specified length.)
A file will never have more than 22 records.
The files I have been processing so far only have the quantity and code fields, so everything else is whitespace and my script doesn't take the other fields into account. I would like it to make a CSV file like this:
$CENTAUR<LF>
<code_length_8>,<batch_length_20>,<expiry_length_8>,<quantity_length_4><LF>
<code_length_8>,<batch_length_20>,<expiry_length_8>,<quantity_length_4><LF>
$<LF>
<LF> = linefeed/new line
<> separate fields only to show the format and are not actually present in the barcode output
The file actually comes straight from a barcode scanner: the PDF417 barcode is scanned and the result is saved in Notepad++. Can it be scanned directly into the script instead?
Example file (not sure how much formatting will be retained on the forum):
Code:
$CENTAUR30298309 000130287018 000130318905 000130295355 000130295344 000130295333 000130209138 000130210705 000130217293 000130273352 000130292823 000130292834 000130293065 000130293076 000130293087 000130293000 000130293010 000130293021 000130292415 000130292426 000130292947 000130292958 0001$
There is whitespace in the example because some of the fields are not present; only the quantity and code are present here.
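A minimal sketch of the conversion described in this post, written in POSIX shell with awk instead of the sed pipeline. It assumes the field widths given above (8, 20, 8, 4), a header of exactly $CENTAUR, a closing $, and one payload per line. The function name centaur_to_csv is made up for illustration.

```shell
#!/bin/sh
# Sketch only: convert $CENTAUR payloads (one per line) to the CSV layout above.
# Assumed field widths from the post: code 8, batch 20, expiry 8, quantity 4.
centaur_to_csv() {
    awk '
    NF == 0 { next }                       # skip blank lines
    {
        sub(/\r$/, "")                     # tolerate a Windows line ending
        header = "$CENTAUR"
        hlen = length(header)
        # Reject lines that lack the header or the closing "$".
        if (substr($0, 1, hlen) != header || substr($0, length($0)) != "$")
            exit 1
        body = substr($0, hlen + 1, length($0) - hlen - 1)
        print header
        # Slice the body into 40-character records, then into four fields.
        for (pos = 1; pos + 39 <= length(body); pos += 40) {
            rec = substr(body, pos, 40)
            printf "%s,%s,%s,%s\n", substr(rec, 1, 8), substr(rec, 9, 20), substr(rec, 29, 8), substr(rec, 37, 4)
        }
        print "$"
    }' "$@"
}
```

Possible use, with scan.txt as a placeholder file name: centaur_to_csv scan.txt > scan.csv. Because the fields are cut by position, whitespace-only fields keep their full width in the output, and trailing characters that do not fill a whole 40-character record are ignored.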
Last edited by suse_nerd; 04-21-2010 at 07:22 AM.
|
|
|
|
04-21-2010, 05:35 AM
|
#2
|
|
Senior Member
Registered: May 2005
Posts: 4,418
|
Quote:
Originally Posted by suse_nerd
I wrote this
Code:
cat $1 | sed -e 's/[ ][ ]*/,/g' | sed 's/,/,\n/g' | sed -e '$s/.....$//' - >> processing
cat processing | awk 'NR==1' | sed 's/^........//' >> processed
cat processing | awk 'NR==2,NR==100' | sed 's/^....//' >> processed
mv -i processed $1.csv
to process a particularly formatted PDF417 file.
It is formatted like this
$CENTAUR<code_length_8><batch_length_20><expiry_length_8><quantity_length_4><code_length_8>....001$
(note, no spaces between each record or field)
The files I have been processing so far only have the quantity and code fields, so everything else is whitespace and my script doesn't take the other fields into account. I would like it to make a csvfile like this:
$CENTAUR
<code_length_8>,<batch_length_20>,<expiry_length_8>,<quantity_length_4>
<code_length_8>,<batch_length_20>,<expiry_length_8>,<quantity_length_4>
....
001$
It is actually a file scanned directly from a barcode output - so the PDF417 barcode is scanned and the file is saved in notepad++ - can it be scanned directly into the script?
example file - not sure how much formatting will be retained on the forum.
Code:
$CENTAUR30298309 000130287018 000130318905 000130295355 000130295344 000130295333 000130209138 000130210705 000130217293 000130273352 000130292823 000130292834 000130293065 000130293076 000130293087 000130293000 000130293010 000130293021 000130292415 000130292426 000130292947 000130292958 0001$
|
There is a whole bunch of Perl PDF-related (including parsing) modules: http://search.cpan.org/search?query=PDF&mode=all .
|
|
|
|
04-21-2010, 05:45 AM
|
#3
|
|
Member
Registered: May 2008
Distribution: SuSe
Posts: 50
Original Poster
Rep:
|
Quote:
Originally Posted by Sergei Steshenko
|
Thanks, but this is about PDF417 barcodes, rather than PDF files.
|
|
|
|
04-21-2010, 06:30 AM
|
#4
|
|
Senior Member
Registered: May 2005
Posts: 4,418
|
Quote:
Originally Posted by suse_nerd
Thanks, but this is about PDF417 barcodes, rather than PDF files.
|
Your example shows whitespace, but you wrote: "(note, no spaces between each record or field)".
Anyway, are the fields in question of constant width? If yes, in Perl constant-width fields can be extracted with a regular expression like this:
Code:
if ($line =~ m/^(.{3})(.{5})(.{8})/)
{
    # $1 holds the first 3 characters, $2 the following 5, $3 the following 8
    print "\$1=$1 \$2=$2 \$3=$3\n";
}
Alternatively, Perl's 'substr' function can be used.
|
|
|
|
04-21-2010, 06:36 AM
|
#5
|
|
Member
Registered: May 2008
Distribution: SuSe
Posts: 50
Original Poster
Rep:
|
Hello,
Yes, to clarify:
There is no whitespace if all fields are present.
If fields are missing, like in my example, there is whitespace to the length of the missing fields.
So would something like this work (after stripping the header and footer)?
Code:
while(<STDIN>)
{
if($line =~ m/^(.{8})(.{20})(.{8})(.{4})/)
{print "\$1=$1\,\$2=$2\,\$3=$3\,\$4=$4\n";}
}
Note the header is
$CENTAUR
and the footer is just
$
not 001$
Last edited by suse_nerd; 04-21-2010 at 07:00 AM.
|
|
|
|
04-21-2010, 07:10 AM
|
#6
|
|
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,383
|
Based on your example:
Code:
$CENTAUR30298309 000130287018 ...
Would you show the desired output?
|
|
|
|
04-21-2010, 07:18 AM
|
#7
|
|
Member
Registered: May 2008
Distribution: SuSe
Posts: 50
Original Poster
Rep:
|
The output needs to be captured as a file, yes. Ideally should prompt for a filename.
If the input is coming directly from the barcode scanner, this functionality would be ideal.
Output needs to be in a CSV-style format, see my OP, I have made some changes/updates to it.
|
|
|
|
04-21-2010, 07:26 AM
|
#8
|
|
Senior Member
Registered: May 2005
Posts: 4,418
|
Just replace
Code:
while(<STDIN>)
with
Code:
while(defined(my $line = <STDIN>))
.
After you have all the fields, you can check whether any of them consist of whitespace only.
If the header is of constant width, it can simply be treated as an unneeded field, i.e. use something like
Code:
if($line =~ m/^.{10}(.{8})(.{20})(.{8})(.{4})/)
where '10' stands for the header width. Since that part has no parentheses around it, the header won't be captured in the $N variables.
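For the whitespace-only check, here is a small sketch in POSIX shell with awk, in the style of the original sed/awk script. It assumes the records have already been written as comma-separated lines and that an absent field should become an empty CSV field, which the thread does not actually specify. The function name blank_fields_to_empty is hypothetical.

```shell
#!/bin/sh
# Sketch only: turn whitespace-only CSV fields into empty ones,
# e.g. "30298309,      ,    ,0001" becomes "30298309,,,0001".
# Lines without commas (the $CENTAUR header and the closing $) pass through.
blank_fields_to_empty() {
    awk 'BEGIN { FS = OFS = "," }
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[ \t]+$/)
                $i = ""        # assigning a field rebuilds the line with OFS
        print
    }' "$@"
}
```

It reads file arguments or standard input, so it can sit at the end of a pipeline. If the padding has to be preserved instead, this step is simply left out.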
|
|
|
|
04-21-2010, 07:37 AM
|
#9
|
|
Member
Registered: May 2008
Distribution: SuSe
Posts: 50
Original Poster
Rep:
|
Quote:
Originally Posted by Sergei Steshenko
{print "\$1=$1\,\$2=$2\,\$3=$3\,\$4=$4\n";}
|
Will this syntax definitely work? I need the commas between each field and the new line at the end.
I assume it will just discard the $ at the end of the file.
The Perl script runs, but doesn't output anything - how do I get it to output?
Edit: There are no carriage returns, everything is output on one line - is this script expecting a LF or CR? Also, it only needs to skip the header once for each output - will this work?
Something like this?
Code:
#!/usr/bin/perl
open (MYFILE, '>>data.txt');
while(defined(my $line = <STDIN>))
{
if($line =~ m/^.{8}(.{8})(.{20})(.{8})(.{4})/)
{print MYFILE "\$1=$1\,\$2=$2\,\$3=$3\,\$4=$4\n";}
}
close (MYFILE);
Still not outputting anything though.
Last edited by suse_nerd; 04-21-2010 at 08:31 AM.
|
|
|
|
04-21-2010, 08:39 AM
|
#10
|
|
Senior Member
Registered: May 2005
Posts: 4,418
|
First and foremost - put
Code:
use strict;
use warnings;
just after the #!/usr/bin/perl line.
You do not need backslashes before commas.
You need to debug the script - first make sure the lines are indeed read from STDIN, for this just before the 'if' statement put
Code:
warn "\$line=$line";
.
|
|
|
|
04-21-2010, 09:16 AM
|
#11
|
|
Member
Registered: May 2008
Distribution: SuSe
Posts: 50
Original Poster
Rep:
|
I got it to work for the first field only. How do you get it to loop until the end of the input?
Last edited by suse_nerd; 04-21-2010 at 09:21 AM.
|
|
|
|
04-21-2010, 09:22 AM
|
#12
|
|
Senior Member
Registered: May 2005
Posts: 4,418
|
Quote:
Originally Posted by suse_nerd
Code:
$perl out.pl
$CENTAUR16124319 09082011000130001705 2309
2011000130193694 20042010000130209998 2004
2010000130213907 31012012000130217602 0109
2011000130217613 11092011000130222883 1901
2012000130226217 160420120001302355020 3105
2011000130237348 20042010000130237359 2004
2010000130238544 12082011000130238566 2004
2010000130242020 20042010000130278571 2004
2010000130280336 20042010000130288291 0902
2011000130288316 01072010000130288327 1201
2011000130291955 20042010000130293542 2004
20100001$
$line=$CENTAUR16124319 09082011000130001705
23092011000130193694 20042010000130209998
20042010000130213907 31012012000130217602
01092011000130217613 11092011000130222883
19012012000130226217 160420120001302355020
31052011000130237348 20042010000130237359
20042010000130238544 12082011000130238566
20042010000130242020 20042010000130278571
20042010000130280336 20042010000130288291
09022011000130288316 01072010000130288327
12012011000130291955 20042010000130293542
200420100001$
At this point the script just waits and doesn't do anything. If I kill the program, data.txt is empty.
|
And why shouldn't it wait?
If you are invoking it as 'perl out.pl', then it waits for something to come from STDIN, i.e. the keyboard. Why don't you feed data into the script using a pipe or redirection?
For that matter, I do not understand why/how the script managed to print even one $line.
|
|
|
|
04-21-2010, 09:42 AM
|
#13
|
|
Member
Registered: May 2008
Distribution: SuSe
Posts: 50
Original Poster
Rep:
|
Quote:
Originally Posted by Sergei Steshenko
For that matter, I do not understand why/how the script managed to print even one $line.
|
I think it's because it's waiting for a new line, so when I press the Enter key, it outputs the value of $line as specified in the warn statement.
I opened and closed the file within the if statement and now it outputs to the file properly. However, it seems to work only on the first record and doesn't repeat the regex match until the end of the line.
So I get something like:
Code:
30293644, ,20042010,0001
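A single row is what one anchored match gives: a pattern starting with ^ is applied once per line, and the whole payload is one line. The match has to be repeated along the line. Below is a hedged sketch of that idea in POSIX shell with sed, where the g flag does the repeating. It assumes the 8/20/8/4 widths from the first post and a single-line payload, and split_records is an illustrative name only.

```shell
#!/bin/sh
# Sketch only: one record per line, fields separated by commas.
# First sed: drop the $CENTAUR header and the closing "$", then let the g flag
# repeat the 40-character match along the line, adding a newline after each record.
# Second sed: delete the empty line left after the last record and put commas
# between the four fixed-width fields of each record.
split_records() {
    sed -e 's/^\$CENTAUR//' -e 's/\$$//' -e 's/.\{40\}/&\
/g' "$@" |
    sed -e '/^$/d' -e 's/^\(.\{8\}\)\(.\{20\}\)\(.\{8\}\)\(.\{4\}\)$/\1,\2,\3,\4/'
}
```

The split is done in two sed processes because the second substitution has to see each record as a line of its own. In the Perl script the equivalent would be a loop that matches repeatedly rather than a single if.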
|
|
|
|
04-21-2010, 10:12 AM
|
#14
|
|
Senior Member
Registered: May 2005
Posts: 4,418
|
Why don't you do two things:
- chmod +x your_script.pl
- ./your_script.pl < input_data_file
?
Adjust path to your script and to your input data file as needed in the above.
Last edited by Sergei Steshenko; 04-21-2010 at 10:15 AM.
|
|
|
|
04-21-2010, 10:32 AM
|
#15
|
|
Member
Registered: May 2008
Distribution: SuSe
Posts: 50
Original Poster
Rep:
|
Yeah, but I was hoping to be able to take input directly from the barcode scanner, not a file.
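Whether the scanner can feed a script directly depends on how it is attached. The following is only a sketch, assuming the scanner behaves like a keyboard and sends Enter after the payload; neither is confirmed in this thread. Under that assumption the scan arrives on standard input like typed text, so a script can prompt for a file name and then read one line. The names capture_scan, outfile and payload are illustrative.

```shell
#!/bin/sh
# Sketch only: ask for a file name, then store one scanned line in that file.
# Assumes the scanner "types" the payload and finishes it with Enter.
capture_scan() {
    printf 'Output file name: ' >&2
    IFS= read -r outfile || return 1
    [ -n "$outfile" ] || return 1          # refuse an empty file name
    printf 'Scan the barcode now: ' >&2
    # IFS= and -r keep leading/trailing blanks and backslashes intact;
    # the second test accepts a final line that lacks a newline.
    IFS= read -r payload || [ -n "$payload" ] || return 1
    printf '%s\n' "$payload" > "$outfile"
}
```

The prompts go to standard error, so they are not mixed into anything that captures the function's output. Called from an interactive shell, the function waits at each prompt, which is the same waiting behaviour seen earlier when the Perl script was started without redirected input.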