Hi gurus!
I have this script...
Code:
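#!/bin/bash
# 1. Loop over every PDF in the current directory
for i in *.pdf
#for file in originals/*.pdf
do
    # 2. Render the PDF as a 300 dpi grayscale TIFF
    gs -r300x300 -sDEVICE=tiffgray -sOutputFile=albara.tif -dBATCH -dNOPAUSE "$i"
    # 3. OCR the TIFF into albara.txt using the Spanish language data
    tesseract albara.tif albara -l spa
    # 4. Take the line after "Data", keep its first 10 characters, then build and run a mv command
    grep -A 1 Data albara.txt | tail -1 | cut -c 1-10 | sed -e "s/.*/mv $i mod\/&.pdf/" > execute; sh execute && rm -f execute
    # 5. Delete the temporary files
    rm albara.txt
    rm albara.tif
done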
1. the script lists all the PDF files in a directory
2. converts each one to TIFF
3. recognizes all the text with tesseract (OCR)
4. renames the original PDF with the first 10 characters of the line below "Data"
5. finally deletes the temporary files
The problem is in step 4, in how the 10 characters are handled. If the two grepped lines look like this:
Code:
Número / Data
1001114166 x 10.11.2009
No problem, the script finishes OK.
But if the two lines look like this:
Code:
Número / Data
10111416 x 10.11.2009
It doesn't work!
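A quick way to see what goes wrong in step 4 is to run "cut -c 1-10" by hand on the line below "Data" from each example. The file name "in.pdf" is only a placeholder for whatever PDF is being processed.

```shell
#!/bin/bash
# The lines that follow "Data" in the two examples above
good='1001114166 x 10.11.2009'
bad='10111416 x 10.11.2009'

# cut -c 1-10 keeps exactly the first 10 characters, whatever they are
good_name=$(printf '%s\n' "$good" | cut -c 1-10)
bad_name=$(printf '%s\n' "$bad" | cut -c 1-10)

# The command the sed step would generate for a placeholder file in.pdf
echo "mv in.pdf mod/$good_name.pdf"   # mv in.pdf mod/1001114166.pdf
echo "mv in.pdf mod/$bad_name.pdf"    # mv in.pdf mod/10111416 x.pdf
```

With the 8-digit number the 10 characters include " x", so when "sh execute" runs the generated line, mv receives three operands (in.pdf, mod/10111416 and x.pdf). It then treats x.pdf as a destination directory and fails instead of renaming the file.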
How can I solve this?
Thanks!
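One possible fix, offered as a sketch rather than a tested solution: in both examples the number is followed by a space, so taking the first whitespace-separated field with awk '{print $1}' gives 1001114166 in the first case and 10111416 in the second, whatever the length. The sketch also calls mv directly with quoted arguments instead of generating the temporary "execute" script. The helper name extract_number, the nullglob setting and the mkdir -p mod step are illustrative additions not in the original script, and it assumes the OCR output always puts the number first on the line after "Data".

```shell
#!/bin/bash
# Sketch only: same pipeline as the original script, but step 4 keeps the
# first whitespace-separated word instead of a fixed 10 characters.
# Assumes the number is always the first word on the line after "Data".

# Hypothetical helper: read OCR text on stdin and print the first word of the
# last line returned by "grep -A 1 Data" (normally the line after "Data").
# Prints nothing if "Data" never appears.
extract_number() {
    grep -A 1 Data | tail -1 | awk '{print $1}'
}

# With nullglob the loop simply does nothing when there are no PDFs
shopt -s nullglob
for i in *.pdf
do
    gs -r300x300 -sDEVICE=tiffgray -sOutputFile=albara.tif -dBATCH -dNOPAUSE "$i"
    tesseract albara.tif albara -l spa
    num=$(extract_number < albara.txt)
    if [ -n "$num" ]; then
        mkdir -p mod
        # Call mv directly; quoting keeps the file names intact
        mv -- "$i" "mod/$num.pdf"
    else
        echo "No number found after \"Data\" in $i" >&2
    fi
    rm -f albara.txt albara.tif
done
```

Run it from the directory that contains the PDFs, as with the original script. If the OCR sometimes glues the number to the following text, the awk step would need a stricter pattern.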