LinuxQuestions.org - need a little bash script help

- Programming (https://www.linuxquestions.org/questions/programming-9/)

- - need a little bash script help (https://www.linuxquestions.org/questions/programming-9/need-a-little-bash-script-help-524413/)

Elguapo

01-31-2007 12:13 PM

need a little bash script help

I am working on a shell script that reads a directory that is passed as an argument into the shell script. After it reads the directory, I need to check to see if the files in the directory are of the file type pdf. If file is a pdf, I need to convert the pdf to a tiff file via image magick. I have all of my logic figured out, but can't figure out the syntax of the entire bash script.

Logic.

List Directory in argument 1

loop directory listing

if file == pdf

convert using image magick

delete parent file

It is pretty simple logic and does exactly what I need it to do. However, as I said before I am stuck on the syntax of the script.

So far this is what I have.

#!/bin/bash
# covert pdf to tiff
#

directory=/home/myfolder

for file in $( find $directory -type F -name '*' | sort)
do
if file

When running the command, I am trying to make it so that I can pass teh directory var into the script. So it would look something like this if I were to run it from a command line.

pdfToTiff.sh -d /home/myfolder

Can someone point me in the right direction?

gctaylor1

01-31-2007 12:51 PM

The concept you're looking for is called positional parameters.

Here's a real simple example:

Code:

#!/bin/bash

echo "this is first -" $1

echo "this is second -" $2

echo "this is 3rd - " $3

echo "this is 4th - " $4

When run it looks like this:

Code:

$ ./position.sh a b c d

this is first - a

this is second - b

this is 3rd -  c

this is 4th -  d

Elguapo

01-31-2007 01:22 PM

I think I have the bash script mostly fleshed out. This is what I have so far.

directory=$1

for file in $( find $directory -type f -name '*' | sort)
do
if echo "$file" | grep -q '.pdf$';
then
/usr/bin/convert $file -quality 100 tif:$file.tif
rm $file
fi
done

The one problem that i can not figure out is this. when I am doing converting the file to a tiff file, I end up with a file that looks like this. testDoc1.pdf.tif when it should just be testDoc1.tif. I haven't been able to find a way to remove the extension from the output file. Does anyone know how I can do this?

b0uncer

01-31-2007 01:32 PM

A quick thought: you use find to find all files, then test each one if it ends with .pdf. Instead you could just use find to list only files that end with pdf (*.pdf) and then go trough them all; this way your list doesn't include those non-.pdf-files and the loop doesn't roll trough that many times (unless every file is a .pdf file). Another option would be to get all files listed like you do, and then use file command to determine if it's a PDF file (that way also those .pdf files get recognized and worked with that -- for some unknown reason -- do not end with .pdf altough they are pdfs). This may sound like an odd thing, but something could have accidentally renamed pdf files so they don't end with .pdf (like .pdf.2). I've seen that some KDE versions where you save a file and the name part of the file is painted for you (so you can just write a new name, and the suffix wouldn't change), a bug caused the name to be written partly after the suffix, and if you do that very quickly, the filename becomes something odd :) The main reason, however, why I recommend using file is that you can do it, and because on UNIX file suffixes don't matter as much as on Windows; you can name your pdf files any way you like. It just makes your script smarter.

You sure can use sed or awk or possibly something even more simpler to process the filename; check

Code:

man sed

or if that doesn't help,

Code:

man awk

Sed might be easier to use, awk is pretty much larger thing.

Franklin52

01-31-2007 02:52 PM

Try this:

Code:

/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif

Regards

Elguapo

01-31-2007 05:30 PM

Using the code below worked like a charm. It no longer adds the .pdf extension to the file name.

Code:

/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif

However I do have one final question before I get off this topic. During my tests with the script, I noticed that files with spaces or irregular chars in the file name, cause the script to error out. Is there a way that I can reformat the name of the file if it contains irregular chars.

Here is the code I am currently using.

Code:

directory=$1



for file in $( find $directory -type f -name '*' | sort)

        do

                if echo "$file" | grep -q '.pdf$'; 

                        then

                                /usr/bin/convert $file -quality 100 tif:${file%.*f}.tif

                                rm $file

                fi

        done

Ynot Irucrem

01-31-2007 09:16 PM

Enclose your filenames in quotes.
$file becomes "$file". not so sure about that funky one, but I would try "tif:${file%.*f}.tif" or tif:"${file%.*f}.tif"

Elguapo

02-01-2007 01:11 PM

I should have been more specific. Lets same someone sends me the pdf This Is a PDf.pdf. This would work find on a windows file system. But it does not work fine in a linux file system. When I try to parse the document with image magick I get the error unable to find file PDf.pdf.

So my question is this, how do I remove all the illegal characters from a file name if they exist. I have to make this a very broad array of illegal chars simply because of the large scope of people passing me documents.

Franklin52

02-01-2007 03:33 PM

You can remove the spaces with:

Code:

/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif|sed 's/ //g'

To replace the spaces with an underscore:

Code:

/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif|sed 's/ /_/g'

Regards

Elguapo

02-01-2007 04:56 PM

Quote:

Originally Posted by Franklin52

You can remove the spaces with:

Code:

/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif|sed 's/ //g'

To replace the spaces with an underscore:

Code:

/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif|sed 's/ /_/g'

Regards

This would work in theory except that when I try to pass a file with the file name like Main Office network v2.pdf into the bash script it is going to error out.

From everything that I have tested, what I need to do is this. I need to take one directory listing, and loop over it. If any of the files have any of below chars in them I need to remove it all together. The first loop I need to do is where I am having the issues.

Code:

'''!"#$%&\'()*+,/:;<=>?@[\\]^_`{|}~ '''

Franklin52

02-02-2007 01:35 AM

This should remove all non printable characters:

Code:

tr -cd 'a-zA-Z0-9.'

Regards

unSpawn

02-02-2007 06:10 AM

Here's a wee rework of your script with explanation trying to use a maximum of Bash and a minimum of external utilities. If you want to see how it works uncomment the "set -x" line.

Code:

#!/bin/sh

#set -x

# Use whitelist instead.

declare -r scrub="1234567890-_./abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Determine my name for reporting.

progn=${0//*\//}; progn=${progn%.sh}

# Reporting subprocess exit status.

subexitval() { case "$?" in 0) f="-en"; r=OK;; *) unset f; r=ERR;; esac; 

 echo $f "$r"; if [ -z "$f" ]; then echo .; exit 1; fi; }

# Minimal check on supplied args.

case "$1" in h|-h|--help) echo "${progn}: help."; exit 1;;

*) if [ ! -d "$1" ]; then echo "${progn}: NODIR, exiting"; exit 1; fi;;

esac; directory="$1"

# "while" loop is "better".

find $directory -type f|while read src; do

 # Determine type of file the only right way.

 type=(`file "${src}" 2>/dev/null|awk -F':' '{print $2}'`)

 if [ -n "${type[*]}" -a "${type[0]}" = "PDF" ]; then

  echo -en "${progn}: src:\""${src//*\//}"\","

  echo -en "dst:"

  # Scrub schars

  dst="${src//[^$scrub]/_}"; dst=${dst//__/_}

  # Test, if not empty we prolly hit non-printables. Exit.

  if [ -n "${dst//[${scrub}_]/}" ]; then echo "FATAL,exit."; exit 127; fi

  echo -en "\"${dst//*\//}\""

  echo -en ",conv:"

  /usr/bin/convert "${src}" -quality 100 tif:"${dst}.tif"; subexitval

  echo -en ",rm:"

  rm "${src}"; subexitval; echo .

 fi

done

exit 0

As always YMMV(VM).

Elguapo

02-02-2007 11:28 AM

unSpawn while testing your script, I ran into one issue. When it scrubs off the illegal chars it scrubs off the - as well. The issue that this brings up is this. The directory I am using as param one looks like this. /mnt/phx-homesmart/paperless/tmp. After it is done processing, it gives me an err report.

If you look at the next to last line you can see that it changed it from /mnt/phx- to /mnt/phx_. I have been searching google to find a fix, but nothing yet. Is there something that I can do to fix this?

Btw, this script converts 100X's faster then just using convert alone.

Code:

sh test.sh /mnt/phx-homesmart/paperless/tmp

test: src:"testBarcode3.pdf",dst:"testBarcode3.pdf",conv:/usr/bin/convert: Unable to open file (/mnt/phx_homesmart/paperless/tmp/testBarcode3.pdf.tif) [No such file or directory].

ERR

unSpawn

02-02-2007 01:07 PM

Well, somebody just has to make the script split /the/path/ and filename so scrubbing only works on the filename.
[edit]
Here's a patch:

Code:

--- orig        1971-01-01 01:00:00.000000000 -0800

+++ patch      1971-01-01 01:00:01.000000000 -0800

@@ -18,13 +18,18 @@

  if [ -n "${type[*]}" -a "${type[0]}" = "PDF" ]; then

  echo -en "${progn}: src:\""${src//*\//}"\","

  echo -en "dst:"

+  # First split path and file component. Could use Bash but I'll settle for

+  # using dirname and basename.

+  dst_dir=`dirname "${src}" 2>/dev/null`

+  dst_fn=`basename "${src}" 2>/dev/null`

+  if [ "${}/${}" != "${src}" ]; then echo "failed."; exit 127; fi

  # Scrub schars

-  dst="${src//[^$scrub]/_}"; dst=${dst//__/_}

+  dst_fn="${dst_fn//[^$scrub]/_}"; dst_fn=${dst_fn//__/_}

  # Test, if not empty we prolly hit non-printables. Exit.

-  if [ -n "${dst//[${scrub}_]/}" ]; then echo "FATAL,exit."; exit 127; fi

-  echo -en "\"${dst//*\//}\""

+  if [ -n "${dst_fn//[${scrub}_]/}" ]; then echo "FATAL,exit."; exit 127; fi

+  echo -en "\"${dst_fn//*\//}\""

  echo -en ",conv:"

-  /usr/bin/convert "${src}" -quality 100 tif:"${dst}.tif"; subexitval

+  /usr/bin/convert "${src}" -quality 100 tif:"${dst_dir}/${dst_fn}.tif"; subexitval

  echo -en ",rm:"

  rm "${src}"; subexitval; echo .

  fi

All times are GMT -5. The time now is 08:06 PM.