LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
01-31-2007, 12:13 PM   #1
Elguapo
Member
 
Registered: Mar 2005
Distribution: FC7
Posts: 42

Rep: Reputation: 15
need a little bash script help


I am working on a shell script that reads a directory passed to it as an argument. After it reads the directory, I need to check whether the files in it are PDFs. If a file is a PDF, I need to convert it to a TIFF file with ImageMagick. I have all of my logic figured out, but I can't figure out the syntax for the bash script.

Logic:

List Directory in argument 1

loop directory listing

if file == pdf

convert using image magick

delete parent file

It is pretty simple logic and does exactly what I need it to do. However, as I said before, I am stuck on the syntax of the script.

So far this is what I have.

#!/bin/bash
# convert pdf to tiff
#

directory=/home/myfolder

for file in $( find $directory -type f -name '*' | sort)
do
    # TODO: if "$file" is a PDF, convert it and delete the original
    :
done


I also want to be able to pass the directory into the script instead of hard-coding it, so running it from the command line would look something like this:

pdfToTiff.sh -d /home/myfolder

Can someone point me in the right direction?
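The five logic steps in the first post translate into bash fairly directly. Below is a minimal sketch, not the script this thread ends up with: it assumes ImageMagick's `convert` is on the PATH, lets `find -iname '*.pdf'` (supported by GNU and BSD find) do the PDF test by name, reads NUL-separated names so spaces survive, and deletes a PDF only if `convert` succeeded. The function name `pdf_dir_to_tiff` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert every *.pdf under a directory to .tif,
# then delete the original PDF if the conversion succeeded.
pdf_dir_to_tiff() {
    local dir=$1 src
    if [ ! -d "$dir" ]; then
        echo "usage: pdf_dir_to_tiff DIRECTORY" >&2
        return 1
    fi
    # Steps 1-2: list the directory and loop; NUL separators keep spaces intact.
    while IFS= read -r -d '' src; do
        # Step 3 is done by find: only names ending in .pdf (any case) arrive here.
        # Step 4: convert; ${src%.*} drops the final extension (foo.pdf -> foo).
        if convert "$src" -quality 100 "tif:${src%.*}.tif"; then
            # Step 5: delete the parent file only after a successful conversion.
            rm -- "$src"
        else
            echo "conversion failed: $src" >&2
        fi
    done < <(find "$dir" -type f -iname '*.pdf' -print0)
}
```

In a script it would be called as `pdf_dir_to_tiff "$1"`.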
 
01-31-2007, 12:51 PM   #2
gctaylor1
Member
 
Registered: Dec 2006
Distribution: Red Hat
Posts: 45

Rep: Reputation: 0
The concept you're looking for is called positional parameters.

Here's a real simple example:
Code:
#!/bin/bash
echo "this is first -" $1
echo "this is second -" $2
echo "this is 3rd - " $3
echo "this is 4th - " $4
When run it looks like this:
Code:
$ ./position.sh a b c d
this is first - a
this is second - b
this is 3rd -  c
this is 4th -  d
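The invocation in the first post (`pdfToTiff.sh -d /home/myfolder`) uses an option flag rather than a bare positional argument. bash's built-in `getopts` handles that; the sketch below is one way to do it, with the function name `parse_args` and the global variable `directory` chosen for illustration.

```shell
#!/usr/bin/env bash
# Parse "-d DIRECTORY" from the argument list into the global $directory.
parse_args() {
    local opt OPTIND=1   # local OPTIND lets the function be called repeatedly
    directory=
    # Leading ':' in the option string: getopts reports errors via opt=':' / opt='?'.
    while getopts ':d:' opt; do
        case $opt in
            d)  directory=$OPTARG ;;
            :)  echo "option -$OPTARG needs an argument" >&2; return 1 ;;
            \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
    if [ -z "$directory" ]; then
        echo "usage: pdfToTiff.sh -d DIRECTORY" >&2
        return 1
    fi
}
```

In the script itself this would be used as `parse_args "$@" || exit 1`, after which `$directory` holds the path.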
 
01-31-2007, 01:22 PM   #3
Elguapo
Member
 
Registered: Mar 2005
Distribution: FC7
Posts: 42

Original Poster
Rep: Reputation: 15
I think I have the bash script mostly fleshed out. This is what I have so far.

Code:
directory=$1

for file in $( find $directory -type f -name '*' | sort)
do
if echo "$file" | grep -q '\.pdf$';
then
/usr/bin/convert $file -quality 100 tif:$file.tif
rm $file
fi
done

The one problem I cannot figure out is this: when I convert the file to a TIFF, I end up with a file named testDoc1.pdf.tif when it should just be testDoc1.tif. I haven't been able to find a way to remove the extension from the output file name. Does anyone know how I can do this?
 
01-31-2007, 01:32 PM   #4
b0uncer
LQ Guru
 
Registered: Aug 2003
Distribution: CentOS, OS X
Posts: 5,131

Rep: Reputation: Disabled
A quick thought: you use find to list all files and then test whether each one ends with .pdf. Instead, you could have find list only the files that end in .pdf (-name '*.pdf') and loop over those; that way the list doesn't include the non-PDF files, and the loop runs fewer times (unless every file is a PDF). Another option is to list all files as you do now and use the file command to determine whether each one is a PDF. That way, PDFs that for some reason do not end in .pdf also get recognized and processed. This may sound odd, but PDF files can get accidentally renamed so they no longer end in .pdf (for example .pdf.2). I've seen a bug in some KDE versions where, when you save a file and the name part is pre-selected for you (so you can just type a new name and the suffix stays), part of the name ended up after the suffix if you typed very quickly, and the file name became something odd. The main reason I recommend file, however, is that on UNIX file suffixes don't matter as much as on Windows; you can name your PDF files any way you like. Using file just makes your script smarter.
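Both suggestions can be sketched in bash. Filtering by name is just `find "$dir" -type f -iname '*.pdf'`. For the content check, `file -b --mime-type FILE` prints `application/pdf` for PDFs on typical systems; the helper below is a swapped-in, dependency-free variant that instead looks for the `%PDF-` signature at the start of the file. That strict check is an assumption: some PDF readers accept files with bytes before the header, and this test would reject them. The names `is_pdf` and `list_pdfs_by_content` are hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if the file starts with the PDF signature "%PDF-", else 1.
is_pdf() {
    local magic
    magic=$(head -c 5 -- "$1" 2>/dev/null) || return 1
    [ "$magic" = "%PDF-" ]
}

# Print every regular file under a directory whose *contents* look like a PDF,
# whatever its extension (e.g. a misnamed "report.pdf.2").
list_pdfs_by_content() {
    local f
    while IFS= read -r -d '' f; do
        if is_pdf "$f"; then printf '%s\n' "$f"; fi
    done < <(find "$1" -type f -print0)
}
```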

You can certainly use sed or awk, or possibly something even simpler, to process the file name; check
Code:
man sed
or if that doesn't help,
Code:
man awk
sed might be easier to use; awk is a much larger tool.
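For the sed route, a sketch that swaps a trailing `.pdf` for `.tif` might look like this (the function name `tif_name` is hypothetical). The backslash makes `.` match a literal period, and `$` anchors the match at the end of the name; names that do not end in `.pdf` come back unchanged.

```shell
#!/usr/bin/env bash
# Print the .tif name for a .pdf name using sed (case-sensitive match).
tif_name() {
    printf '%s\n' "$1" | sed 's/\.pdf$/.tif/'
}
```

For example, `tif_name testDoc1.pdf` prints `testDoc1.tif`.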
 
01-31-2007, 02:52 PM   #5
Franklin52
LQ Newbie
 
Registered: Nov 2006
Posts: 6

Rep: Reputation: 0
Try this:

Code:
/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif
Regards
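`${file%pattern}` is bash parameter expansion: it removes the shortest suffix of `$file` that matches the glob `pattern`, without calling any external program. In `.*f` the `*` is a glob wildcard, not a regex, so the pattern means "a dot, then anything, then f". The sketch below compares it with the more explicit `${file%.pdf}` and the generic `${file%.*}`; the file names are made up.

```shell
#!/usr/bin/env bash
file="/home/myfolder/testDoc1.pdf"
echo "${file%.*f}"    # -> /home/myfolder/testDoc1  (shortest suffix matching ".*f")
echo "${file%.pdf}"   # -> /home/myfolder/testDoc1  (only a literal ".pdf")
echo "${file%.*}"     # -> /home/myfolder/testDoc1  (any final extension)
# Where they differ: a name whose last extension does not end in "f".
odd="draft.pdf.bak"
echo "${odd%.*f}"     # -> draft.pdf.bak  (no suffix ends in "f", nothing removed)
echo "${odd%.*}"      # -> draft.pdf
```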
 
01-31-2007, 05:30 PM   #6
Elguapo
Member
 
Registered: Mar 2005
Distribution: FC7
Posts: 42

Original Poster
Rep: Reputation: 15
The code below worked like a charm; the output name no longer includes the .pdf extension.

Code:
/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif
However, I do have one final question before I get off this topic. During my tests, I noticed that files with spaces or irregular characters in their names cause the script to error out. Is there a way I can reformat a file name if it contains irregular characters?

Here is the code I am currently using.

Code:
directory=$1

for file in $( find $directory -type f -name '*' | sort)
	do
		if echo "$file" | grep -q '\.pdf$';
			then
				/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif
				rm $file
		fi
	done
 
01-31-2007, 09:16 PM   #7
Ynot Irucrem
Member
 
Registered: Apr 2005
Location: Perth, Western Australia
Distribution: Debian
Posts: 233

Rep: Reputation: 30
Enclose your file names in quotes:
$file becomes "$file". I'm not so sure about that funky one, but I would try "tif:${file%.*f}.tif" or tif:"${file%.*f}.tif".

Last edited by Ynot Irucrem; 01-31-2007 at 09:20 PM.
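Quoting fixes the `convert` and `rm` calls, but `for file in $(find ...)` has already split each name on whitespace before the loop body runs. A minimal sketch that counts what each loop style sees (the function names are hypothetical): for a directory containing only `Main Office network v2.pdf`, the `for` version counts four "files" and the `while read` version counts one, assuming the directory path itself contains no spaces.

```shell
#!/usr/bin/env bash
# Word-splitting version: each space-separated word becomes its own "file".
count_for_loop() {
    local n=0 f
    for f in $(find "$1" -type f); do n=$((n + 1)); done
    echo "$n"
}

# NUL-delimited version: find -print0 plus read -d '' keeps each name whole.
count_while_loop() {
    local n=0 f
    while IFS= read -r -d '' f; do n=$((n + 1)); done < <(find "$1" -type f -print0)
    echo "$n"
}
```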
 
02-01-2007, 01:11 PM   #8
Elguapo
Member
 
Registered: Mar 2005
Distribution: FC7
Posts: 42

Original Poster
Rep: Reputation: 15
I should have been more specific. Let's say someone sends me the PDF This Is a PDf.pdf. This would work fine on a Windows file system, but it does not work on a Linux file system. When I try to process the document with ImageMagick, I get the error unable to find file PDf.pdf.

So my question is this: how do I remove all the illegal characters from a file name if there are any? I have to cover a very broad set of illegal characters simply because so many different people send me documents.
 
02-01-2007, 03:33 PM   #9
Franklin52
LQ Newbie
 
Registered: Nov 2006
Posts: 6

Rep: Reputation: 0
You can remove the spaces from the output name with:

Code:
/usr/bin/convert "$file" -quality 100 tif:"$(echo "${file%.*f}" | sed 's/ //g')".tif
To replace the spaces with an underscore:

Code:
/usr/bin/convert "$file" -quality 100 tif:"$(echo "${file%.*f}" | sed 's/ /_/g')".tif
Regards
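The `sed` call has to operate on the file name itself, for example inside `$(...)`. A bash-only alternative is pattern substitution, `${var// /_}`; the sketch below applies it only to the last path component, so the directory part is left alone, and also drops the `.pdf` suffix. The helper name `tif_dest` is hypothetical.

```shell
#!/usr/bin/env bash
# Build a .tif destination for a .pdf source: spaces in the file name
# (not in the directory) become underscores, and a ".pdf" suffix is dropped.
tif_dest() {
    local src=$1 dir=. base=$1
    if [[ $src == */* ]]; then
        dir=${src%/*}      # everything before the last slash
        base=${src##*/}    # everything after the last slash
    fi
    base=${base%.pdf}
    printf '%s/%s.tif\n' "$dir" "${base// /_}"
}
```

Inside the loop it could be used as `/usr/bin/convert "$file" -quality 100 "tif:$(tif_dest "$file")"`.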
 
02-01-2007, 04:56 PM   #10
Elguapo
Member
 
Registered: Mar 2005
Distribution: FC7
Posts: 42

Original Poster
Rep: Reputation: 15
This would work in theory, except that when I pass a file with a name like Main Office network v2.pdf into the bash script, it errors out.

From everything I have tested, this is what I need to do: take one directory listing and loop over it. If any of the file names contain any of the characters below, I need to remove those characters altogether. That first loop is where I am having the issues.

Code:
'''!"#$%&\'()*+,/:;<=>?@[\\]^_`{|}~ '''
 
02-02-2007, 01:35 AM   #11
Franklin52
LQ Newbie
 
Registered: Nov 2006
Posts: 6

Rep: Reputation: 0
This deletes every character that is not a letter, digit, or dot (spaces and punctuation included):

Code:
tr -cd 'a-zA-Z0-9.'
Regards

Last edited by Franklin52; 02-02-2007 at 11:38 AM.
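Applied to a file name, `tr -cd SET` deletes every character that is not in SET. A sketch of a rename pass built on it follows. It assumes you also want to keep `-` and `_`, runs `tr` with `LC_ALL=C` so the ranges mean plain ASCII letters, rewrites only the file name (not the directory), skips names that would become empty, and uses `mv -n` (GNU and BSD) so an existing file is never overwritten. The names `clean_name` and `rename_dirty_files` are hypothetical.

```shell
#!/usr/bin/env bash
# Keep only ASCII letters, digits, '.', '_' and '-' in a file name.
clean_name() {
    printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9._-'
}

# Rename files in one directory (not recursive) whose names contain
# other characters. Names that would become empty are left alone.
rename_dirty_files() {
    local dir=$1 path base new
    for path in "$dir"/*; do
        [ -f "$path" ] || continue
        base=${path##*/}
        new=$(clean_name "$base")
        if [ -n "$new" ] && [ "$new" != "$base" ]; then
            mv -n -- "$path" "$dir/$new"
        fi
    done
}
```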
 
02-02-2007, 06:10 AM   #12
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Here's a wee rework of your script with explanation trying to use a maximum of Bash and a minimum of external utilities. If you want to see how it works uncomment the "set -x" line.

Code:
#!/bin/bash
#set -x
# Use whitelist instead.
declare -r scrub="1234567890-_./abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Determine my name for reporting.
progn=${0//*\//}; progn=${progn%.sh}
# Reporting subprocess exit status.
subexitval() { case "$?" in 0) f="-en"; r=OK;; *) unset f; r=ERR;; esac; 
 echo $f "$r"; if [ -z "$f" ]; then echo .; exit 1; fi; }
# Minimal check on supplied args.
case "$1" in h|-h|--help) echo "${progn}: help."; exit 1;;
*) if [ ! -d "$1" ]; then echo "${progn}: NODIR, exiting"; exit 1; fi;;
esac; directory="$1"
# "while" loop is "better".
while IFS= read -r src; do   # IFS= and -r keep spaces and backslashes in names intact
 # Determine type of file the only right way (-b: no "name:" prefix to split off).
 type=($(file -b "${src}" 2>/dev/null))
 if [ -n "${type[*]}" -a "${type[0]}" = "PDF" ]; then
  echo -en "${progn}: src:\""${src//*\//}"\","
  echo -en "dst:"
  # Scrub schars
  dst="${src//[^$scrub]/_}"; dst=${dst//__/_}
  # Test, if not empty we prolly hit non-printables. Exit.
  if [ -n "${dst//[${scrub}_]/}" ]; then echo "FATAL,exit."; exit 127; fi
  echo -en "\"${dst//*\//}\""
  echo -en ",conv:"
  /usr/bin/convert "${src}" -quality 100 tif:"${dst}.tif"; subexitval
  echo -en ",rm:"
  rm "${src}"; subexitval; echo .
 fi
done < <(find "$directory" -type f)   # no pipe: the loop runs in this shell, so "exit" really exits
exit 0
As always YMMV(VM).
 
02-02-2007, 11:28 AM   #13
Elguapo
Member
 
Registered: Mar 2005
Distribution: FC7
Posts: 42

Original Poster
Rep: Reputation: 15
unSpawn, while testing your script I ran into one issue: when it scrubs the illegal characters, it scrubs the - as well. The directory I am passing as parameter one looks like this: /mnt/phx-homesmart/paperless/tmp. After it is done processing, it reports an error (ERR).

If you look at the next-to-last line, you can see that it changed /mnt/phx- to /mnt/phx_. I have been searching Google for a fix, but have found nothing yet. Is there something I can do to fix this?

Btw, this script converts 100x faster than just using convert alone.

Code:
sh test.sh /mnt/phx-homesmart/paperless/tmp
test: src:"testBarcode3.pdf",dst:"testBarcode3.pdf",conv:/usr/bin/convert: Unable to open file (/mnt/phx_homesmart/paperless/tmp/testBarcode3.pdf.tif) [No such file or directory].
ERR
 
02-02-2007, 01:07 PM   #14
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Well, somebody just has to make the script split /the/path/ and filename so scrubbing only works on the filename.
[edit]
Here's a patch:
Code:
--- orig        1971-01-01 01:00:00.000000000 -0800
+++ patch       1971-01-01 01:00:01.000000000 -0800
@@ -18,13 +18,18 @@
  if [ -n "${type[*]}" -a "${type[0]}" = "PDF" ]; then
   echo -en "${progn}: src:\""${src//*\//}"\","
   echo -en "dst:"
+  # First split path and file component. Could use Bash but I'll settle for
+  # using dirname and basename.
+  dst_dir=`dirname "${src}" 2>/dev/null`
+  dst_fn=`basename "${src}" 2>/dev/null`
+  if [ "${dst_dir}/${dst_fn}" != "${src}" ]; then echo "failed."; exit 127; fi
   # Scrub schars
-  dst="${src//[^$scrub]/_}"; dst=${dst//__/_}
+  dst_fn="${dst_fn//[^$scrub]/_}"; dst_fn=${dst_fn//__/_}
   # Test, if not empty we prolly hit non-printables. Exit.
-  if [ -n "${dst//[${scrub}_]/}" ]; then echo "FATAL,exit."; exit 127; fi
-  echo -en "\"${dst//*\//}\""
+  if [ -n "${dst_fn//[${scrub}_]/}" ]; then echo "FATAL,exit."; exit 127; fi
+  echo -en "\"${dst_fn//*\//}\""
   echo -en ",conv:"
-  /usr/bin/convert "${src}" -quality 100 tif:"${dst}.tif"; subexitval
+  /usr/bin/convert "${src}" -quality 100 tif:"${dst_dir}/${dst_fn}.tif"; subexitval
   echo -en ",rm:"
   rm "${src}"; subexitval; echo .
  fi

Last edited by unSpawn; 02-04-2007 at 07:59 AM. Reason: //add patch.
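As for why the `-` disappeared in the first place: in `${src//[^$scrub]/_}` the contents of `$scrub` become a bracket expression, and the `0-_` in `"...890-_./..."` is read as a range from `0` to `_` rather than as three literal characters. `-` itself falls outside that range, so it gets replaced, while characters such as `:` `;` `<` `>` `?` `@` `[` `]` fall inside it and slip through the whitelist. Putting `-` first in the whitelist (right after `^` in the negated form) makes it literal. A minimal demonstration, assuming the C locale so ranges follow ASCII order; the sample path is made up:

```shell
#!/usr/bin/env bash
LC_ALL=C   # make bracket ranges follow ASCII order

bad='1234567890-_./abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
good='-1234567890_./abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
path='/mnt/phx-homesmart/paperless/tmp/a:b?.pdf'

# Original order: "0-_" is a range, so '-' is scrubbed but ':' and '?' survive.
echo "${path//[^$bad]/_}"    # -> /mnt/phx_homesmart/paperless/tmp/a:b?.pdf
# Leading '-' is literal, so only characters outside the whitelist change.
echo "${path//[^$good]/_}"   # -> /mnt/phx-homesmart/paperless/tmp/a_b_.pdf
```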
 
  


All times are GMT -5.