need a little bash script help
I am working on a shell script that takes a directory as an argument. After it reads the directory, I need to check whether the files in the directory are PDFs. If a file is a PDF, I need to convert it to a TIFF via ImageMagick. I have all of my logic figured out, but can't figure out the syntax of the bash script.
Logic:
List directory in argument 1
Loop over directory listing
If file == pdf, convert using ImageMagick
Delete parent file

It is pretty simple logic and does exactly what I need it to do. However, as I said before, I am stuck on the syntax of the script. So far this is what I have:
Code:
#!/bin/bash
# convert pdf to tiff

directory=/home/myfolder

for file in $( find "$directory" -type f -name '*' | sort )
do
    if file    # stuck here
done

When running the command, I am trying to make it so that I can pass the directory var into the script. So it would look something like this if I were to run it from a command line:
Code:
pdfToTiff.sh -d /home/myfolder

Can someone point me in the right direction? |
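For what it's worth, here is a hedged sketch of how that whole flow could be wired up, assuming ImageMagick's `convert` is on the PATH. The `-d` handling via `getopts` and the `pdf_to_tiff_dir` helper name are my assumptions, not the poster's script:

```shell
#!/bin/bash
# pdfToTiff.sh -d /some/dir -- convert every *.pdf in the directory to .tif

pdf_to_tiff_dir() {
    local directory=$1
    local file
    for file in "$directory"/*.pdf; do
        [ -e "$file" ] || continue    # glob matched nothing; skip
        # strip the .pdf suffix so testDoc1.pdf becomes testDoc1.tif
        convert "$file" -quality 100 "tif:${file%.pdf}.tif" && rm -- "$file"
    done
}

# parse the -d flag
while getopts "d:" opt; do
    case "$opt" in
        d) dir=$OPTARG ;;
        *) echo "usage: $0 -d directory" >&2 ;;
    esac
done

if [ -n "${dir:-}" ]; then
    pdf_to_tiff_dir "$dir"
fi
```

Run as `pdfToTiff.sh -d /home/myfolder`; the glob approach also sidesteps the word-splitting problems of looping over `$(find ...)`.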
The concept you're looking for is called positional parameters.
Here's a real simple example:
Code:
#!/bin/bash
# position.sh -- the positional parameters $1..$4 hold the arguments
echo "first: $1  second: $2  third: $3  fourth: $4"
Code:
$ ./position.sh a b c d
first: a  second: b  third: c  fourth: d |
I think I have the bash script mostly fleshed out. This is what I have so far.
Code:
directory=$1

for file in $( find "$directory" -type f -name '*' | sort )
do
    if echo "$file" | grep -q '\.pdf$'; then
        /usr/bin/convert $file -quality 100 tif:$file.tif
        rm $file
    fi
done

The one problem that I can not figure out is this: when I convert the file to a tiff file, I end up with a file that looks like this: testDoc1.pdf.tif, when it should just be testDoc1.tif. I haven't been able to find a way to remove the extension from the output file. Does anyone know how I can do this? |
A quick thought: you use find to list all files, then test each one to see if it ends with .pdf. Instead you could just use find to list only the files that end with .pdf (*.pdf) and then go through them all; this way your list doesn't include the non-.pdf files and the loop doesn't run as many times (unless every file is a .pdf file).

Another option would be to get all files listed like you do, and then use the file command to determine whether each one is a PDF. That way you also catch PDF files that, for some unknown reason, do not end with .pdf although they are PDFs. This may sound like an odd thing, but something could have accidentally renamed PDF files so they don't end with .pdf (like .pdf.2). I've seen some KDE versions where, when you save a file, the name part of the filename is pre-selected for you (so you can just type a new name and the suffix wouldn't change); a bug caused the name to be written partly after the suffix, and if you typed very quickly, the filename became something odd :)

The main reason I recommend using file, however, is that you can, and because on UNIX file suffixes don't matter as much as on Windows; you can name your PDF files any way you like. It just makes your script smarter.
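A hedged sketch of that file(1)-based check, assuming a `file` that supports `--brief` and `--mime-type` (GNU file does); the `is_pdf` helper name is my own:

```shell
#!/bin/bash
# Recognize PDFs by content rather than by filename suffix.
is_pdf() {
    [ "$(file --brief --mime-type "$1")" = "application/pdf" ]
}

# Walk the directory and report real PDFs, whatever they are named.
directory=${1:-.}
find "$directory" -type f | sort | while read -r f; do
    if is_pdf "$f"; then
        printf 'PDF: %s\n' "$f"
    fi
done
```

This catches a renamed `report.pdf.2` just as well as `report.pdf`, since file(1) looks at the `%PDF-` magic bytes, not the name.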
You sure can use sed or awk, or possibly something even simpler, to process the filename; check
man sed Code:
man awk |
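For completeness, a quick sketch comparing the sed route with pure Bash parameter expansion, which needs no external process at all:

```shell
#!/bin/bash
f="testDoc1.pdf"

# Pure Bash: strip the trailing .pdf with parameter expansion
echo "${f%.pdf}.tif"                # prints: testDoc1.tif

# Same thing with sed, if you prefer an external tool
echo "$f" | sed 's/\.pdf$/.tif/'    # prints: testDoc1.tif
```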
Try this:
Code:
/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif |
Using the code below worked like a charm. It no longer adds the .pdf extension to the file name.
Code:
/usr/bin/convert $file -quality 100 tif:${file%.*f}.tif

Here is the code I am currently using.
Code:
directory=$1

for file in $( find "$directory" -type f -name '*' | sort )
do
    if echo "$file" | grep -q '\.pdf$'; then
        /usr/bin/convert $file -quality 100 tif:${file%.*f}.tif
        rm $file
    fi
done |
Enclose your filenames in quotes.
$file becomes "$file". Not so sure about that funky one, but I would try "tif:${file%.*f}.tif" or tif:"${file%.*f}.tif" |
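One caveat, sketched as an assumption about how the loop could be restructured (not the original script): quoting "$file" inside the loop won't fully help, because the word splitting already happens when $( find ... ) expands. A while-read loop over find's NUL-separated output keeps names with spaces intact:

```shell
#!/bin/bash
# Names with spaces survive because find emits NUL-separated entries
# and read consumes each one whole, with no word splitting.
directory=${1:-.}
find "$directory" -type f -name '*.pdf' -print0 |
while IFS= read -r -d '' file; do
    printf 'found: %s\n' "$file"    # "$file" is now safe to pass to convert
done
```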
I should have been more specific. Let's say someone sends me the PDF This Is a PDf.pdf. This would work fine on a Windows file system, but it does not work fine on a Linux file system. When I try to parse the document with ImageMagick I get the error unable to find file PDf.pdf.
So my question is this, how do I remove all the illegal characters from a file name if they exist. I have to make this a very broad array of illegal chars simply because of the large scope of people passing me documents. |
You can remove the spaces by filtering the output name through sed (note the command substitution; piping convert itself through sed would not change the filename):
Code:
/usr/bin/convert "$file" -quality 100 "tif:$(printf '%s' "${file%.*f}" | sed 's/ //g').tif"
or replace them with underscores:
Code:
/usr/bin/convert "$file" -quality 100 "tif:$(printf '%s' "${file%.*f}" | sed 's/ /_/g').tif" |
From everything that I have tested, what I need to do is this: I need to take one directory listing and loop over it. If any of the files have any of the below chars in them, I need to remove them altogether. The first loop is where I am having the issues. Code:
'''!"#$%&\'()*+,/:;<=>?@[\\]^_`{|}~ ''' |
This strips everything except letters, digits, and dots:
Code:
tr -cd 'a-zA-Z0-9.' |
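Worth noting as a caveat (mine, not part of the suggestion above): tr -cd deletes every character not in the keep-set, so running it over a full path eats the slashes and hyphens too:

```shell
#!/bin/bash
# Scrubbing a whole path with a tight keep-set also removes / and -
printf '%s' '/mnt/phx-home/My File!.pdf' | tr -cd 'a-zA-Z0-9.'
# prints: mntphxhomeMyFile.pdf
```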
Here's a wee rework of your script with explanation trying to use a maximum of Bash and a minimum of external utilities. If you want to see how it works uncomment the "set -x" line.
Code:
#!/bin/sh |
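The body of that script did not survive the thread, so here is my own hedged reconstruction of what a maximum-Bash version might look like; the keep-set pattern, the nullglob handling, and the in-place rename are assumptions, not unSpawn's actual code:

```shell
#!/bin/bash
#set -x                               # uncomment to trace execution
directory=${1:-.}                     # directory to process (default: .)
pat='[^a-zA-Z0-9./_-]'                # characters to scrub from names

shopt -s nullglob                     # an empty glob expands to nothing
for file in "$directory"/*.pdf; do
    clean=${file//$pat/}              # pure-Bash character scrub
    [ "$clean" = "$file" ] || mv -- "$file" "$clean"
    convert "$clean" -quality 100 "tif:${clean%.pdf}.tif" && rm -- "$clean"
done
```

The keep-set includes `/` and `-` so that directory components like `/mnt/phx-homesmart` survive the scrub.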
unSpawn, while testing your script I ran into one issue. When it scrubs off the illegal chars, it scrubs off the - as well. The issue that this brings up is this: the directory I am using as param one looks like this: /mnt/phx-homesmart/paperless/tmp. After it is done processing, it gives me an error report.
If you look at the next-to-last line you can see that it changed it from /mnt/phx- to /mnt/phx_. I have been searching Google to find a fix, but nothing yet. Is there something that I can do to fix this? Btw, this script converts 100x faster than just using convert alone. Code:
sh test.sh /mnt/phx-homesmart/paperless/tmp |
Well, somebody just has to make the script split /the/path/ from the filename so the scrubbing only works on the filename.
[edit] Here's a patch: Code:
--- orig 1971-01-01 01:00:00.000000000 -0800 |
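The patch itself did not survive the thread either, so here is a sketch of the idea only (the `scrub_name` helper and its keep-set are my assumptions): split the path with dirname/basename and scrub just the name part:

```shell
#!/bin/bash
# Scrub illegal characters from the filename only, leaving /the/path/ intact.
scrub_name() {
    local dir base
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    base=$(printf '%s' "$base" | tr -cd 'a-zA-Z0-9._-')
    printf '%s/%s\n' "$dir" "$base"
}
```

So /mnt/phx-homesmart/paperless/tmp/This Is a PDf.pdf keeps its hyphenated path, and only the basename loses its spaces.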