Bash Programming

Skyrius · 04-13-2012, 09:24 PM

So, first post then. I've never actually used Linux for programming anything, and I'm much more used to C, but we had a homework assignment that wanted us to get used to bash scripting. Our script will be passed a single input, the directory path that we are to check. We need to compare all the regular files in that directory and, if any of them are duplicates, replace them with hard links, keeping the file that is lexicographically first (A before B, B before C, etc). Other than the fact that we have to use cmp to compare two files and ln to create the link, we're pretty much given free rein. This is what I've sort of got right now:

Code:

!/bin/bash

## Creating a sorted array containing all the regular files in the directory we were passed
N=0
for i in $(find $1 -maxdepth 1 -type f | sort -u); do
    ARRAY[$N]="$i"
let "N= $N + 1"
done

##compare each item in the array with the rest, replacing duplicates with hard links
total=${#ARRAY[@]}
for((i=0; i<=$(( $total-1)); i++))
do
    FIRST=${ARRAY[$i]}
    for((j=$i+1; j<=$(($total-2)); j++))
    do
        SECOND=${ARRAY[$j]}
        COMP=(cmp -s $FIRST $SECOND)
        if [ $COMP -eq 0 ]; then
        ln -f $FIRST $SECOND
        elif [ $COMP -eq >1 ]; then
        echo "Error reading file" 
        fi
    done
done

I think I got the syntax wrong on a few things, but my main question is that, for the if statement there, am I allowed to pass the result of a command call? I know you can do things like compare variables, and the cmp command DOES return an exit status of 0 or 1, but am I actually allowed to do that?

Basically, I have a sorted array, so I don't have to worry about the lexicographical ordering.

Also, there's a warning that our script should be prepared to handle cases where the file names contain special characters such as spaces, "*", or a leading "-" and such. However, just on the command line itself, I haven't actually run into an issue. Is there something about those that would possibly cause trouble if we run the script?

EDIT: It has occurred to me that this might have been better off in the Programming section, but I really am more or less a newbie at all of this. It confuses the heck out of me in most cases XD;;

towheedm · 04-13-2012, 11:08 PM

To begin with, and as has been stated in several posts, please use code tags for your code. It preserves formatting and makes it much easier to read. That said, your very first mistake is:

Incorrect:

Code:

!/bin/bash

Correct:

Code:

#!/bin/bash

Quote:

am I allowed to pass the result of a command call?

You can pass the result of any command to a var using command substitution $(command):

Code:

x="$(echo "Hello") world"
echo $x
Hello world

The BASH variable "$?" always holds the exit status of the last command:

Code:

file1=/some/file
file2=/some/other/file
cmp "$file1" "$file2"
ret_val="$?"    # ret_val now holds the exit status of the cmp command
if [ "$ret_val" = "0" ]; then
  echo "Files are the same"
elif [ "$ret_val" = "1" ]; then
  echo "Files are different"
else
  echo "Error reading files"
fi
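
To answer the "am I allowed" part directly: if itself branches on the exit status of whatever command follows it, so the test can also be written without [ ] around the first check. A minimal sketch, assuming cmp's usual statuses (0 same, 1 different, anything else trouble); compare_files is just a name made up for the illustration:

```shell
#!/bin/bash
# compare_files is a hypothetical helper name used only for this illustration.
compare_files() {
    # 'if' runs cmp and branches on its exit status directly.
    if cmp -s -- "$1" "$2"; then
        echo "Files are the same"
    elif [ $? -eq 1 ]; then
        # $? still holds the exit status of the cmp above.
        echo "Files are different"
    else
        echo "Error reading files"
    fi
}
```

It would be called as compare_files "$file1" "$file2".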

Quote:

Also, there's a warning that our script should be prepared to handle cases where the file names contains special characters such as spaces, "*", or leading "-" and such

All BASH meta-characters appearing in file names must be escaped by preceding the meta-character with the escape character "\":

If the files name is 'this file has spaces in the name':

Code:

cat this\ file\ has\ spaces\ in\ the\ name

There are many good tutorials on BASH scripting. Here's one to start you off:
http://www.gnu.org/software/bash/manual/bashref.html

Skyrius · 04-13-2012, 11:29 PM

Ah, sorry about the tags. I've put those in now (that is a lot neater XD); And thank you for the help. I think I've got it working. And since you've explained, I have a question about the escape clause too then. If I wanted to apply the cmp command to a file that had spaces or special characters in its name, and I've stored that file name in an array, I'm assuming the escape characters aren't already written in for me then? So when I grab that item from the array, I have to check for special characters and insert a \ before each character manually, correct?

towheedm · 04-13-2012, 11:36 PM

Quote:

I have to check for special characters and insert a \ before each character manually, correct

Yes, as far as I know that must be done.

Skyrius · 04-13-2012, 11:49 PM

Alright thanks, last question. This one's really silly and I FEEL silly for not being able to get it to do what I want.

For the elif statement, I want the code to simply do nothing. I know in C, you can just put a ;. I tried leaving it blank, but that doesn't seem to work. Then I tried echo "" to see if I could just get it to echo nothing, but unfortunately that creates a newline after every call to echo. I need to keep the case in, since if I just do else by itself, it will write out even in the cases I don't want (since I have three cases, 0, 1, or >1).

yoK0 · 04-14-2012, 12:00 AM

Maybe wait 1 will do the work

towheedm · 04-14-2012, 12:07 AM

Well since I'm here:
The BASH built-in command:

Code:

:

does nothing except for redirections and the like. It always returns true.

Code:

x="Hello"
echo $x
Hello
x="World"
echo $x
World
:
echo $x
World
:  # By itself, does nothing and returns an exit status of true
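
Applied to a three-way branch like the one in the question, the no-op fills the branch that should stay silent. A small sketch, assuming the cmp exit status is passed in as the first argument; handle_status is a made-up name:

```shell
#!/bin/bash
# handle_status is a hypothetical helper taking a cmp exit status as $1.
handle_status() {
    if [ "$1" -eq 0 ]; then
        echo "duplicate"
    elif [ "$1" -eq 1 ]; then
        :   # files differ: deliberately do nothing
    else
        echo "Error reading file"
    fi
}
```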

Skyrius · 04-14-2012, 12:08 AM

Darn, no. It gives me an error saying wait: pid 1 is not a child of this shell. I have to use the bash shell for the assignment, but I don't think that's it. I'm not quite sure about this error.

EDIT: Ah, I figured it out. It's "then :". That's odd syntax. I'm just not used to script language I guess XD

Ninja'd. But thank you so much for your help. I've finally got it more or less working. The special characters in the file name I can get rid of with sed, but the problem is if I call sed on the file I stored in the array, it edits the stuff inside the file and not the file name. I guess I have to store the name as a string variable and call sed on that variable first. But I've already isolated all the cases where I'll need to edit things with an escape clause, so I just need to figure out how to apply the regex to the file name and not the file contents.

towheedm · 04-14-2012, 12:25 AM

Quote:

But I've already isolated all the cases where I'll need to edit things with an escape clause, so I just need to figure out how to apply the regex to the file name and not the file contents.

Use a pipe (|). BASH allows you to use the output of one command as the input of the next command.

Code:

echo "Hello World" | sed -n 's/World/Earthlings/p'
Hello Earthlings

cbtshare · 04-14-2012, 02:26 AM

Your let also looks wrong; I've never seen it work that way.

Try:
let "N += 1"

towheedm · 04-14-2012, 11:34 AM

Code:

let "N= $N + 1"

This is actually one form of arithmetic operation accepted by BASH, although I would remove the spaces.

Code:

x=1
let "x=$x+1"
echo $x
2

I'm not sure how portable this is though.

catkin · 04-14-2012, 01:03 PM

A few observations ...

AFAIK the output from find is already sorted so no need to pipe its output into sort. Here's a test.

Code:

c@CW8:~$ find /usr/bin -maxdepth 1 -type f > /tmp/trash
c@CW8:~$ find /usr/bin -maxdepth 1 -type f | sort > /tmp/trash2
c@CW8:~$ diff /tmp/trash /tmp/trash2
[no output]

The for i in $(find $1 -maxdepth 1 -type f | sort -u); do breaks on file names including spaces (and tabs and newlines):

Code:

c@CW8:/tmp/tmp$ touch a 'a b'
c@CW8:/tmp/tmp$ for f in $( find -type f ); do echo ">$f<"; done
>./a<
>./a<
>b<

The robust solution is to use something like

Code:

while IFS='' read -r -d '' file
do
   files+=("$file")
done < <(find "$dir" -type f -print0)

Notes:

The -print0 makes find output file names separated by ASCII NUL characters rather than newlines.
The -d '' sets read's record separator to the empty string but bash is written in C which terminates strings with ASCII NUL so that's what read actually uses as the record separator (I think; anyway, it works).
The -r puts read in raw mode so it does not interpret any "backslash escapes". For example any \t in the file name is kept as that, not translated to a tab.
The IFS='' sets bash's field separator to the empty string. By default it is space tab newline so would strip any of those characters from the front and back of the file name. EDIT: more properly the "path name".
files+=("$file") adds the file name to the array files. No need to use and increment an array index. The number of elements is available as ${#array[*]}, leading to the standard idiom for iterating over an array:
Code:
```
for (( i=0; i<${#array[*]}; i++ ))
do
   # Do something with ${array[i]}
done
```

cmp is "expensive". If you have a lot of files, better to test progressively for identicality starting with the cheapest test first. stat --printf %s $filename would be a good choice for the first test. md5sum $filename might be a better next/last test than cmp $filename1 $filename2. Both stat and md5sum would have to be run on both files so take a bit more coding than cmp so you may prefer to stay with cmp if you don't have many files. If you do decide on the others the identicality test could be something like if [[ $(stat --printf %s "$filename1") -eq $(stat --printf %s "$filename2") ]]; then
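
As a rough sketch of the "cheapest test first" idea, assuming GNU stat (for --printf) and skipping the md5sum step; same_content is a made-up name, and it returns 0 for identical, 1 for different and 2 if a size could not be read:

```shell
#!/bin/bash
# same_content is a hypothetical helper name; stat --printf is GNU coreutils.
same_content() {
    local size1 size2
    size1=$(stat --printf %s -- "$1") || return 2
    size2=$(stat --printf %s -- "$2") || return 2
    # Files of different sizes cannot be identical, so skip reading them.
    [[ $size1 -eq $size2 ]] || return 1
    # Same size: fall back to the byte-by-byte comparison.
    cmp -s -- "$1" "$2"
}
```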

In ARRAY[$N]="$i", the double quotes and the $ in front of N have no effect. Bash does not do "word splitting" on the expression to the right of an assignment =. Reference here. The expression inside an array index [ ] is an arithmetic expression. Bash substitutes the value of any variables named within an arithmetic expression without the need for the $ ("the value of") operator.

Similarly the string to the right of a let statement is an arithmetic expression. let is the original statement used to evaluate arithmetic expressions and is intuitively obvious. The later equivalent is (( <arithmetic expression> )) which is less obvious but has the advantage over let that it can be used as a test:

Code:

if (( <arithmetic expression> )); then

and can be substituted by its value using the $ operator: $(( <arithmetic expression> )).

Not only is the $ operator unnecessary in arithmetic expressions, whitespace either side of operators is optional so i=a+b can also be i = a + b according to taste.

EDIT: that is not true for arithmetic expressions used with a let. They must either have no space around the operators (let i=a+b) or be in a string (let 'i = a + b').
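
A short side-by-side of the forms discussed above, just as an illustration (the variable names n and total are arbitrary):

```shell
#!/bin/bash
n=0
let "n = n + 1"      # quoted string, so spaces around operators are fine
let n+=1             # unquoted, so no spaces allowed
(( n++ ))            # arithmetic command; no $ needed in front of n
total=$(( n * 2 ))   # arithmetic expansion substitutes the value
if (( total > 5 )); then
    echo "total is $total"   # prints: total is 6
fi
```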

This code ...

Code:

cmp "$file1" "$file2"
ret_val="$?"    # ret_val now holds the exit status of the cmp command
if [ "$ret_val" = "0" ]; then
  echo "Files are the same"
elif [ "$ret_val" = "1" ]; then
  echo "Files are different"
else
  echo "Error reading files"
fi

... can be simplified to avoid the need for ret_val by

Code:

cmp "$file1" "$file2"
case $? in
    0 )
        echo "Files are the same"
        ;;
    1 ) 
        echo "Files are different"
        ;;
    * )
        echo "Error reading files"
esac

Quote:

If I wanted to apply the cmp command to a file that had spaces or special characters in them, if I've stored that file name into an array, I'm assuming the escape characters aren't already written in for me then? So when I grab that item from the array, I have to check for special characters and insert a \ before each character manually, correct?

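No. As long as you use double quotes around the variables, their values will be passed verbatim to cmp: cmp -s "$FIRST" "$SECOND". Note that COMP=(cmp -s ...) as written in the original script creates an array rather than running cmp; to keep the exit status, run cmp and then assign COMP=$? on the next line.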
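
A small demonstration of that, using deliberately awkward names in a scratch directory (the directory and file names are made up for the example). The one thing quotes do not cover is a leading "-", which cmp would otherwise read as an option; the "--" marker, or a ./ prefix on the name, takes care of that:

```shell
#!/bin/bash
# Hypothetical scratch directory and file names, awkward on purpose.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf 'same\n' > 'a b'
printf 'same\n' > '-n *'

FIRST='a b'
SECOND='-n *'
# Quotes stop word splitting and globbing; "--" ends option parsing,
# so the leading "-" is not mistaken for an option.
if cmp -s -- "$FIRST" "$SECOND"; then
    result="identical"
else
    result="different"
fi
echo "$result"
```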

Quote:

For the elif statement, I want the code to simply do nothing. I know in C, you can just put a ;. I tried leaving it blank, but that doesn't seem to work. Then I tried echo "" to see if I could just get it to echo nothing, but unfortunately that creates a newline after every call to echo. I need to keep the case in, since if I just do else by itself, it will write out even in the cases I don't want (since I have three cases, 0, 1, or >1)

: can be used but a case statement is a more elegant solution:

Code:

case $COMP in
    0 )
        ln -f -- "$FIRST" "$SECOND"
        ;;
    1 )
        ;;
    * )
        echo "Error reading file"
esac

rknichols · 04-15-2012, 11:41 AM

Quote:

Originally Posted by catkin

AFAIK the output from find is already sorted so no need to pipe its output into sort.

No, find just returns names in the order in which it encounters them in the directory. Unless the file names were placed in the directory in their natural collating sequence, or you happen to be using some file system that keeps its directories sorted, the output will not be sorted. Here is a result from an ext3 file system:

Code:

$ find /usr/bin -maxdepth 1 -type f | head
/usr/bin/grmiregistry
/usr/bin/[
/usr/bin/ciptool
/usr/bin/gstack
/usr/bin/unprotoize
/usr/bin/pamtopam
/usr/bin/ldns-chaos
/usr/bin/upssched-cmd
/usr/bin/aconnect
/usr/bin/cupstestppd
$ find /usr/bin -maxdepth 1 -type f | sort | head
/usr/bin/.fipscheck.hmac
/usr/bin/.ssh.hmac
/usr/bin/411toppm
/usr/bin/FBReader
/usr/bin/GET
/usr/bin/HEAD
/usr/bin/POST
/usr/bin/RSA_SecurID_getpasswd
/usr/bin/Xdialog
/usr/bin/Xdialog-gtk1
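
So the sort has to stay, and it can be kept safe for odd file names by using NUL delimiters all the way through. A rough sketch of the whole job along those lines, assuming GNU find and sort (for -print0 and -z); link_duplicates is a made-up name, and using LC_ALL=C for "lexicographically first" is my assumption about what the assignment means:

```shell
#!/bin/bash
# link_duplicates is a hypothetical name; GNU find/sort assumed for -print0 / -z.
link_duplicates() {
    local dir=$1 file i j
    local -a files=()

    # Sorted, NUL-delimited list: spaces, "*" and newlines in names survive.
    while IFS= read -r -d '' file; do
        files+=("$file")
    done < <(find "$dir" -maxdepth 1 -type f -print0 | LC_ALL=C sort -z)

    for (( i = 0; i < ${#files[@]}; i++ )); do
        for (( j = i + 1; j < ${#files[@]}; j++ )); do
            # Already hard links to the same file: nothing to do.
            [[ "${files[i]}" -ef "${files[j]}" ]] && continue
            cmp -s -- "${files[i]}" "${files[j]}"
            case $? in
                0) ln -f -- "${files[i]}" "${files[j]}" ;;  # keep the earlier name
                1) ;;                                       # different: do nothing
                *) echo "Error reading file" >&2 ;;
            esac
        done
    done
}
```

Inside the script it would be called as link_duplicates "$1".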

chrism01 · 04-15-2012, 07:08 PM

Just a minor point, but numeric comparisons use operators '-eq', '-gt', '-ge' etc
Symbolic operators eg '=', '>' are for string comparisons
http://tldp.org/LDP/abs/html/comparison-ops.html

In general, you'll find these links useful
http://rute.2038bug.com/index.html.gz
http://tldp.org/LDP/Bash-Beginners-G...tml/index.html
http://www.tldp.org/LDP/abs/html/

I'd also recommend [[ ]] over [ ] http://tldp.org/LDP/abs/html/testcon...ml#DBLBRACKETS

jschiwal · 04-15-2012, 07:16 PM

Comparing every file against the others in a loop sounds inefficient. Consider creating a list of md5sums; sorting the list; and using uniq to locate duplicates.
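
Something like the following is what I have in mind, as a sketch only. It assumes GNU uniq (for -w and --all-repeated) and file names without newlines or backslashes, since md5sum changes its output format for those; find_duplicate_groups is a made-up name:

```shell
#!/bin/bash
# find_duplicate_groups is a hypothetical name; -w / --all-repeated are GNU uniq options.
find_duplicate_groups() {
    local dir=$1
    # An MD5 digest is 32 hex characters, so uniq compares only that prefix.
    find "$dir" -maxdepth 1 -type f -exec md5sum -- {} + |
        LC_ALL=C sort |
        uniq -w32 --all-repeated=separate
}
```

Each group of lines in the output shares a digest, with the lexicographically first name at the top. Since the assignment requires cmp, each group would still want a cmp check before linking.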