LinuxQuestions.org


aswani 08-16-2012 08:17 AM

Bash Shell Script to check if a string has only alphabets and digits.
 
Hello,

I am trying to write a simple shell script. I want to pass a string to the script and have it check whether the string contains only letters and digits.

Code:

#!/bin/bash

# needs to be debugged

Rand_String=$1;

if  test $Rand_String = "[0-9A-Za-z]*"

        then echo "string $Rand_String has alphabets which are only alpha numeric";

        else echo "string $Rand_String has characters which are not alphanumeric";

fi

This script always reports that the string has characters which are not alphanumeric. Please help me find the error.

best wishes,
Aswani

millgates 08-16-2012 08:42 AM

You are comparing $Rand_String to the literal string "[0-9A-Za-z]*". If you want that to be interpreted as a pattern, you need to remove the double quotes. Also, bash globs are not regexps (at least not by default), so your pattern does not mean "an arbitrary number of alphanumeric characters" but "an alphanumeric character followed by an arbitrary number of any characters." One way to do what you want might be something like this:

Code:

[[ "${Rand_String//[0-9A-Za-z]/}" = "" ]] && echo "Only contains alphanumeric" || echo "Contains other characters"
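To make the difference between the three readings of the pattern concrete, here is a minimal sketch. The function names are made up for illustration and do not come from the posts above. It compares a quoted (literal) comparison, a plain glob, and an extended glob, which is the "not by default" case where a glob can express "one or more of".

```bash
#!/usr/bin/env bash
# Hypothetical helper names, used only for this illustration.
shopt -s extglob   # enable +(...) style patterns

# Quoted right-hand side: compared as a literal string, no pattern matching.
literal_match() { [[ $1 = "[0-9A-Za-z]*" ]]; }

# Plain glob: one alphanumeric character, then anything at all.
glob_match() { [[ $1 == [0-9A-Za-z]* ]]; }

# Extended glob: +(...) means "one or more of", so the whole string must be alphanumeric.
extglob_match() { [[ $1 == +([0-9A-Za-z]) ]]; }

for s in 'abc123' 'a!!' '!abc'; do
    for f in literal_match glob_match extglob_match; do
        if "$f" "$s"; then result=yes; else result=no; fi
        printf '%-14s %-8s %s\n' "$f" "$s" "$result"
    done
done
```

The plain glob accepts 'a!!' because only the first character is checked against the bracket expression; the extended glob rejects it.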

cgmertens 08-16-2012 08:44 AM

#!/bin/bash

# This is one solution:
# 1) don't use test, it has very limited capabilities
# 2) added quoting to allow spaces in input string
# 3) use regular expression operator =~
# 4) use ^ "not in set" regexp operator
# 5) use + for one or more matches rather than * for zero or more

Rand_String="$1"

if [[ "$Rand_String" =~ [^0-9A-Za-z]+ ]] ; then
    echo "string $Rand_String has characters which are not alphanumeric"
else
    echo "string $Rand_String has alphabets which are only alpha numeric"
fi
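Point 5 in the comments above is easy to check by hand. The following is a small sketch, with hypothetical function names, of what happens if * is used instead of +: a regex that may match zero characters succeeds on every input, so every string would be flagged.

```bash
#!/usr/bin/env bash
# Hypothetical function names, for illustration only.

# With *, the regex can match zero characters, so it succeeds for every input.
has_non_alnum_star() { [[ $1 =~ [^0-9A-Za-z]* ]]; }

# With +, at least one non-alphanumeric character must be present.
has_non_alnum_plus() { [[ $1 =~ [^0-9A-Za-z]+ ]]; }

for s in 'abc123' 'abc 123'; do
    if has_non_alnum_star "$s"; then echo "star: '$s' flagged"; else echo "star: '$s' passed"; fi
    if has_non_alnum_plus "$s"; then echo "plus: '$s' flagged"; else echo "plus: '$s' passed"; fi
done
```

Because the search is unanchored, a bare [^0-9A-Za-z] with no repetition operator at all would behave the same as the + version.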

David the H. 08-16-2012 09:06 AM

How about this instead?

Code:

case ${string//[[:alnum:]]} in

        "") echo "all clean" ;;
        *) echo "not clean" ;;

 esac

It strips out all alphanumeric characters, so if anything remains, the string doesn't pass the test. Glob matching in a case statement is also generally cheaper than an if test, especially one that uses a regex.

Edit: a variation on the theme, perhaps easier to read:

Code:

case $string in

        *[^[:alnum:]]*) echo "not clean" ;;
                    *) echo "all clean" ;;

 esac
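If the check is needed in more than one place, the second variation can be wrapped in a function. This is only a sketch: is_alnum is a made-up name, and rejecting the empty string is an assumption of this example (the snippets above report an empty string as "all clean"). Note that [[:alnum:]] follows the current locale, so what counts as a letter can vary between systems.

```bash
#!/usr/bin/env bash
# is_alnum is a hypothetical name. Returns 0 if $1 is non-empty and consists
# only of alphanumeric characters, 1 otherwise.
is_alnum() {
    case $1 in
        '')              return 1 ;;   # assumption: an empty string fails the check
        *[^[:alnum:]]*)  return 1 ;;   # at least one non-alphanumeric character
        *)               return 0 ;;
    esac
}

is_alnum 'abc123'  && echo "all clean"
is_alnum 'abc-123' || echo "not clean"
```

Returning a status instead of echoing lets the caller decide what to print, e.g. `if is_alnum "$1"; then ...; fi`.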


sundialsvcs 08-16-2012 09:18 AM

Strictly speaking, the pattern also needs to include the beginning-of-string and end-of-string anchors. The pattern must be: "beginning-of-string, followed by zero or more occurrences of, say, [[:alnum:]], followed by end-of-string." You might or might not need to use egrep to specify this "extended" regular expression.

One of the best things you can do is to spend a lot of time patiently studying the arcane art of regular expressions ... which are actually "easy to understand but damned hard to read." They are one of the great power tools of the programming craft. Although the "string of gibberish" presentation of a regex is fairly confrontational, and the facility is full of (frankly, too many) arcane options, the concept is a simple one, and it has an enormous number of very practical applications. The knowledge will serve you very, very well. Oh yes, in the Windows world too.
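For reference, the anchored "whole string must match" form described in this post could look like the sketch below. The function names are hypothetical, and + is used instead of "zero or more" so that an empty string is rejected; that choice is an assumption of this example. grep -E is the standard equivalent of egrep.

```bash
#!/usr/bin/env bash
# Hypothetical function names. Both return 0 only when the whole string is alphanumeric.

# Bash regex: ^ and $ anchor the match to the entire string.
# Using * instead of + would also accept the empty string.
only_alnum_bash() { [[ $1 =~ ^[[:alnum:]]+$ ]]; }

# grep works line by line, so this is only suitable for single-line input.
only_alnum_grep() { printf '%s\n' "$1" | grep -Eq '^[[:alnum:]]+$'; }

only_alnum_bash 'abc123' && echo "bash: clean"
only_alnum_grep 'abc 123' || echo "grep: not clean"
```

This is the positive formulation ("everything is alphanumeric"); searching for a single non-alphanumeric character is the negative formulation of the same test.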

aswani 08-16-2012 09:33 AM

Quote:

Originally Posted by millgates (Post 4755605)

Thanks for your reply. The code that you suggested works well.

However, please note that when I remove the double quotes in my original script as you suggested, the script fails with this error:

Code:

[aswani@maruthi]$ test_string.sh abcd
test_string.sh: line 7: test: too many arguments
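A likely explanation for this error: once the quotes are removed, the shell performs filename expansion on [0-9A-Za-z]* before test ever runs, so test receives one argument per matching file in the current directory. The sketch below reproduces this in a scratch directory; the file names are made up for the example.

```bash
#!/usr/bin/env bash
# Reproduce the error in a scratch directory (the file names are made up).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt test_string.sh

Rand_String=abcd

# Unquoted, the shell expands [0-9A-Za-z]* to the matching file names first,
# so test receives: abcd = a.txt b.txt test_string.sh
test $Rand_String = [0-9A-Za-z]* 2>/dev/null
unquoted_status=$?    # 2: usage error ("too many arguments")

# Quoted, test sees a single literal string and simply reports "not equal".
test "$Rand_String" = "[0-9A-Za-z]*"
quoted_status=$?      # 1: the strings differ

echo "unquoted: $unquoted_status, quoted: $quoted_status"

cd / && rm -rf "$workdir"
```

Either way, test only compares strings for equality; pattern matching needs [[ ... ]] or case, as in the other replies.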


David the H. 08-16-2012 09:46 AM

Quote:

Originally Posted by sundialsvcs (Post 4755632)
Strictly speaking, the pattern also needs to include the beginning-of-string and end-of-string anchors. The pattern must be: "beginning-of-string, followed by zero or more occurrences of, say, [[:alnum:]], followed by end-of-string." You might or might not need to use egrep to specify this "extended" regular expression.


That's not really correct here. An unanchored regex search starts at the beginning of the string and scans forward, and the only thing we're interested in is whether it meets at least one non-alphanumeric character on its way to the end. So we don't even really need a repetition operator here. All the "[]" bracket expression has to do is find a single matching character and the regex returns true; otherwise it returns false.

Globbing, BTW, is implicitly anchored at both ends, so the expression must take into account the whole string.

Code:

[[ $string =~ [^[:alnum:]] ]] && echo "not clean" || echo "clean"

[[ $string == *[^[:alnum:]]* ]] && echo "not clean" || echo "clean"

echo "$string" | grep -q '[^[:alnum:]]' && echo "not clean" || echo "clean"

Edit: incidentally, when using regex in bash's [[..]] test, properly escaping shell-reserved characters can be a problem. The best way to handle it is to store the regex in a separate variable first.

Code:

re='[^[:alnum:]]'
[[ $string =~ $re ]] && echo "not clean" || echo "clean"
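As an illustration of why the variable helps, consider a pattern that contains a space. The rule below (letters, digits, spaces and underscores) is a made-up example, as is the function name. Written inline, the space would have to be escaped to get past the shell parser; stored in a single-quoted variable it needs no special treatment.

```bash
#!/usr/bin/env bash
# Hypothetical rule: allow letters, digits, spaces, and underscores only.
re='^[[:alnum:] _]+$'

# $re must stay unquoted here; quoting it would force a literal string comparison.
valid_name() { [[ $1 =~ $re ]]; }

valid_name 'hello world_1' && echo "accepted"
valid_name 'hello; rm'     || echo "rejected"
```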


aswani 08-16-2012 09:46 AM

Quote:

Originally Posted by cgmertens (Post 4755607)

Thanks for the answer; the modified script works well.

aswani 08-16-2012 09:49 AM

Quote:

Originally Posted by David the H. (Post 4755624)

Thanks for the answer, your explanation helped me to understand what is happening within your first script.


All times are GMT -5.