Bash, input validation: request for comments

unSpawn · 07-25-2003, 02:36 PM

Lo peeps,

I'm trying to find a way to have minimal input string validation in Bash. Declaring static_char_restr should really be done in the shell's central profile. I'm not worried about performance right now (though I did set max chars), and input should be fed escaped where necessary wrt "the usual suspects". Be aware I'm also trying to minimize usage of GNU utils, like using "index" instead of grep. As far as I can see the only problem "index" has is wrt parentheses.

If you see room for improvement, I would be grateful for your comments*.

[code]
function valStr() {
declare -r static_char_restr="1234567890-_./@#abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
declare -r str_len="256"
local str charPos charC; str=( $@ )
if [ -z "${str}" -o "${#str}" -gt "${str_len}" ]; then return 1; fi
let charC=${#str}-1; for charPos in $(seq 0 "$charC"); do
expr index "${str[0]:${charPos}:1}" " ${static_char_restr}" >/dev/null
case "$?" in 0) ;; 1|*) return 1;; esac; done; }
[/code]
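For comparison, here is a minimal sketch of the same whitelist idea using only shell builtins (no seq, no expr). The names valid_str, allowed_chars and max_len are made up for illustration and are not part of the function above. It assumes only the first argument needs checking, and it moves the hyphen to the end of the list so the bracket expression does not read it as a range:

```bash
#!/bin/bash
# Hypothetical builtin-only variant; all names here are illustrative.
# The hyphen sits last so the bracket expression treats it literally.
allowed_chars='0123456789_./@#abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-'
max_len=256

valid_str() {
  local str=$1
  # Reject empty or over-long input before looking at the characters.
  [ -n "$str" ] || return 1
  [ "${#str}" -le "$max_len" ] || return 1
  # Reject if any single character falls outside the allowed set.
  case $str in
    *[!$allowed_chars]*) return 1 ;;
  esac
  return 0
}
```

Because case does its matching inside the shell, a parenthesis in the input is just another character here and needs no special handling.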

*To put it bluntly: those with a leetness complex or compulsive urge to post "use language x" oneliners and the like are respectfully requested to move on.

TheLinuxDuck · 07-25-2003, 04:18 PM

One qool feature of bash is being able to do pattern matching/replacement inside parameter expansion (these are shell patterns rather than full regexps). This would allow you to consolidate the restriction chars, and simply remove anything not allowed, and/or react based on what was removed, or simply because something was removed.

Code:

#!/bin/bash

# define good (allowed) characters
good_chars="a-zA-Z0-9"

# loop through CL stuff
for i in "$@"; do
  echo -n "input: $i: "
  # we basically strip out anything that is not in the good_chars list
  #  the // before the pattern means to replace every match, and nothing
  # between the last / and the } means to replace with nothing.
  newstring=${i//[^$good_chars]/}
  # if any chars were stripped out, they won't match any longer
  if [ "$newstring" == "$i" ]; then
    echo "is good!"
  else
    echo "is bad"
  fi
done
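Since the position of the slashes changes what the expansion does, here is a small sketch of the three forms side by side. The variable names and the sample string are made up for illustration:

```bash
#!/bin/bash
# Illustrative only: s, first, all and slash are hypothetical names.
s='a(b)c'

first=${s/[^a-z]/}    # one slash: only the first match is removed
all=${s//[^a-z]/}     # two leading slashes: every match is removed
slash=${s/[^a-z]//}   # one slash, replacement "/": first match becomes "/"

echo "$first"   # ab)c
echo "$all"     # abc
echo "$slash"   # a/b)c
```

Only the form with two leading slashes strips every disallowed character, which is what a comparison against the original string relies on.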

TheLinuxDuck · 07-25-2003, 04:54 PM

And, with a minor change, you can check for maximum length, as well:

Code:

#!/bin/bash

good_chars="a-zA-Z0-9"
max_length=20

for i in "$@"; do
  echo -n "input: $i: "

  if [ ${#i} -gt $max_length ]; then
    echo "exceeds maximum length"
  else
    newstring=${i//[^$good_chars]/}
    if [ "$newstring" == "$i" ]; then
      echo "is good!"
    else
      echo "is bad"
    fi
  fi
done

Here is some sample usage:

Code:

~/shell/unspawn> ./regexp.sh fgdsag reggr egreg ere 35y 35y \
e4t23 "&*$H(H$*H" "(" fdsfsd sfs sfsds fdfsdfdfsfsfsdfs \
sdfsdfsdfsdfdsfsdfsdfsdfsdfsffdfsd fsdfsd \
fsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsfsdfsdfsfsdf \
sdfsdfsdfsdfsdfsdfsdfsdfsdfsdfs
input: fgdsag: is good!
input: reggr: is good!
input: egreg: is good!
input: ere: is good!
input: 35y: is good!
input: 35y: is good!
input: e4t23: is good!
input: &*(HH: is bad
input: (: is bad
input: fdsfsd: is good!
input: sfs: is good!
input: sfsds: is good!
input: fdfsdfdfsfsfsdfs: is good!
input: sdfsdfsdfsdfdsfsdfsdfsdfsdfsffdfsd: exceeds maximum length
input: fsdfsd: is good!
input: fsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsfsdfsdfsfsdf: exceeds maximum length
input: sdfsdfsdfsdfsdfsdfsdfsdfsdfsdfs: exceeds maximum length

---edit---

I just realized that you already have that in your script. Oh, well.. (=

unSpawn · 07-25-2003, 08:03 PM

Thanks TheLinuxDuck! You showed me an excellent way to work around the index vs parenthesis gizmoidal thingoid.

Code:

-local str charPos charC; str=( $@ )
+local str charPos charC char; str=( $@ )
 if [ -z "${str}" -o "${#str}" -gt "${str_len}" ]; then return 1; fi
 let charC=${#str}-1; for charPos in $(seq 0 "$charC"); do
-expr index "${str[0]:${charPos}:1}" " ${static_char_restr}" >/dev/null
+char="${str[0]:${charPos}:1}"
+expr index "${char/[\(\)]/!}" "${static_char_restr} " >/dev/null
 case "$?" in 0) ;; 1|*) return 1;; esac; done; }

Leaving aside the "expensive" char-by-char parsing vs string comparison approach, there are small but notable differences:

Code:

~/shell/duckie> ./regexp.sh "" ; echo $?
input: : is good!
0

Code:

[unsp tmp]$ . f_valStr
[unsp tmp]$ valStr ""; echo $?
1
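The empty-string difference comes from the fact that stripping nothing from an empty string leaves it equal to itself. A minimal sketch of how the pattern-stripping check could reject it, assuming a hypothetical helper called is_clean (not part of either script) that returns a status instead of printing:

```bash
#!/bin/bash
# Hypothetical helper; is_clean and its variables are illustrative names.
good_chars='a-zA-Z0-9'

is_clean() {
  local input=$1 stripped
  # An empty string has nothing to strip, so test for it explicitly.
  [ -n "$input" ] || return 1
  # Remove every character outside the allowed set.
  stripped=${input//[^$good_chars]/}
  # If anything was removed, the input held a disallowed character.
  [ "$stripped" = "$input" ]
}

is_clean ""     ; echo $?   # 1
is_clean "abc1" ; echo $?   # 0
is_clean "a(b"  ; echo $?   # 1
```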

Ok, ok, we should prolly note both examples were not optimized against this kind of abuse, and using "openssl rand 100000" instead of perl would be too much fun:

Code:

~/shell/duckie> ./regexp.sh "`perl -e 'print "0" x 100000'`" ; echo $?
0000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000:
(kept goin, CTRL+C after 4m7.550s)
130

Code:

[unsp tmp]$ . f_valStr
[unsp tmp]$ time valStr "`perl -e 'print "0" x 100000'`" ; echo $?
real    0m0.146s
user    0m0.140s
sys     0m0.000s
1

I sincerely hope you don't see this as me slagging off your code, I just wanted to share that... If you or anyone else got more stuff that might help improve this, BMG!