LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   [bash] code review request (https://www.linuxquestions.org/questions/programming-9/%5Bbash%5D-code-review-request-4175718051/)

ychaouche 10-23-2022 05:50 AM

[bash] code review request
 
Hello all,

I have just written my first big function in bash and I'm not happy with it b/c of the use of getopt.
I chose GNU getopt overs bash getopts builtin because it can understand and parse --long-options,
however, it dosn't know how to parse arguments that include spaces, even when you quote them.

The second thing I'm not happy with is the function definition of _usage inside the main function.
I learned that function definitions are global,
so it's no use to define a function inside a function with the intention of hiding it to others.
I don't know how to do that in bash,
or if I should even bother with it.

Here's the code, if anyone wants to discuss it.

Code:

function notes.search {
    # examples
    # notes.search --log --newer-than 30 -E "(streaming.*TDA|TDA.*streaming)"
    # notes.search --txt -F --headings "nextcloud"
    # notes.search --all -F "streaming"
    # notes.search --all --newer-than 30 -F "streaming"
    # notes.search --all --recent "streaming"

    function _usage {
        echo "usage : notes.search.new <options> <pattern>"
        echo "options :"
        echo "                  -E : regex search"
        echo "                  -F : fast search"
        echo "                  -r : reverse order (newer first)"
        echo "              --all : search all notes"
        echo "              --txt : search text notes only (default)"
        echo "              --log : search logs only "
        echo " --newer-than <days> : useful with --log; search only in files that were modified in the last <days> days"
        echo "            --recent : useful with --log; search only in files that were modified in the last 30 days"
        echo "          --headings : search only in headings"
    }

    local search_dirs="$NOTES_TXT"
    local search_type
    local find_recent
    local headings headings_prefix
    local pattern
    local grep_opts grep_opts_debug
    local ls_opts="-t"

    # echo "function args $@"
    TEMP=$(getopt -uo EFr -l all,log,txt,LOG,TXT,newer-than:,headings,recent,all -- $@)
    # echo "TEMP $TEMP"
    set -- $TEMP
    while (($# > 0))
    do
        case "$1" in
            --all)
                search_dirs="$NOTES_TXT $NOTES_LOG"
                ;;

            --log|--LOG)
                search_dirs="$NOTES_LOG"
                ;;

            --txt|--TXT)
                search_dirs="$NOTES_TXT"
                ;;

            --newer-than)
                shift
                find_recent="-mtime -$1"
                ;;
           
            --heading?)
                headings_prefix="^\*.* "
                ;;
           
            --recent)
                find_recent="-mtime 30"
                ;;

            -E)
                grep_opts="-E"
                ;;

            -F)
                grep_opts="-F"
                ;;
            -r)
                ls_opts="-rt"
                ;;
            --)
                shift
                pattern="$*"
                ;;
        esac
        shift
    done
   
   
    [[ -z $pattern ]] && (echo "no pattern specified"; _usage; return 1)
    [[ -n $grep_opts ]] && grep_opts_debug="with the $grep_opts option"
    > /tmp/notes.search.results.colors
    find $search_dirs -not -name "*~" -type f $find_recent | xargs ls "$ls_opts" | while read file
    do
        grep --color=always -H -n -i $grep_opts "$headings_prefix$pattern" "$file"  | tee -a /tmp/notes.search.results.colors
    done   

    pretty.colors.remove /tmp/notes.search.results.colors > /tmp/notes.search.results
}


GazL 10-23-2022 08:46 AM

Unless you want to support POSIX short option combining then I find it just as easy to parse options manually rather than use getopt or getopts.

getopt is typically used with an 'eval' on the set -- "$VAR" line, which solves some of the whitespace issues. Use of the 'eval' is one of the reasons I don't like to use getopt, along with the fact it's not standard and there are incompatible versions out there.

In bash scripts I prefer to stick with getopts and POSIX short-options where possible, or just handle options manually: if that is not possible,or where I don't care about option combining.


For your usage function scoping issue... the best you can likely do is use a good naming convention to keep things tidy: notes.search() and notes.search.usage() perhaps?

ychaouche 10-23-2022 08:56 AM

Thanks for the nice feedback GazL!
I like your naming suggestion very much!
Thanks also for explaining about the eval.
I saw that in some bash scripts and didn't understand why it was used,
or why would anyone use eval at all.
I learned in my early days of computing that eval was another name for evil.
I will consider rewriting the options parsing without getopt or getopts.
It's interesting that bash doesn't seem to have a universal solution for that.
Maybe writing it by hand isn't that hard after all.
Appended to my todo-list.

boughtonp 10-23-2022 09:33 AM


 
Put the code into ShellCheck and it highlights several issues to be resolved. (Also, there is a command-line version of ShellCheck which can be used offline.)

Actually, I'm surprising that ShellCheck didn't warn about not to use getopt. As per that link: "Every Unix system which still ships getopt includes warnings in its man page telling you not to use it."

Here is how to correctly parse command line options with Bash.


ychaouche 10-23-2022 09:39 AM

Quote:

Actually, I'm surprising that ShellCheck didn't warn about not to use getopt.
That's because I use GNU getopt, which solves some of the problems of the traditional unix getopt.
I should consider rewriting the options parsing loop by hand.
Thanks for the link!

GazL 10-23-2022 10:22 AM

Quote:

Originally Posted by boughtonp (Post 6388114)

I don't know about "correctly". The case statement in that first example starts checking options before it checks for '--' which is a mistake. '--' or the first non-option argument should stop option processing according to POSIX rules.

I typically do it like this:
Code:

#!/bin/sh

# Example of parsing command-line options/arguments.
#
# This supports all the common option style variants,
# but not POSIX short-option combining.
#  e.g.
#    ./scriptname -llogfile.txt
#    ./scriptname -l logfile.txt
#    ./scriptname --logfile=logfile.txt
#    ./scriptname --logfile logfile.txt

########################################################################
# Parse Option Args:

unset opt_logfile opt_verbose

while [ $# -gt 0 ]
do
  case "$1" in
        --)            shift ; break ;;
        -|""|[!-]*)    break ;;
        --logfile)    opt_logfile=${2:?logfile: missing option} ; shift ;;
        --logfile=?*)  opt_logfile=${1#*=} ;;
        --verbose)    opt_verbose=1 ;;
        --?*)          echo "Unknown long-option:$1" >&2 ; exit 1 ;;
        -l?*)          opt_logfile=${1#-l} ;;
        -l)            opt_logfile=${2:?logfile: missing option} ; shift ;;
        -v)            opt_verbose=1 ;;
        -?*)          echo "Unknown option:$1" >&2 ; exit 1 ;;
  esac
  shift
done

[ -n "$opt_logfile" ]  && echo "logfile is: \"$opt_logfile\""

if [ $# -gt 0 ] && [ ${opt_verbose:-0} -ne 0 ]; then
  echo remaining args are:
  for arg in "$@"
  do
        echo $arg
  done
fi

exit 0

################################################################# End. #

Though it's allowed by the POSIX standard, I don't like splitting option arguments from the option with whitespace as it can lead to ambiguity, for example: "--logfile --verbose"; did the user intend to use a filename '--verbose' or just forgot to supply the argument?
One can add additional checks for this sort of stuff, but, personally I favour the use of -l<filename> or --logfile=<filename> formats to avoid the situation.

boughtonp 10-23-2022 11:21 AM

Quote:

Originally Posted by GazL (Post 6388119)
I don't know about "correctly". The case statement in that first example starts checking options before it checks for '--' which is a mistake. '--' or the first non-option argument should stop option processing according to POSIX rules.

The "--" case is correct when it comes prior to any ambiguous cases (i.e. "-?*" and "*"), which it does. (If you disagree, prove it with a specific example of where it fails.)

I do tend to put "--" as the first case too, but Wooledge's version still works.


GazL 10-23-2022 12:10 PM

The problem will come should you start using wildcards in any of the case matches that preceed '--' For example, you could run into trouble if you had a nested case, like this
Code:

-?)  # process short option:
    case ${1#-} in ...

So, while what is there is technically correct given the limited options specified, as example boilerplate I'd argue that these two should always come first.
Code:

--)            shift ; break ;;
-|""|[!-]*)    break ;;

followed by long-option matches, followed by short option matches.

boughtonp 10-23-2022 04:31 PM


 
Given the example already includes short options, I'm unconvinced that's a likely scenario.

It is an easy change though, so I guess you could hop on IRC and see if GreyCat agrees.

No reason to split short/long options though, that's just unnecessary duplication.


grail 10-25-2022 10:11 AM

So if you are going to go the Wooledge Wiki, there is plenty to read :)

So here are my personal thoughts (so please take what you like and abaandon the rest :) )

1. I prefer to use cat and a heredoc when creating a usage message:
Code:

_usage()
{
  cat<<-USAGE
    usage : notes.search.new <options> <pattern>
    options :
                      -E : regex search
                      -F : fast search
                      -r : reverse order (newer first)
                  --all : search all notes
                  --txt : search text notes only (default)
                  --log : search logs only
    --newer-than <days> : useful with --log; search only in files that were modified in the last <days> days
                --recent : useful with --log; search only in files that were modified in the last 30 days
              --headings : search only in headings
  USAGE
}

2. As you have found the nesting of functions does not obscure the function to be local only, but an alternative would be to create it as an array, which you can declare as local:
Code:

declare -a usage

usage=('usage : notes.search.new <options> <pattern>'
                'options :'
                '                  -E : regex search'
                '                  -F : fast search'
                '                  -r : reverse order (newer first)'
                '              --all : search all notes'
                '              --txt : search text notes only (default)'
                '              --log : search logs only '
                ' --newer-than <days> : useful with --log; search only in files that were modified in the last <days> days'
                '            --recent : useful with --log; search only in files that were modified in the last 30 days'
                '          --headings : search only in headings'
        )

IFS=$'\n'
echo "${usage[*]}"
unset IFS

It is a little more cumbersome and has the nasty IFS setting/unsetting, but it does solve it being only local ;)

3. I agree with the others about just write your own parameter parsing

4. I don't see the need to assign to a variable and then call set to get you back where you started, or use a while loop:
Code:

for param in $(getopt -uo EFr -l all,log,txt,LOG,TXT,newer-than:,headings,recent,all -- $@)
do
  case ...
done

5. Don't clutter a system with tmp files (> /tmp/notes.search.results.colors) look up mktemp command, saves on name clashes

6. Check out Wooledge page on why not to process ls output

7. Also on Wooledge is issues (probably not affecting you here) around "command | while" and subshells and variable issues

8. Lastly, you invite discussion which some may look at requiring a test run with known factors, however there are several variables and commands which only exist on your system. So please remember
to try and include a working set of options :)



Also, welcome on your bash journey :)

GazL 10-25-2022 12:24 PM

Interesting idea. You can avoid the ugly IFS stuff if you do it something like this:
Code:

func1()
{
  local -a usage=(
    'func1:'
    '  usage line one'
    '  usage line two'
  )

  case "$1" in
    -h)  printf '%s\n' "${usage[@]}" ;;
    *)  printf '%s\n' "func1() called with args: " "$@" ;;
  esac
}

Using an array makes it incompatible with the basic POSIX shell of course, but if that is not a concern then it looks like a tidy way to do it without having a heap of usage text cluttering up your case statement logic.

grail 10-26-2022 04:36 AM

Quote:

Originally Posted by GazL (Post 6388539)
Interesting idea. You can avoid the ugly IFS stuff if you do it something like this:
Code:

func1()
{
  local -a usage=(
    'func1:'
    '  usage line one'
    '  usage line two'
  )

  case "$1" in
    -h)  printf '%s\n' "${usage[@]}" ;;
    *)  printf '%s\n' "func1() called with args: " "$@" ;;
  esac
}

Using an array makes it incompatible with the basic POSIX shell of course, but if that is not a concern then it looks like a tidy way to do it without having a heap of usage text cluttering up your case statement logic.

I so often forget that one GazL, much nicer option :)

MadeInGermany 10-27-2022 02:28 PM

Quote:

Code:

[[ -z $pattern ]] && (echo "no pattern specified"; _usage; return 1)

Please avoid the fork/exec of a ( subshell ).
Use a command group
Code:

[[ -z $pattern ]] && {
  echo "no pattern specified"; _usage; return 1
}

Or the "speaking"
Code:

if [[ -z $pattern ]]; then
  echo "no pattern specified"; _usage; return 1
fi

A here document needs a temporary file.
Less overhead: a multi-line echo
Code:

_usage()
{
  echo "\
    usage : notes.search.new <options> <pattern>
    options :
                      -E : regex search
                      -F : fast search
                      -r : reverse order (newer first)
                  --all : search all notes
                  --txt : search text notes only (default)
                  --log : search logs only
    --newer-than <days> : useful with --log; search only in files that were modified in the last <days> days
                --recent : useful with --log; search only in files that were modified in the last 30 days
              --headings : search only in headings"
}

The only problem is that it is in quotes - embedded quotes must be escaped.

bigearsbilly 12-05-2022 04:44 AM

Here's my take on args, using keyword=value
keywords become variables, quoted whitespace works and "$@" is used normally

Code:

#      designed with korn, works with bash and most shells
#      automagic shell keyword arguments, like:
#
#              name='mr smith' address='the matrix' job=agent
#     
for V in "$@"
do
        case "$V" in
                *=*) eval ${V%=*}='${V#*=}';;
                *) args[i++]="$V";;
        esac
done
[ "$i" ] && set "${args[@]}"|| shift $#
unset i args V

# EXAMPLE usage
#
#      # optional and default arguments
#      address='no fixed abode'
#      job=
#
#      source keywords
#     
#      # mandatory arguments
#      set -o nounset
#      : $name
#      : $address
#     
#      echo "hello [$name] of [$address]"
#      [ "$job" ] && echo how is the $job business\?|| echo get a job\!



All times are GMT -5. The time now is 04:03 AM.