LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 10-09-2012, 02:38 PM   #1
kendosan
LQ Newbie
 
Registered: Oct 2012
Location: Mars
Distribution: XP, Debian 6, ElementaryOS, freeBSD
Posts: 8

Rep: Reputation: Disabled
optimizing bash process


Over the past weeks I have been putting together a working setup that automatically converts video files to HTML5 WebM (VP8) format. My primary language is bash shell scripting. I want to know if there is a better way to simplify the code into small, easy-to-understand sections.

I'm using nohup to run a loop that calls another shell script every 60 seconds. That script uses grep to check whether ffmpeg is running; grep effectively gives me a yes/no answer, and from there I can start ffmpeg to process the videos without any manual intervention. Because ffmpeg processes one file at a time, this check effectively acts as an encoding queue.

so I start the loop with nohup:
Quote:
nohup sh nohup.sh >/dev/null 2>&1 &
Quote:
#!/bin/sh
while true
do
# now we call the second script
/var/www/bash/grep_ffmpeg.sh
sleep 60
done

and grep_ffmpeg.sh checks:

Quote:
SERVICE='ffmpeg'

if ps ax | grep -v grep | grep $SERVICE > /dev/null
then
echo "$SERVICE service running, will not bother ffmpeg yet because of current process"
else
echo "$SERVICE is not running"
echo "$SERVICE is not running!" | /var/www/bash/start_service.sh
fi
If grep finds that ffmpeg is not running, it starts another shell script that cleans up the filenames and starts the encoding process all over again.

I have around 8 shell scripts. What's the best way of merging them into maybe two, or a single script that does everything?

This is the main shell script, used to do the processing; here I'm in need of figuring out a simpler way of doing everything better:

Quote:
#!/bin/bash


chmod 777 -R /var/www/downloads/*

sleep 5

find /var/www/downloads -regextype posix-egrep -regex '.*\.(vcd|idx|ass|nfo|jpg|JPG|JPEG|gif|png|NFO|log|srt|php|css|js|rar|vob|ts|TS|mp3|ppt|doc|wm|wma|MPG|zip|wav|tar|gz|pdf|html|py|pl|exe|htm|asf|jpeg|txt|text)$' -exec mv "{}" /var/www/bin/ \;

sleep 5

find /var/www/downloads -regextype posix-egrep -regex '.*\.(avi|m2ts|M2TS|3gp|AVI|MPG|mkv|MKV|ogv|wmv|mp4|mpg|divx|mpeg|ogg|ogm)$' -exec mv "{}" /var/www/cache01/ \;

sleep 2

cd /var/www/cache01
for f in *; do mv -- "$f" "${f//[][(){\}]}"; done

sleep 5

cd /var/www/cache01
for infile in *.*;
do
#replace " - " with a single underscore.
NEWFILE1=`echo $infile | sed 's/\s-\s/_/g'`;
#replace spaces with underscores
NEWFILE2=`echo $NEWFILE1 | sed 's/\s/_/g'`;
#replace "-" dashes with underscores.
NEWFILE3=`echo $NEWFILE2 | sed 's/-/_/g'`;
#remove exclamation points
NEWFILE4=`echo $NEWFILE3 | sed 's/!//g'`;
#remove commas
NEWFILE5=`echo $NEWFILE4 | sed 's/,//g'`;
mv "$infile" "/var/www/cache02/$NEWFILE5";
done;

sleep 5

/var/www/bash/get_ffmpeg.sh

sleep 5

/var/www/bash/get_thumbs.sh

sleep 5

find /var/www/cache02 -regextype posix-egrep -regex '.*\.(jpg|webm)$' -exec mv "{}" /var/www/public/ \;

sleep 2

find /var/www/cache02 -regextype posix-egrep -regex '.*\.(txt)$' -exec mv "{}" /var/www/public/ \;

sleep 5

rm -f `find /var/www/logs/* | grep -v .text`

sleep 5

rm -rf /var/www/bin/*

sleep 5

/var/www/bash/purge_old_files.sh

sleep 5

find /var/www/downloads/* -empty -type d -delete
I use sleep a lot because of a 128 MB memory limit. If there is something better, please let me know.

I'm also not a coder, but I learn and pick up fast when things are explained properly.

Last edited by kendosan; 10-09-2012 at 02:47 PM.
 
Old 10-09-2012, 06:44 PM   #2
Habitual
LQ Veteran
 
Registered: Jan 2011
Location: Abingdon, VA
Distribution: Catalina
Posts: 9,374
Blog Entries: 37

Rep: Reputation: Disabled
Quote:
Originally Posted by kendosan View Post
...I'm in need of figuring a simpler way of doing everything better
...
Couldn't you make functions out of some of the secondary scripts and just call the function?

That's what I'd do, but I too am no coder. But I manage shell scripts now and then...but even those are getting tedious...
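A minimal sketch of that idea, with hypothetical function names and placeholder bodies; the contents of the existing secondary scripts would be pasted into the functions:

```shell
#!/bin/bash
# Sketch: fold the separate scripts into functions in one file.
# Names and echo lines here are placeholders, not the real script bodies.

clean_filenames() {
    # body of the old filename-cleaning script would go here
    echo "cleaning filenames"
}

encode_videos() {
    # body of the old ffmpeg-launching script would go here
    echo "encoding videos"
}

main() {
    clean_filenames
    encode_videos
}

main "$@"
```

Each former script becomes one function, and `main` fixes the order they run in, so the `sleep`-based sequencing between separate scripts is no longer needed.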

Obligatory Bash Tutorials.
Bash scripting guides:
http://www.tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
http://www.tldp.org/LDP/Bash-Beginne...tml/index.html
http://www.gnu.org/software/bash/man...ode/index.html
http://www.grymoire.com/Unix/Sh.html
http://tldp.org/LDP/abs/abs-guide.pdf
http://www.tldp.org/LDP/abs/html/
http://mywiki.wooledge.org/BashFAQ
http://mywiki.wooledge.org/BashPitfalls
http://rute.2038bug.com/index.html.gz
http://bashscripts.org/forum/

Have fun!
 
Old 10-09-2012, 07:32 PM   #3
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by kendosan View Post
so I start the loop with nohup (..) and grep_ffmpeg.sh checks (..) If grep finds that ffmpeg is not running, it starts another shell script that cleans up the filenames and starts the encoding process all over again.
I'd drive the main script from a crontab entry. That does away with nohup, the nohup.sh and the grep_ffmpeg.sh scripts, if at the top of your main script you just put
Code:
pgrep ffmpeg >/dev/null 2>&1 && exit 0
and a crontab entry has the additional benefit that you'll be notified of any errors by email.
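A sketch of such a crontab entry (installed with `crontab -e`; the script path is a hypothetical stand-in for the real main script):

```shell
# Hypothetical crontab entry: run the main script once a minute.
# cron mails any stdout/stderr to the user, so errors get reported
# automatically; the pgrep guard at the top of the script makes it
# exit immediately while an encode is still in progress.
# m  h  dom mon dow  command
  *  *  *   *   *    /var/www/bash/main.sh
```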


Quote:
Originally Posted by kendosan View Post
I use sleep a lot because of a 128 MB memory limit. If there is something better, please let me know.
If you need to run other processes then you could make ffmpeg run with a nice level (or you could renice the whole main script with
Code:
renice -n +20 $$
near the top), but the fact remains that, AFAIK, ffmpeg is a RAM and CPU hog, and 128 MB is not that much. I doubt 'sleep' helps much with that.
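As a small illustration of niceness at work (using a background `sleep` as a stand-in for ffmpeg), you can start a command through `nice` and read back the level the kernel applied:

```shell
# Start a placeholder command (sleep standing in for ffmpeg) at the
# lowest scheduling priority, then read its nice level back via ps.
nice -n 19 sleep 30 &
pid=$!

level=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "nice level: $level"

kill "$pid"    # clean up the placeholder job
```

Raising a process's niceness this way needs no special privileges; only lowering it (negative values) requires root.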

I agree that using functions helps with repetitive tasks. You should avoid "for" loops and use "while" ones instead, and you could do things like
Code:
NEWFILE=${infile// /_}; NEWFILE=${NEWFILE//-/_}; NEWFILE=${NEWFILE//\!/_}; NEWFILE=${NEWFILE//,/_}
and
Code:
find /var/www/logs/ -type f -not -iname \*.text -delete
but there's not much to optimize, really.
 
Old 10-10-2012, 11:41 AM   #4
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 15.2
Posts: 1,339

Rep: Reputation: 260
I’m still not sure about your intended setup. Do you want to drop files in a folder and have them handled automatically once the copy/download process finishes? Or do you want to start the next batch of conversions once the old run is over?

Maybe it can also be handled by a queuing system like GNUbatch: once the copy/download has completed, you submit a job for that particular file. This way you can also limit the number of concurrently executing jobs to avoid overloading the machine.
 
Old 10-10-2012, 01:04 PM   #5
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Quote:
Originally Posted by kendosan View Post
Code:
#replace " - " with a single underscore.
NEWFILE1=`echo $infile | sed 's/\s-\s/_/g'`;
#replace spaces with underscores
NEWFILE2=`echo $NEWFILE1 | sed 's/\s/_/g'`;
#replace "-" dashes with underscores.
NEWFILE3=`echo $NEWFILE2 | sed 's/-/_/g'`;
#remove exclamation points
NEWFILE4=`echo $NEWFILE3 | sed 's/!//g'`;
#remove commas
NEWFILE5=`echo $NEWFILE4 | sed 's/,//g'`;
For that part at least, you could simplify it to this:
Code:
#replace " - " with a single underscore.
NEWFILE=${infile// - /_}
#replace spaces and dashes with underscores
NEWFILE=${NEWFILE//[[:blank:]-]/_}
#remove exclamation points and commas
NEWFILE=${NEWFILE//[\!,]}
or just one line
Code:
NEWFILE=${infile// - /_}; NEWFILE=${NEWFILE//[[:blank:]-]/_}; NEWFILE=${NEWFILE//[\!,]}
If extglob is enabled (shopt -s extglob), you could do it in only two steps:
Code:
NEWFILE=${infile//@( - |[[:blank:]-])/_}; NEWFILE=${NEWFILE//[\!,]};
Note that less code doesn't always mean faster or more efficient code, though at least it's easier to read.
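A quick check of those expansions against a sample filename (the name is illustrative only):

```shell
# Apply the suggested parameter expansions to a sample filename.
infile='Some Movie - Part 1, Take 2!.mkv'

NEWFILE=${infile// - /_}            # "Some Movie_Part 1, Take 2!.mkv"
NEWFILE=${NEWFILE//[[:blank:]-]/_}  # spaces and dashes -> underscores
NEWFILE=${NEWFILE//[\!,]}           # drop ! and ,

echo "$NEWFILE"
```

Unlike the sed pipeline, this stays inside bash, so no subshells or external processes are spawned for the renaming.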

Last edited by konsolebox; 10-10-2012 at 01:06 PM.
 
Old 10-10-2012, 01:38 PM   #6
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Quote:
ps ax | grep -v grep | grep $SERVICE
And I think you could just use killall for that:
Code:
killall -s 0 "$SERVICE" &>/dev/null
But please check; I forget whether it has the same issues with scripts that pidof has, though I think it's unlikely.

Also, please use CODE instead of QUOTE tags when posting your code.
 
Old 10-10-2012, 01:58 PM   #7
kendosan
LQ Newbie
 
Registered: Oct 2012
Location: Mars
Distribution: XP, Debian 6, ElementaryOS, freeBSD
Posts: 8

Original Poster
Rep: Reputation: Disabled
my setup looks something like this

http://i48.tinypic.com/11afxmq.png

I'm currently trying out what's been said here; slowly I'm putting everything into a single short shell script. Thanks everybody for telling me about new tools I'd never heard of, such as GNUbatch.
 
Old 10-10-2012, 02:02 PM   #8
kendosan
LQ Newbie
 
Registered: Oct 2012
Location: Mars
Distribution: XP, Debian 6, ElementaryOS, freeBSD
Posts: 8

Original Poster
Rep: Reputation: Disabled

Quote:
Originally Posted by Reuti View Post
I’m still not sure about your intended setup. Do you want to drop files in a folder and have them handled automatically once the copy/download process finishes? Or do you want to start the next batch of conversions once the old run is over?

Maybe it can also be handled by a queuing system like GNUbatch: once the copy/download has completed, you submit a job for that particular file. This way you can also limit the number of concurrently executing jobs to avoid overloading the machine.
Sorry to confuse you. My setup right now is like Dropbox: I have a folder where I drop video files (mkv, mpg, wmv, etc.). A shell script loops every 60 seconds and checks whether ffmpeg is running; if it is not, the grep script runs start_service.sh to process the files. I was just asking if there was a cleaner way of shortening my shell scripts and making things tidy.
 
Old 10-10-2012, 02:03 PM   #9
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
I'd rather do things this way also:
Code:
#!/bin/bash

SERVICE_PID=0

shopt -s extglob

function service {
    # Do service stuff here.
}

function service_start {
    if service_check; then
        echo "Service is still running."
        return 1
    else
        echo "Starting service."
        service &
        SERVICE_PID=$!
        sleep 1
        if service_check; then
            echo "Service started."
        else
            echo "Service failed to start."
            return 1
        fi
    fi
}

function service_check {
    [[ $SERVICE_PID == +([[:digit:]]) && SERVICE_PID -gt 0 ]] && kill -s 0 "$SERVICE_PID"
}

function service_stop {
    echo "Stopping service."
    if service_check; then
        kill "$SERVICE_PID"
        service_check && {
            echo "Failed to stop service."
            return 1
        }
    else
        echo "Service is no longer running."
        return 1
    fi
}

function service_restart {
    if service_check; then
        service_stop || return 1
    fi
    service_start  
}

function main {
    service_start

    for (( ;; )); do
        read -p "Your command: " CMD
        case "$CMD" in
        c|C|check)
            service_check && echo "Service is running." || echo "Service is not running."
            ;;
        r|R|restart)
            service_restart
            ;;
        q|Q|quit)
            if service_check; then
                service_stop && break
            else
                break # or exit
            fi
            ;;
        esac
        # Service checking could also be automated with read -t TIMEOUT, but that depends on your use case.
    done
}

main "$@"

Last edited by konsolebox; 10-10-2012 at 02:08 PM.
 
Old 10-10-2012, 02:31 PM   #10
kendosan
LQ Newbie
 
Registered: Oct 2012
Location: Mars
Distribution: XP, Debian 6, ElementaryOS, freeBSD
Posts: 8

Original Poster
Rep: Reputation: Disabled
thank you very much for that example, I'm reading stuff about glob here and trying to figure it all out

I understand maybe 80% of what you scripted, lol, but I'm totally confused about ([[:digit:]]). What do the ([[: and :]]) parts do? :O

i use this to check for 1 or 0

Quote:
#!/bin/sh
ffmpeg=0 # check if pid is running
CHECKINGPERIOD=60 # check in seconds

while true
do

if [ ! "$(pidof ffmpeg)" ]
then
if [ "$ffmpeg" = "0" ]; then
echo "WARNING! ffmpeg crashed!"
ffmpeg=1
fi
/var/www/bash/start_service.sh
else
if [ "$ffmpeg" = "1" ]; then
echo "ffmpeg was successfully restarted."
ffmpeg=0
fi
fi
sleep $CHECKINGPERIOD

done
this one is with grep
Quote:
#!/bin/sh
SERVICE='ffmpeg'

if ps ax | grep -v grep | grep $SERVICE > /dev/null
then
echo "$SERVICE service running, will not bother ffmpeg yet"
else
echo "$SERVICE is not running!" | /var/www/bash/start_service.sh
fi

Last edited by kendosan; 10-10-2012 at 02:34 PM. Reason: bad code
 
Old 10-10-2012, 04:34 PM   #11
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 15.2
Posts: 1,339

Rep: Reputation: 260
Quote:
Originally Posted by kendosan View Post
Sorry to confuse you. My setup right now is like Dropbox: I have a folder where I drop video files (mkv, mpg, wmv, etc.). A shell script loops every 60 seconds and checks whether ffmpeg is running; if it is not, the grep script runs start_service.sh to process the files. I was just asking if there was a cleaner way of shortening my shell scripts and making things tidy.
Aha, thanks for the clarification. My first thought would be to use inotify events, but that would need a custom C program.

There is also incron to start a script or the like in case of an event in a directory:
Code:
$ incrontab -e
/home/reuti IN_CLOSE_WRITE /home/reuti/converter.sh $@/$#
With your initial setup I was wondering whether it might start a conversion although the file isn't completely written yet. The above can also be used to submit a batch job to a queuing system instead of processing directly, to avoid overloading.
 
Old 10-10-2012, 05:36 PM   #12
kendosan
LQ Newbie
 
Registered: Oct 2012
Location: Mars
Distribution: XP, Debian 6, ElementaryOS, freeBSD
Posts: 8

Original Poster
Rep: Reputation: Disabled
Yes, I have looked into inotify. I thought it was part of Ubuntu 12.04 Server by default; maybe I'm confusing it with other Ubuntu utilities. Anyway, it's good to know more tools. I'm also looking into Ruby, as I've heard it's pretty cool, but that's for another thread.

I will need to read much more to understand the batch jobs with inotify or incrontab. Right now grep_ffmpeg.sh acts as the service check and batch-job creator; ffmpeg won't run twice because grep makes sure whether it is running or not.

I'm pretty scared to mess with the code after all this. I have spent nearly 4 weeks putting everything together very slowly, so I will have to back everything up first.

Here I go, wish me luck that I don't destroy everything.

a demo of what I'm working on can be seen here,
 
Old 10-10-2012, 07:46 PM   #13
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Quote:
Originally Posted by kendosan View Post
I understand maybe 80% of what you scripted, lol, but I'm totally confused about ([[:digit:]]). What do the ([[: and :]]) parts do? :O
It's in the bash manual. [:digit:] is the same as 0-9, so [[:digit:]] == [0-9]. The first part of the statement checks that the PID is valid; the second part (with kill -s 0) does the actual check that the process still exists and is running.
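A tiny illustration of the character class inside a pattern (the `+( )` part needs extglob; `pid_ok` is just a demo name):

```shell
# [[:digit:]] inside a bracket expression matches 0-9;
# +( ) is an extglob operator meaning "one or more of".
shopt -s extglob

pid_ok() { [[ $1 == +([[:digit:]]) ]]; }

pid_ok 12345 && r1=valid || r1=invalid
pid_ok 12a45 && r2=valid || r2=invalid
echo "$r1 $r2"
```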
Quote:
this one is with grep
Like I said, instead of listing processes and grepping for an existing process name, you could just use 'killall -s 0' to check whether the process exists; killall returns 0 if it exists and 1 if it doesn't. If you save the PID with $!, it can be even more efficient, since you can then use 'kill', a bash builtin, to check it instead of an external, multi-process checker like killall. If you still prefer listing processes anyway, it's better to use 'pgrep' instead.
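A sketch of that $!-based check, using a background `sleep` as a stand-in for the real ffmpeg job:

```shell
# Start a placeholder background job (sleep standing in for ffmpeg)
# and remember its PID via $!.
sleep 30 &
SERVICE_PID=$!

# kill -0 sends no signal; it only tests whether the process exists.
if kill -0 "$SERVICE_PID" 2>/dev/null; then
    status=running
else
    status=stopped
fi
echo "service is $status"

kill "$SERVICE_PID"    # clean up the placeholder job
```

Because `kill` is a bash builtin, this check forks no external processes at all, unlike the `ps | grep | grep` pipeline.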

Last edited by konsolebox; 10-10-2012 at 07:49 PM.
 
Old 10-11-2012, 10:20 AM   #14
kendosan
LQ Newbie
 
Registered: Oct 2012
Location: Mars
Distribution: XP, Debian 6, ElementaryOS, freeBSD
Posts: 8

Original Poster
Rep: Reputation: Disabled
thanks kon, I'm still reading what Reuti posted lol,
Quote:
killall -s 0
sounds like a much shorter way of doing the checking! I'm going to try it. I've already broken something in the process, such as thumbnails getting created twice.
 
Old 10-13-2012, 07:21 AM   #15
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
To expand a bit more on konsolebox's last post: [::] represents a regular-expression character class, a preset range of characters that can be used inside [] bracket expressions. bash has also adopted them for use in globbing.

Note that the exact entries represented by the character classes can vary depending on the current system locale. The regular expressions section of info grep has a very readable description of all of them.


Edit: I think most of your find commands could be replaced with simple globbing patterns too, particularly if you use extended globbing. It would help to know if the search has to be recursive, however.

Code:
# find /var/www/cache02 -regextype posix-egrep -regex '.*\.(jpg|webm)$' -exec mv "{}" /var/www/public/ \;

#if non-recursive:
mv -t /var/www/public/ /var/www/cache02/*.jpg /var/www/cache02/*.webm

#or with extglobs:
shopt -s extglob
mv -t /var/www/public/ /var/www/cache02/*@(.jpg|.webm)

#if recursive; requires bash v4+'s globstar option
shopt -s extglob globstar

mv -t /var/www/public/ /var/www/cache02/**/*@(.jpg|.webm)
If the tree to search is large, you may want to run some tests to determine whether globstar or find is faster.
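One caveat with those mv-over-glob commands (my addition, not stated above): if a pattern matches nothing, bash passes it to mv literally unless the nullglob option is set. A quick demonstration against an empty directory:

```shell
# Without nullglob, an unmatched glob survives as a literal string;
# with it, the glob expands to nothing, so mv never sees a bogus
# "*.webm" argument.
dir=$(mktemp -d)

shopt -u nullglob
files=( "$dir"/*.webm )
without=${#files[@]}    # the literal pattern survives as one word

shopt -s nullglob
files=( "$dir"/*.webm )
with=${#files[@]}       # the pattern vanished

echo "without=$without with=$with"
rmdir "$dir"
```

So for the non-find variants it's safer to `shopt -s nullglob` first, or mv will complain about a file literally named `*.webm`.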

Last edited by David the H.; 10-13-2012 at 07:34 AM. Reason: as stated
 
  


