LinuxQuestions.org

Programming (http://www.linuxquestions.org/questions/programming-9/) > optimizing bash process (http://www.linuxquestions.org/questions/programming-9/optimizing-bash-process-4175431381/)

kendosan 10-09-2012 02:38 PM

optimizing bash process
 
Over the past weeks I have been putting certain ideas together, and I now have a working setup that automatically processes video files into HTML5 WebM (VP8) format. My primary language is bash shell scripting. I want to know if there is a better way to simplify the code into extremely small, easy-to-understand sections.

I'm using nohup to loop every 60 seconds into another shell script. That nohup'd script fires grep and checks whether ffmpeg is running or not; this way grep acts for me as a yes or no, and from there I can fire ffmpeg to process the videos automatically without me doing anything. Essentially, grep checking for ffmpeg (looking for 1 or 0) acts as an encoding queue, because ffmpeg processes one file at a time.

so I fire it off with nohup:
Quote:

nohup sh nohup.sh >/dev/null 2>&1 &

Quote:

#!/bin/sh
while true
do
#now we call second
/var/www/bash/grep_ffmpeg.sh
sleep 60
done

and grep checks

Quote:

SERVICE='ffmpeg'

if ps ax | grep -v grep | grep $SERVICE > /dev/null
then
echo "$SERVICE service running, will not bother ffmpeg yet because of current process"
else
echo "$SERVICE is not running"
echo "$SERVICE is not running!" | /var/www/bash/start_service.sh
fi
If grep finds that the ffmpeg service is not running, it starts another shell script that automatically cleans the filenames and starts the encoding process all over again.

I have around 8 sh scripts. What's the best way of merging them into maybe 2, or a single script that does everything?

This is the main shell script, used to do the processing. Here I need to figure out a simpler, better way of doing everything:

Quote:

#!/bin/bash


chmod 777 -R /var/www/downloads/*

sleep 5

find /var/www/downloads -regextype posix-egrep -regex '.*\.(vcd|idx|ass|nfo|jpg|JPG|JPEG|gif|png|NFO|log|srt|php|css|js|rar|vob|ts|TS|mp3|ppt|doc|wm|wma|MPG|zip|wav|tar|gz|pdf|html|py|pl|exe|htm|asf|jpeg|txt|text)$' -exec mv "{}" /var/www/bin/ \;

sleep 5

find /var/www/downloads -regextype posix-egrep -regex '.*\.(avi|m2ts|M2TS|3gp|AVI|MPG|mkv|MKV|ogv|wmv|mp4|mpg|divx|mpeg|ogg|ogm)$' -exec mv "{}" /var/www/cache01/ \;

sleep 2

cd /var/www/cache01
for f in *; do mv -- "$f" "${f//[][(){\}]}"; done

sleep 5

cd /var/www/cache01
for infile in *.*;
do
#replace " - " with a single underscore.
NEWFILE1=`echo $infile | sed 's/\s-\s/_/g'`;
#replace spaces with underscores
NEWFILE2=`echo $NEWFILE1 | sed 's/\s/_/g'`;
#replace "-" dashes with underscores.
NEWFILE3=`echo $NEWFILE2 | sed 's/-/_/g'`;
#remove exclamation points
NEWFILE4=`echo $NEWFILE3 | sed 's/!//g'`;
#remove commas
NEWFILE5=`echo $NEWFILE4 | sed 's/,//g'`;
mv "$infile" "/var/www/cache02/$NEWFILE5";
done;

sleep 5

/var/www/bash/get_ffmpeg.sh

sleep 5

/var/www/bash/get_thumbs.sh

sleep 5

find /var/www/cache02 -regextype posix-egrep -regex '.*\.(jpg|webm)$' -exec mv "{}" /var/www/public/ \;

sleep 2

find /var/www/cache02 -regextype posix-egrep -regex '.*\.(txt)$' -exec mv "{}" /var/www/public/ \;

sleep 5

rm -f `find /var/www/logs/* | grep -v .text`

sleep 5

rm -rf /var/www/bin/*

sleep 5

/var/www/bash/purge_old_files.sh

sleep 5

find /var/www/downloads/* -empty -type d -delete

I use sleep a lot because of a 128 MB memory limit; if there is something better, please let me know.

I'm also not a coder, but I learn and pick things up fast when they are explained properly.

Habitual 10-09-2012 06:44 PM

Quote:

Originally Posted by kendosan (Post 4801532)
...I'm in need of figuring a simpler way of doing everything better
...

Couldn't you make functions out of some of the secondary scripts and just call the function?

That's what I'd do, but I too am no coder. But I manage shell scripts now and then...but even those are getting tedious...

Obligatory Bash Tutorials. :)
Bash scripting guides:
http://www.tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html
http://www.tldp.org/LDP/Bash-Beginne...tml/index.html
http://www.gnu.org/software/bash/man...ode/index.html
http://www.grymoire.com/Unix/Sh.html
http://tldp.org/LDP/abs/abs-guide.pdf
http://www.tldp.org/LDP/abs/html/
http://mywiki.wooledge.org/BashFAQ
http://mywiki.wooledge.org/BashPitfalls
http://rute.2038bug.com/index.html.gz
http://bashscripts.org/forum/

Have fun!

unSpawn 10-09-2012 07:32 PM

Quote:

Originally Posted by kendosan (Post 4801532)
so i fire nohup into (..) and grep checks (..) if grep finds that ffmpeg service is not running, it starts another shell script that automatically cleans the filenames, and starts the encoding process all over again.

I'd drive the main script from a crontab entry. It does away with the "nohup", the nohup.sh and grep_ffmpeg.sh script if at the top of your main script you just
Code:

pgrep ffmpeg >/dev/null 2>&1 && exit 0
and a crontab entry has the additional benefit that you'll be notified of any errors by email.
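For instance, the crontab setup unSpawn describes could look like this (the every-minute schedule and the main.sh name are assumptions; the paths follow this thread's layout):

```shell
# hypothetical entry, added via `crontab -e`: run the main script every
# minute; cron mails any stdout/stderr to the crontab's owner
* * * * * /var/www/bash/main.sh
```

With `pgrep ffmpeg >/dev/null 2>&1 && exit 0` as the first line of main.sh, any run that overlaps an encode simply exits.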


Quote:

Originally Posted by kendosan (Post 4801532)
i use sleep alot, because of 128mb memory limit, if there is something better please let me know.

If you need to run other processes then you could make ffmpeg run with a nice level (or you could renice the whole main script with
Code:

renice -n +20 $$
near the top), but the fact remains that AFAIK ffmpeg is a RAM and CPU hog, and 128 MB is not that much. I doubt 'sleep' helps much wrt that.

I agree that using functions helps with repetitive tasks. You should avoid "for" loops and use "while" ones instead, and you could do things like
Code:

NEWFILE=${infile// /_}; NEWFILE=${NEWFILE//-/_}; NEWFILE=${NEWFILE//\!/_}; NEWFILE=${NEWFILE//,/_}
and
Code:

find /var/www/logs/ -type f -not -iname \*.text -delete
but there's nothing much to optimize really.
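As a sketch, those expansions could be wrapped in one small function (clean_name is a made-up name; this variant deletes "!" and "," like kendosan's original sed chain, rather than replacing them):

```shell
#!/bin/bash
# clean_name: same cleanup as the sed pipeline, but with bash-only
# parameter expansion (no subshells, no external commands)
clean_name() {
    local name=$1
    name=${name// - /_}     # " - " -> single underscore
    name=${name// /_}       # remaining spaces -> underscores
    name=${name//-/_}       # remaining dashes -> underscores
    name=${name//[\!,]/}    # drop "!" and ","
    printf '%s\n' "$name"
}

clean_name 'Some Movie - Part 1, Final!.mkv'
```

Each call avoids the five `echo | sed` pipelines per file, which matters in a tight loop on a 128 MB box.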

Reuti 10-10-2012 11:41 AM

I’m still not sure about your intended setup. You want to drop files in a folder and they should be handled automatically once the copy/download processes finished? Or you want to start the next bunch of conversions once the old run is over?

Maybe it can also be handled by a queuing system like GNUbatch: if the copy/download completed, you submit a job for this particular file. This way you can also limit the number of executions of the jobs to avoid overloading of the machine.

konsolebox 10-10-2012 01:04 PM

Quote:

Originally Posted by kendosan (Post 4801532)
Code:

#replace " - " with a single underscore.
NEWFILE1=`echo $infile | sed 's/\s-\s/_/g'`;
#replace spaces with underscores
NEWFILE2=`echo $NEWFILE1 | sed 's/\s/_/g'`;
#replace "-" dashes with underscores.
NEWFILE3=`echo $NEWFILE2 | sed 's/-/_/g'`;
#remove exclamation points
NEWFILE4=`echo $NEWFILE3 | sed 's/!//g'`;
#remove commas
NEWFILE5=`echo $NEWFILE4 | sed 's/,//g'`;


For that part at least, you could simplify it to these:
Code:

#replace " - " with a single underscore.
NEWFILE=${infile// - /_}
#replace spaces and dashes with underscores
NEWFILE=${NEWFILE//[[:blank:]-]/_}
#remove exclamation points and commas
NEWFILE=${NEWFILE//[\!,]}

or just one line
Code:

NEWFILE=${infile// - /_}; NEWFILE=${NEWFILE//[[:blank:]-]/_}; NEWFILE=${NEWFILE//[\!,]}
If extglob is enabled (shopt -s extglob), you could do it in only two steps:
Code:

NEWFILE=${infile//@( - |[[:blank:]-])/_}; NEWFILE=${NEWFILE//[\!,]};
Note that less code doesn't always mean faster or more efficient code, though at least it's easier to read.
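A quick way to see what the extglob version does on a sample filename (the sample name is made up):

```shell
#!/bin/bash
shopt -s extglob   # enable extended globbing patterns like @(...)

infile='A - B C-D!E,F.avi'
NEWFILE=${infile//@( - |[[:blank:]-])/_}   # " - ", blanks and dashes -> "_"
NEWFILE=${NEWFILE//[\!,]}                  # drop "!" and ","
echo "$NEWFILE"
```

Because each match is greedy, the three-character " - " alternative wins over a lone blank wherever both could match, so no doubled underscores appear.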

konsolebox 10-10-2012 01:38 PM

Quote:

ps ax | grep -v grep | grep $SERVICE
And I think you could just use killall for that:
Code:

killall -s 0 "$SERVICE" &>/dev/null
But please check; I forget whether it has issues with scripts, like pidof does, though I think it's unlikely.

Also, please use CODE instead of QUOTE tags when posting your code.
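For comparison, here is a small sketch of the three checks discussed in this thread (the function names are made up; killall and pgrep are the usual psmisc/procps tools):

```shell
#!/bin/bash
# three ways to ask "is this process alive?"
is_running_name()  { killall -s 0 "$1" 2>/dev/null; }   # by name (psmisc)
is_running_pgrep() { pgrep -x "$1" >/dev/null; }        # by name (procps)
is_running_pid()   { kill -0 "$1" 2>/dev/null; }        # by PID, bash builtin

# the PID variant needs the PID saved when the job is started:
sleep 10 &
JOB_PID=$!
is_running_pid "$JOB_PID" && echo "job $JOB_PID is alive"
kill "$JOB_PID" 2>/dev/null
```

The PID variant is the cheapest, since `kill` is a builtin and forks nothing.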

kendosan 10-10-2012 01:58 PM

my setup looks something like this

http://i48.tinypic.com/11afxmq.png

I'm currently trying out what's been said here; slowly I'm putting everything into a single short shell script. Thanks everybody for telling me about some new tools I'd never heard of, such as GNUbatch :)

kendosan 10-10-2012 02:02 PM

Quote:

Originally Posted by Reuti (Post 4802323)
I'm still not sure about your intended setup. You want to drop files in a folder and they should be handled automatically once the copy/download processes finished? Or you want to start the next bunch of conversions once the old run is over?

Maybe it can also be handled by a queuing system like GNUbatch: if the copy/download completed, you submit a job for this particular file. This way you can also limit the number of executions of the jobs to avoid overloading of the machine.

Sorry to confuse you. My setup right now is like Dropbox: I have a folder where I drop video files such as mkv, mpg, wmv, etc. I used to loop a shell script every 60 seconds and check if ffmpeg was running; if it was not, grep would run start_service.sh to process the files. I was just asking if there was a cleaner way of shortening my shell scripts and making things tidy.

konsolebox 10-10-2012 02:03 PM

I'd rather do things this way also:
Code:

#!/bin/bash

SERVICE_PID=0

shopt -s extglob

function service {
    # Do service stuffs here.
}

function service_start {
    if service_check; then
        echo "Service is still running."
        return 1
    else
        echo "Starting service."
        service &
        SERVICE_PID=$!
        sleep 1
        if service_check; then
            echo "Service started."
        else
            echo "Service failed to start."
            return 1
        fi
    fi
}

function service_check {
    [[ $SERVICE_PID == +([[:digit:]]) && SERVICE_PID -gt 0 ]] && kill -s 0 "$SERVICE_PID"
}

function service_stop {
    echo "Stopping service."
    if service_check; then
        kill "$SERVICE_PID"
        service_check && {
            echo "Failed to stop service."
            return 1
        }
    else
        echo "Service is no longer running."
        return 1
    fi
}

function service_restart {
    if service_check; then
        service_stop || return 1
    fi
    service_start 
}

function main {
    service_start

    for (( ;; )); do
        read -p "Your command: " CMD
        case "$CMD" in
        c|C|check)
            service_check && echo "Service is running." || echo "Service is not running."
            ;;
        r|R|restart)
            service_restart
            ;;
        q|Q|quit)
            if service_check; then
                service_stop && break
            else
                break # or exit
            fi
            ;;
        esac
        # Service checking could be automated along with read -t TIMEOUT but it depends on custom.
    done
}

main "$@"


kendosan 10-10-2012 02:31 PM

thank you very much for that example, I'm reading stuff here glob and trying to figure it all out :D

I understand maybe 80% of what you scripted lol, but I'm totally confused about ([[:digit:]]) — what do the ([[: and :]]) do? :O

i use this to check for 1 or 0

Quote:

#!/bin/sh
ffmpeg=0 # check if pid is running
CHECKINGPERIOD=60 # check in seconds

while true
do

if [ ! "$(pidof ffmpeg)" ]
then
if [ "$ffmpeg" = "0" ]; then
echo "WARNING! ffmpeg crashed!"
ffmpeg=1
fi
/var/www/bash/start_service.sh
else
if [ "$ffmpeg" = "1" ]; then
echo "ffmpeg was successfully restarted."
ffmpeg=0
fi
fi
sleep $CHECKINGPERIOD

done
this one is with grep
Quote:

#!/bin/sh
SERVICE='ffmpeg'

if ps ax | grep -v grep | grep $SERVICE > /dev/null
then
echo "$SERVICE service running, will not bother ffmpeg yet"
else
echo "$SERVICE is not running!" | /var/www/bash/start_service.sh
fi

Reuti 10-10-2012 04:34 PM

Quote:

Originally Posted by kendosan (Post 4802415)
sorry to confuse you, my setup as of right now is like dropbox, i have a folder where i drop video files such as mkv mpg wmv etc, i used to loop a shell script every 60 seconds and check if ffmpeg is running, if it was not in the service, grep would start_service.sh to process the files. i was just asking if there was cleaner way of shorting my shell scripts and making things tidy

Aha, thanks for the clarification. My first thought was to use inotify events directly, but that would need a custom C program.

There is incron to start a script or the like in case of an event in a directory:
Code:

$ incrontab -e
/home/reuti IN_CLOSE_WRITE /home/reuti/converter.sh $@/$#

I was wondering, with your initial setup, whether it might start a conversion although the file isn't written completely. The above can also be used to submit a batch job to a queuing system instead of processing directly, to avoid overloading.

kendosan 10-10-2012 05:36 PM

Yes, I have looked into inotify. I thought it was part of Ubuntu 12.04 server by default; maybe I'm mixing up the utilities from Ubuntu, but anyway, it's good to know more tools :D. I'm also looking into Ruby, as I've heard it's pretty cool, but that's for another thread.

I will need to read much more to understand the batch job with inotify or incrontab. Right now grep_ffmpeg.sh acts as a service check and batch-job creator; I mean, ffmpeg won't get to execute twice, because grep makes sure whether it's running or not and knows what to do next.

I'm pretty scared to mess with the code; after all, I have spent nearly 4 weeks putting everything together very slowly :D I will have to back everything up first.

Here I go, wish me luck that I don't destroy everything :D

a demo of what I'm working on can be seen here,

konsolebox 10-10-2012 07:46 PM

Quote:

Originally Posted by kendosan (Post 4802433)
i understand about maybe 80% of what you scripted lol but totally confused about ([[:digit:]]) what do the ([[: and :]]) do :O

It's in the bash manual. [:digit:] is just the same as 0-9, so [[:digit:]] == [0-9]. The first part of the statement checks that the PID is valid; then the second part (with kill -s 0) does the actual check that the process still exists and is running.
Quote:

this one is with grep
Like I said, instead of listing processes and grepping for an existing process name, you could just use 'killall -s 0' to check whether the process exists; killall returns 0 if it exists and 1 if it doesn't. If you save the PID with $!, it could be even more efficient, since you could then use 'kill', a builtin of bash, to check it instead of a multi-process pipeline or an external command like killall. If you still prefer listing processes anyway, it's better to just use 'pgrep' instead.

kendosan 10-11-2012 10:20 AM

thanks kon, I'm still reading what Reuti posted lol,
Quote:

killall -s 0
sounds like a much shorter way of doing the checking! I'm going to try it. I've already broken something in the process, such as thumbnails getting created twice :D

David the H. 10-13-2012 07:21 AM

To expand a bit more on konsolebox's last post: [::] represents a regular-expression character class, a preset range of characters that can be used inside [] bracket expressions. bash has also adapted them for use in globbing.

Note that the exact entries represented by the character classes can vary depending on the current system locale. The regular expressions section of info grep has a very readable description of all of them.
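A tiny illustration of the same class in both contexts (the sample values are made up):

```shell
#!/bin/bash
shopt -s extglob   # needed for the +( ... ) glob operator

pid='12345'

# glob context: +([[:digit:]]) means "one or more digits"
[[ $pid == +([[:digit:]]) ]] && echo "glob: all digits"

# regex context: the same [:digit:] class inside a bracket expression
[[ $pid =~ ^[[:digit:]]+$ ]] && echo "regex: all digits"
```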


Edit: I think most of your find commands could be replaced with simple globbing patterns too, particularly if you use extended globbing. It would help to know if the search has to be recursive, however.

Code:

# find /var/www/cache02 -regextype posix-egrep -regex '.*\.(jpg|webm)$' -exec mv "{}" /var/www/public/ \;

#if non-recursive:
mv -t /var/www/public/ /var/www/cache02/*.jpg /var/www/cache02/*.webm

#or with extglobs:
shopt -s extglob
mv -t /var/www/public/ /var/www/cache02/*@(.jpg|.webm)

#if recursive; requires bash v4+'s globstar option
shopt -s extglob globstar

mv -t /var/www/public/ /var/www/cache02/**/*@(.jpg|.webm)

If the tree to search is large, you may want to run some tests to determine whether globstar or find is faster.
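A self-contained way to try the globstar variant without touching the real tree (temporary directories, not this thread's paths):

```shell
#!/bin/bash
shopt -s extglob globstar nullglob

# build a throwaway tree
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src/a.jpg" "$src/b.webm" "$src/c.txt"
mkdir "$src/sub" && touch "$src/sub/d.jpg"

# ** recurses into subdirectories (bash 4+); @(...) needs extglob
files=( "$src"/**/*@(.jpg|.webm) )
mv -t "$dst" "${files[@]}"

ls "$dst"   # a.jpg, b.webm and d.jpg end up here; c.txt stays behind
```

nullglob keeps the array empty (instead of holding the literal pattern) when nothing matches, so `mv` isn't handed garbage.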

