LinuxQuestions.org


genderbender 03-19-2013 04:50 AM

My script uses lots of cpu
 
Hi guys, I wrote a script which reads through a log and writes the log to an sql file - this occasionally runs very slowly and uses up lots of cpu, and other times doesn't populate the sql, can someone offer some assistance? It runs once every 15 minutes via cron.

Code:

#!/bin/bash

LOG=/var/log/tacacs.log
PIDFILE=/var/run/tac2sql.pid
TIME=`cat /var/log/tacacs.log | grep task_id | grep cmd | awk '{print $3}' | uniq`

if [ -e /var/run/tac2sql.pid ]; then
        echo "pid file exists, exiting" >> /var/log/tacacs.log
else
        touch /var/run/tac2sql.pid
for TIME in `echo $TIME`;
do
        month=`grep $TIME $LOG | grep cmd | awk '{print $1}' | tail -n 1`
        day=`grep $TIME $LOG | grep cmd | awk '{print $2}' | tail -n 1`
        destination_ip=`grep $TIME $LOG | grep cmd | awk '{print $4}' | tail -n 1`
        user=`grep $TIME $LOG | grep cmd |  awk '{print $5}' | tail -n 1`
        source_ip=`grep $TIME $LOG | grep cmd | awk '{print $7}' | tail -n 1`
        command=`grep $TIME $LOG | awk 'gsub(/.*cmd=| <cr>.*/,"")' | tail -n 1`
        firewall_test=`echo $command | grep -o "service=shell" | wc -l`
        if [ $firewall_test -eq 1 ]; then
                command=`grep $TIME $LOG | awk 'gsub(/.*cmd=| service.*/,"")' | grep 'service=shell' | sed 's/service=shell.*//'`
        fi
        task_id=`grep $TIME $LOG | grep cmd | awk '{print $9}' | grep -o "[0-9]" | tail -n 1`
        task_number_time=`echo $user$task_id$TIME | tr -d ":" | tail -n 1`
        query1="SELECT COUNT(1) FROM tacacs.tacacs_log WHERE task_number_time = '$task_number_time';"
        RCOUNT=`mysql -u root -p!PASSWORD-s -e "$query1"`
        if [ $RCOUNT -eq 0 ]; then
                query="INSERT INTO tacacs.tacacs_log(month, day, time, username, source_ip, destination_ip, command, task_number_time) VALUES('$month', '$day', '$TIME', '$user', '$source_ip', '$destination_ip', '$command', '$task_number_time');"
                mysql -u root -p!PASSWORD -s -e "$query" &> /dev/null
        fi
done
rm -f /var/run/tac2sql.pid
fi


pan64 03-19-2013 05:49 AM

just one tip:
it forks a lot of tasks like cat, awk, grep, tail, tr....
instead of:
cat /var/log/tacacs.log | grep task_id | grep cmd | awk '{print $3}' | uniq
you can write:
awk ' /task_id.*cmd/ { print $3; exit } ' /var/log/tacacs.log
(or /cmd.*task_id/ whichever comes first)

you saved 4 new processes.
So you need to optimize all those chains and substitute with only one awk script.

pan64 03-19-2013 07:05 AM

also you made a double loop on that log file, so inside the for you will grep the log file again several times, you can simplify it by a single awk (or perl, or ...) script:
# pseudo code
awk '
# next if cmd not found
{
# this will automatically store the last values for every time value.
time = $3
month[time] = $1
day[time] = $2
....
}
END {
# print sql script
}
' # end of awk
# execute one single sql


We can probably give better advice if you show us the structure of the logfile.
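A runnable version of the outline above might look like the following sketch. It assumes the field positions used in the original script ($1 month, $2 day, $3 time, $4 destination IP, $5 user, $7 source IP). The function name parse_tacacs and the tab-separated output are illustrative choices, not something from the thread.

```shell
#!/bin/bash
# Sketch: one awk pass over the log instead of re-grepping it per timestamp.
# parse_tacacs is a hypothetical helper name; the field positions follow the
# original script and are assumptions about the log layout.
parse_tacacs() {
    awk '
        !/task_id/ || !/cmd=/ { next }          # skip lines without a command
        {
            t = $3
            if (!(t in month)) order[++n] = t   # remember first-seen order
            month[t] = $1; day[t] = $2          # later lines overwrite earlier ones,
            dest[t]  = $4; user[t] = $5         # so the last line per time wins
            src[t]   = $7
        }
        END {
            for (i = 1; i <= n; i++) {
                t = order[i]
                printf "%s\t%s\t%s\t%s\t%s\t%s\n", month[t], day[t], t, user[t], src[t], dest[t]
            }
        }
    ' "$1"
}
```

Called as parse_tacacs /var/log/tacacs.log, this reads the file once and prints one record per distinct time, which is roughly what the nested grep calls in the loop were computing.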

grail 03-19-2013 10:50 AM

On top of pan64's advice, I would really re-think using the same variable for your loop as has already been set elsewhere, unless of course your intention is to change the value prior to each loop, which in itself sounds fraught with danger.
Code:

for TIME in `echo $TIME`
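
To illustrate the point, the sketch below keeps the list and the loop variable under different names, so the list is still intact after the loop. The names times and t and the sample values are placeholders.

```shell
#!/bin/bash
# Sketch: separate names for the list and the loop variable.
# The timestamps here are placeholder values, not real log data.
times="09:14:47 06:00:25"

for t in $times; do        # intentionally unquoted: split the list on whitespace
    echo "processing $t"
done

echo "list is still: $times"
```

With the original for TIME in `echo $TIME`, the loop itself still iterates over every value, but afterwards TIME holds only the last one.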

gnashley 03-19-2013 01:30 PM

I counted 39 calls to an external program for each loop. Something like this eliminates a bunch of them:
Code:

set -- `grep $TIME $LOG | grep cmd | tail -n 1`
month=$1
day=$2
destination_ip=$4
user=$5
source_ip=$7


David the H. 03-20-2013 06:59 AM

Here are a few more things you should work to avoid or correct:


1) Don't Read Lines With For

2) Useless Use Of Cat (and grep, etc)

3) $(..) is highly recommended over `..`

4) Scripting With Style
(e.g. Effective use of whitespace and using lowercase names for your variables.)

5) QUOTE ALL OF YOUR VARIABLE EXPANSIONS. You should never leave the quotes off a parameter expansion unless you explicitly want the resulting string to be word-split by the shell and possible globbing patterns expanded. This is a vitally important concept in scripting, so train yourself to do it correctly now. You can learn about the exceptions later.

http://mywiki.wooledge.org/Arguments
http://mywiki.wooledge.org/WordSplitting
http://mywiki.wooledge.org/Quotes

6) When using bash or ksh, it's recommended to use [[..]] for string/file tests, and ((..)) for numerical tests. Avoid using the old [..] test unless you specifically need POSIX-style portability.

http://mywiki.wooledge.org/BashFAQ/031
http://mywiki.wooledge.org/ArithmeticExpression

7) Don't use single, scalar variables when you have lists of things. Always use arrays when you have multiple related values to process.


8) And look here for various ways to replace external commands with shell built-ins:

string manipulations in bash

( In short, use external commands like grep and awk when you need to operate on whole files or large text blocks at once, such as when extracting text strings for later use. But once you have those strings stored in variables, it's almost always more efficient to use built-ins to process them. )
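
Several of the points above can be sketched on one of the sample log lines from this thread. The variable names are illustrative, and the field positions are the ones the original script assumes.

```shell
#!/bin/bash
# Sketch of several of the points above, applied to a sample line from this thread.
line='Mar 20 06:00:25 4.3.2.1 user2 22 4.3.2.1 stop task_id=81 cmd=copy /noconfirm running-config tftp://127.0.0.1/4501_ConfigFile.txt service=shell elapsed_time=0'

# (7) an array instead of a space-separated scalar
read -r -a fields <<< "$line"
user=${fields[4]}                  # arrays are zero-based, so this is field 5

# (8) parameter expansion instead of grep/sed/tr
task_id=${fields[8]#task_id=}      # strip the "task_id=" prefix -> 81
time_digits=${fields[2]//:/}       # delete every ":" -> 060025
command=${line#*cmd=}              # drop everything up to and including "cmd="
command=${command%% service=*}     # drop " service=..." and what follows

# (6) [[ ]] for string tests, (( )) for arithmetic
if [[ $line == *"service=shell"* ]] && (( task_id > 0 )); then
    # (5) quote expansions that are passed on as arguments
    printf '%s\n' "$user$task_id$time_digits" "$command"
fi

# (1) read lines with while/read rather than for
count=0
while IFS= read -r l; do
    (( count += 1 ))
done <<< "$line"
```

None of the extraction steps here starts an external process, which is the main saving compared with a grep | awk | tail chain per field.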

genderbender 03-20-2013 07:14 AM

Couple of lines from my log, one from a firewall and one from a script (notice the difference):

Mar 20 09:14:47 1.2.3.4 user tty1 1.2.3.4 stop task_id=514638 timezone=gmt service=shell start_time=1363770887 priv-lvl=1 cmd=show env all <cr>
Mar 20 06:00:25 4.3.2.1 user2 22 4.3.2.1 stop task_id=81 cmd=copy /noconfirm running-config tftp://127.0.0.1/4501_ConfigFile.txt service=shell elapsed_time=0

I realised why cpu was overly high though: the logs weren't rotating properly, so my script was reading 80 MB worth of logs rather than 2 MB. I could still do with some help optimizing though. I'll read through some of the comments, although my adaptations have not worked :/ ... Wrong fields or no fields output, for example.
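
The two formats differ in where cmd= sits: in the firewall line it comes last and ends with " <cr>", while in the script line it is followed by " service=shell ...". One way to handle both in a single awk program is sketched below. extract_cmd is a hypothetical name, and the rule that a command ends at " <cr>" or at the first " service=" is inferred from these two lines only.

```shell
#!/bin/bash
# Sketch: pull the command text out of either log format.
# extract_cmd is a hypothetical helper; the end-of-command rules are
# assumptions based on the two sample lines in this post.
extract_cmd() {
    awk '
        /task_id/ && /cmd=/ {
            c = $0
            sub(/.*cmd=/, "", c)          # drop everything up to and including cmd=
            sub(/ <cr>.*/, "", c)         # firewall format: command ends at " <cr>"
            sub(/ service=.*/, "", c)     # script format: command ends at " service="
            print c
        }
    ' "$1"
}
```

For the two lines above this should print "show env all" and "copy /noconfirm running-config tftp://127.0.0.1/4501_ConfigFile.txt".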

pan64 03-20-2013 07:56 AM

we will gladly help you to fix those problems, just show us what you have tried (and what went wrong)

genderbender 03-20-2013 09:07 AM

Only one time is echoed when it should read line by line searching for that specific time:
Quote:

awk ' /task_id.*cmd/ { print $3; exit } ' /var/log/tacacs.log
I've never used set, but the commands after set return nothing...
Quote:

set -- `grep $TIME $LOG | grep cmd | tail -n 1`
Shall I supply some code to go with this? Thanks for your help, by the way.

pan64 03-20-2013 09:19 AM

yes, that awk will stop at the first line. Probably that is not what you want. You can try this:
awk ' /task_id.*cmd/ { print $3 } ' /var/log/tacacs.log | uniq
or even better you can implement uniq in awk:
awk ' /task_id.*cmd/ { times[$3] } END { for ( key in times ) { print key } } ' /var/log/tacacs.log

set does not print anything; it assigns $1, $2 ... $7 for you. So after that line you can use $1, $2 ... as month, day...
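
A small demonstration of how set fills the positional parameters, using a sample line from earlier in the thread in place of the grep pipeline. The set -f / set +f pair is an addition here, to stop the shell expanding any * or ? that a logged command might contain.

```shell
#!/bin/bash
# Sketch: set -- prints nothing; it replaces the positional parameters.
line='Mar 20 09:14:47 1.2.3.4 user tty1 1.2.3.4 stop task_id=514638 timezone=gmt service=shell start_time=1363770887 priv-lvl=1 cmd=show env all <cr>'

set -f            # disable globbing so characters like * in a command are not expanded
set -- $line      # intentionally unquoted: split the line into $1, $2, ...
set +f

month=$1
day=$2
destination_ip=$4
user=$5
source_ip=$7

echo "$month $day $user $source_ip -> $destination_ip"
```

If the values come out empty in the real script, it is worth checking that the grep pipeline inside the backticks prints a line at all when run by hand.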

sundialsvcs 03-20-2013 09:30 AM

Good grief... what people can manage to do with a shell-script! :eek:

Use a real programming-language, designed for this purpose. There are lots to choose from. Even PHP can be used for scripting.

The first line of your script, the so-called #! shebang, specifies which command processor should be used to execute it. (Do you, say, know PHP? Then use that. You can do that, you know ...)

Your script is built in an incomprehensibly inefficient way: it launches an instance of the mysql process, with :eek: unlimited :eek: access to the database, to insert every single line.

No wonder your computer gets mad at you. I'm surprised it hasn't removed itself from the rack and skipped town. ;)
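
Staying in bash, the per-line mysql launches can be avoided by generating all the statements first and starting the client once. The sketch below turns tab-separated records into INSERT statements. rows_to_sql is a hypothetical helper, the column order is taken from the INSERT in the original script, and INSERT IGNORE only replaces the SELECT COUNT check if task_number_time has a UNIQUE index, which is an assumption about the table.

```shell
#!/bin/bash
# Sketch: build one batch of INSERT statements from tab-separated rows on stdin.
# rows_to_sql is a hypothetical helper name. Assumes 8 fields per row in the
# column order of the original INSERT, and a UNIQUE index on task_number_time
# so that INSERT IGNORE skips rows that are already stored.
rows_to_sql() {
    awk -F'\t' -v q="'" -v bs='\\' '
        # double every single quote and backslash so the value stays inside its SQL string
        function esc(s,    out, i, c) {
            out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                if (c == q || c == bs) out = out c
                out = out c
            }
            return out
        }
        NF == 8 {
            printf "INSERT IGNORE INTO tacacs.tacacs_log(month, day, time, username, source_ip, destination_ip, command, task_number_time) VALUES("
            for (i = 1; i <= NF; i++)
                printf "%s%s%s%s", (i > 1 ? ", " : ""), q, esc($i), q
            print ");"
        }
    '
}
```

The output can then be piped into a single mysql invocation, so the client starts once per cron run instead of up to twice per log entry. The escaping shown covers single quotes and backslashes only; a client library with bound parameters would be the more robust choice.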

genderbender 03-20-2013 09:40 AM

To quote wikipedia:

"PHP is a server-side scripting language designed for Web development". This isn't web development, just needs to read from a text file and put the contents in a database. Perl was an option, but bash was quicker for me. There were options to write directly to a database which I chose not to do when implementing tacacs, I also chose (possibly unwisely) to use root as my user, but then again there's just one database on the system, not multiple so I don't have much concern. For all your rubbishing you've been zero help, perhaps positivity and ideas is a better idea than going "why not just write it in something else". PHP isn't exactly a good programming language anyway, that being said I wouldn't dream of going to a php forum and going "why not use C++ instead or an OO language you fools". I'd also like to add - this server has a small footprint and doesn't have php installed... I reckon calling php would use more memory than bash (I might be wrong here though).

grail 03-20-2013 10:13 AM

Actually the uniq idea in awk can be a little simpler:
Code:

awk '/task_id.*cmd/ && ! _[$3]++{print $3}' /var/log/tacacs.log
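
For readers unfamiliar with the idiom, here is the same one-liner with a descriptive array name, wrapped in a function so it can be tried on any file. unique_times is a hypothetical name.

```shell
#!/bin/bash
# Sketch: the same idiom with a descriptive array name.
# seen[$3]++ is 0 (false) the first time a timestamp appears, so the
# negated test is true exactly once per distinct value of field 3.
unique_times() {
    awk '/task_id.*cmd/ && !seen[$3]++ { print $3 }' "$1"
}
```

Unlike uniq, which only collapses adjacent duplicates, this drops repeats anywhere in the file and keeps the order in which each time was first seen.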

genderbender 03-20-2013 11:40 AM

Quote:

Originally Posted by grail (Post 4915256)
Actually the uniq idea in awk can be a little simpler:
Code:

awk '/task_id.*cmd/ && ! _[$3]++{print $3}' /var/log/tacacs.log

I'm getting some quite odd results now; some of the fields are not filled in when using this one-liner... Any ideas? I think the search for task_id is not working or something, and results are returned where nothing has been actioned. E.g. I've got results such as:

Wed Mar 20 16:37:56 2013 [1234]: connect from 123.123.123.123 [123.123.123.123]
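
One way to narrow this down is to feed individual lines to the pattern. The line quoted above contains neither task_id nor cmd, so the awk filter itself should reject it. If such values still reach the database, they may be coming from a later step that looks the timestamp up again without filtering, but that is a guess, since the modified script was not posted. matches is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: test the pattern against single lines to see what it lets through.
# matches is a hypothetical helper; it succeeds only if awk sees the pattern.
matches() {
    printf '%s\n' "$1" | awk '/task_id.*cmd/ { found = 1 } END { exit !found }'
}

if matches 'Wed Mar 20 16:37:56 2013 [1234]: connect from 123.123.123.123 [123.123.123.123]'; then
    echo "connect line passes the filter"
else
    echo "connect line is rejected by the filter"
fi
```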

theNbomr 03-20-2013 11:55 AM

So far only sundialsvcs has seized upon the correct answer. The original question reads like: 'I am building a house using a butterknife, a large flat rock and a couple of knitting needles, and it seems to be taking forever. How can I speed up the process?'
Using the correct tools for the job is the essential first step in optimization. A compiled application with a properly designed database and database access methods seems to be a much better approach. Even a fast scripting language (singular) like Perl with a good library for DBMS access will probably result in adequate speed and CPU usage without doing any database optimization.

--- rod.


All times are GMT -5.