You don't happen to work in a bank by chance? I had a very similar experience where a process could only be run over the weekend, as it took almost 40 hrs to complete a single run. After I had been there 3 months we were able to run it ad hoc whenever we liked, as it took about 3 minutes.
not a bank. i am a security consultant; my client is a city retirement system. i have discovered lots of processes that are human-heavy and should be automated. in this case the task was being done by the InfoSec group: getting audit reports for the mainframe. the reports (txt files) were being manually stripped of the needed data, then copied into an Excel sheet which is later imported into an Access db.
so, i have two versions of my script. the 1st produces some wacky output on the "last" part; it seems to repeat the same data, as if "last" doesn't get updated on the next line read (see the fix sketched after the first script below). the input doesn't have more than 12 fields, so in the 2nd script i accommodate up to 13, but the beauty of the "for" loop is i wouldn't need to worry about how many fields there are, etc.
bad
Code:
#!/bin/bash
awk '
BEGIN {
    OFS = "|";
}
{
    # skip blank lines and header/separator/banner lines
    if ( NF == 0 || $1 ~ /^(TOP|-|+|=|0$|1\/|\/\/|PASSWORD|1E)/ ) {}
    else {
        # BUG: "last" is never reset, so fields accumulate across records
        for (i = 6; i <= NF; i++) {
            last = last FS $i;
        }
        print $1, $2, $3, $4, $5, last;
    }
} ' | sed 's/^0\(.*\)/\1/'    # strip a leading 0 from each output line
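For reference, the "wacky output" is awk variable persistence: "last" is never cleared, so each record appends to everything from the previous ones. A minimal sketch of the fix, keeping the field-count-agnostic for loop (the ternary avoids a leading separator, matching the output of the second script):
Code:
#!/bin/bash
awk '
BEGIN {
    OFS = "|";
}
{
    if ( NF == 0 || $1 ~ /^(TOP|-|+|=|0$|1\/|\/\/|PASSWORD|1E)/ ) {}
    else {
        last = "";   # reset per record so fields do not carry over
        for (i = 6; i <= NF; i++) {
            last = (last == "" ? $i : last FS $i);
        }
        print $1, $2, $3, $4, $5, last;
    }
} ' | sed 's/^0\(.*\)/\1/'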
good
Code:
#!/bin/bash
awk '
BEGIN {
    OFS = "|";
}
{
    if ( NF == 0 || $1 ~ /^(TOP|-|+|=|0$|1\/|\/\/|PASSWORD|1E)/ ) {}
    else {
        # hard-coded width: join fields 6-13 with FS (absent fields just add separators)
        last = $6 FS $7 FS $8 FS $9 FS $10 FS $11 FS $12 FS $13;
        print $1, $2, $3, $4, $5, last;
    }
} ' | sed 's/^0\(.*\)/\1/'
Did you look at the options I suggested around the fifth field? Also, the single-character items in your regex could just go in a bracket expression, i.e. [-+=].
Also, I notice you call awk from within a bash script; unless there is more being done by the script, you could just as easily make it an awk script, the interpreter line being simply #!/usr/bin/awk -f
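A sketch of what that standalone awk script might look like, folding in the [-+=] bracket expression; the trailing sed stage is approximated here with sub(), and the filename is illustrative:
Code:
#!/usr/bin/awk -f
# illustrative standalone version; run as: ./strip_report.awk report.txt
BEGIN { OFS = "|" }
# skip blank lines and header/separator/banner lines
NF == 0 || $1 ~ /^(TOP|[-+=]|0$|1\/|\/\/|PASSWORD|1E)/ { next }
{
    last = $6
    for (i = 7; i <= NF; i++)
        last = last FS $i
    sub(/^0/, "", $1)    # stands in for: sed 's/^0\(.*\)/\1/'
    print $1, $2, $3, $4, $5, last
}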
yes, there will be more bash stuff in there. i will eventually pipe the sed output to a file "$HOME/out.$2.txt", etc.
i am not sure what you mean by a unique 5th field. all the fields can be unique.
Let me see if an example helps. I will only use 5 items in total, but the third is the one to watch for:
The following two examples can be dealt with:
Code:
one two three four two
one two three four three
So the first line shows that the third field is unique from all others, and the second shows it to be the first occurrence of the value, so we could use:
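One way that test might look in awk (illustrative, not the exact snippet): accept a record only when the first occurrence of the value in field 3 is field 3 itself.
Code:
{
    # find the first field holding the same value as field 3 (illustrative)
    first = 0
    for (i = 1; i <= NF; i++)
        if ($i == $3) { first = i; break }
    if (first == 3)
        print
}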
hmmm, i cannot verify that the field would be unique. in this case i really only care about the "fields" and not what's in them, etc.
btw, my gawk is gnu v3.1.5
this has been a quick learning exercise, and a quick analysis of the cost savings looks like this. thnx all.
(salary #’s are for example only, the other #’s are real)
Code:
John Doe salary
80k/yr
$38.46/hr = $0.0107/sec
181 txt files (9 months' worth) processed manually = estimated 40 man-hrs = $1538.40/9mo = $170.93 per one month's worth of data processing.
The script processed the same 181 files in 6 sec = $0.0642/9mo = $0.007133 per one month's worth of data processing.
However, we need to account for the time spent developing/testing the script in the total cost analysis.
We will estimate that the person who can script has an hourly rate 1.25x that of the person processing the data, so that's 80k * 1.25 = 100k/yr = $48.08/hr.
A guru scripter could probably develop/test this script in under 2 hrs, but it took me longer: 4 hrs. So the development/testing cost is $192.32.
So, yearly analysis:
• Manually processing the files = $170.93 * 12 = $2051.16/yr
• Automated script = $192.32 + ($0.007133 * 12) = $192.41/yr for the 1st year; years 2+ = $0.0856/yr
• that's a saving of $1858.75 in year 1 (about 91%; the manual process costs nearly 10.7x as much), and a saving of almost 100%, i.e. $2051.07/yr, for years 2+
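As a quick sanity check of the arithmetic (rates as above; small rounding differences vs. the figures quoted are expected):
Code:
awk 'BEGIN {
    manual = 40 * 38.46 / 9 * 12     # manual processing cost per year: ~$2051.20
    dev    = 4 * 48.08               # one-time development/testing cost: $192.32
    run    = 6 * 0.0107 / 9 * 12     # script runtime cost per year: ~$0.0856
    printf "manual %.2f/yr  year-1 %.2f  years-2+ %.4f/yr\n", manual, dev + run, run
}'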
Conclusion - automate where possible, it saves lots of $$.