You don't happen to work in a bank by chance? I had a very similar experience where a process could only be run over the weekend, as it took almost 40 hrs to complete a single run. After I had been there 3 months we were able to run it ad hoc whenever we liked, as it took about 3 minutes.
not a bank. i am a security consultant; my client is a city retirement system. i have discovered lots of processes that are human-heavy and should be automated. in this case the task was being done by the InfoSec group: getting audit reports for the mainframe. the reports (txt files) were being manually stripped of the needed data, then copied into an Excel sheet which is later imported into an Access db.
so, i have two versions of my script. the 1st produces some wacky output on the "last" part; it seems to repeat the same data, as if "last" doesn't get updated on the next line read (see the fix sketched after the first script below). the input doesn't have more than 12 fields, so in the 2nd script i accommodate up to 13, but the beauty of the "for" loop is i wouldn't need to worry about how many fields there are, etc.
bad
Code:
#!/bin/bash
awk '
BEGIN {
    OFS = "|";
}
{
    # skip blank lines and header/separator/banner lines
    if ( NF == 0 || $1 ~ /^(TOP|-|+|=|0$|1\/|\/\/|PASSWORD|1E)/ ) {}
    else {
        # BUG: "last" is never reset, so fields accumulate across records
        for (i = 6; i <= NF; i++) {
            last = last FS $i;
        }
        print $1, $2, $3, $4, $5, last;
    }
} ' | sed 's/^0\(.*\)/\1/'    # strip a leading 0 from each output line
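For reference, the "wacky output" is awk variable persistence: "last" is never cleared, so each record appends to everything from the previous ones. A minimal sketch of the fix, keeping the field-count-agnostic for loop (the ternary avoids a leading separator, matching the output of the second script):
Code:
#!/bin/bash
awk '
BEGIN {
    OFS = "|";
}
{
    if ( NF == 0 || $1 ~ /^(TOP|-|+|=|0$|1\/|\/\/|PASSWORD|1E)/ ) {}
    else {
        last = "";   # reset per record so fields do not carry over
        for (i = 6; i <= NF; i++) {
            last = (last == "" ? $i : last FS $i);
        }
        print $1, $2, $3, $4, $5, last;
    }
} ' | sed 's/^0\(.*\)/\1/'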
good
Code:
#!/bin/bash
awk '
BEGIN {
    OFS = "|";
}
{
    if ( NF == 0 || $1 ~ /^(TOP|-|+|=|0$|1\/|\/\/|PASSWORD|1E)/ ) {}
    else {
        # hard-coded width: join fields 6-13 with FS (absent fields just add separators)
        last = $6 FS $7 FS $8 FS $9 FS $10 FS $11 FS $12 FS $13;
        print $1, $2, $3, $4, $5, last;
    }
} ' | sed 's/^0\(.*\)/\1/'
Did you look at the options I suggested around the fifth field? Also, the single-character items in your regex could just go in a bracket expression, i.e. [-+=].
Also, I notice you call awk from within a bash script; unless there is more being done by the script, you could just as easily make it an awk script, the interpreter line being simply #!/usr/bin/awk -f
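A sketch of what that standalone awk script might look like, folding in the [-+=] bracket expression; the trailing sed stage is approximated here with sub(), and the filename is illustrative:
Code:
#!/usr/bin/awk -f
# illustrative standalone version; run as: ./strip_report.awk report.txt
BEGIN { OFS = "|" }
# skip blank lines and header/separator/banner lines
NF == 0 || $1 ~ /^(TOP|[-+=]|0$|1\/|\/\/|PASSWORD|1E)/ { next }
{
    last = $6
    for (i = 7; i <= NF; i++)
        last = last FS $i
    sub(/^0/, "", $1)    # stands in for: sed 's/^0\(.*\)/\1/'
    print $1, $2, $3, $4, $5, last
}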
yes, there will be more bash stuff in there. i will eventually pipe the sed output to a file "$HOME/out.$2.txt", etc.
i am not sure what you mean by a unique 5th field. all the fields can be unique.
Let me see if an example helps. I will only use 5 items in total, but the third is the one to watch for:
The following two examples can be dealt with:
Code:
one two three four two
one two three four three
So the first line shows that the third field is unique from all others, and the second shows it to be the first occurrence of the value, so we could use:
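One way that test might look in awk (illustrative, not the exact snippet): accept a record only when the first occurrence of the value in field 3 is field 3 itself.
Code:
{
    # find the first field holding the same value as field 3 (illustrative)
    first = 0
    for (i = 1; i <= NF; i++)
        if ($i == $3) { first = i; break }
    if (first == 3)
        print
}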
hmmm, i cannot verify that the field would be unique. in this case i really only care about the "fields" and not what's in them, etc.
btw, my gawk is gnu v3.1.5
this has been a quick learning exercise, and a quick analysis of the cost savings looks like this. thnx all.
(salary #’s are for example only, the other #’s are real)
Code:
John Doe salary
80k/yr
$38.46/hr = $0.0107/sec
181 txt files (9 months' worth) processed manually = estimated 40 man-hrs = $1538.40/9mo = $170.93 per one month's worth of data processing.
The script processed the same 181 files in 6 sec = $0.0642/9mo = $0.007133 per one month's worth of data processing.
However, we need to account for the time spent developing/testing the script in the total cost analysis.
We will estimate that the person who can script has an hourly rate 1.25x that of the person processing the data, so that's 80k * 1.25 = 100k/yr = $48.08/hr.
A guru scripter could probably develop/test this script in under 2 hrs, but it took me longer: 4 hrs. So the development/testing cost is $192.32.
So, yearly analysis:
• Manually processing the files = $170.93 * 12 = $2051.16/yr
• Automated script = $192.32 + ($0.007133 * 12) = $192.41/yr for the 1st year; years 2+ = $0.0856/yr
• that's a saving of $1858.75 in year 1 (about 91%; the manual process costs nearly 10.7x as much), and a saving of almost 100%, i.e. $2051.07/yr, for years 2+
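As a quick sanity check of the arithmetic (rates as above; small rounding differences vs. the figures quoted are expected):
Code:
awk 'BEGIN {
    manual = 40 * 38.46 / 9 * 12     # manual processing cost per year: ~$2051.20
    dev    = 4 * 48.08               # one-time development/testing cost: $192.32
    run    = 6 * 0.0107 / 9 * 12     # script runtime cost per year: ~$0.0856
    printf "manual %.2f/yr  year-1 %.2f  years-2+ %.4f/yr\n", manual, dev + run, run
}'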
Conclusion - automate where possible, it saves lots of $$.