[SOLVED] Bash Scripting

zer0signal · 01-26-2011, 11:24 AM

Ok, here is what I am trying to do. I have written a one-line command that parses a file, locates the IP addresses in it, trims the output the way I want it, sorts it numerically with duplicates removed, and then appends (>>) the result to output.txt.

I can get all the IPs into one file, "output.txt", but what I am really looking for is a way to create a text file for each IP it finds, named xxx.xxx.xxx.xxx.txt, and also put that IP address into that file.

xxx.xxx.xxx.xxx = the ip address it finds

Can anyone offer suggestions on the best approach for this...?

Thanks

colucix · 01-26-2011, 11:38 AM

If you have the IP stored in a variable (using command substitution) you can simply do

Code:

echo "$ip" > "$ip.txt"
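As a rough sketch only, assuming the addresses have already been collected into a file with one address per line (the list.txt name, the temporary directory and the sample addresses below are placeholders, not taken from your system), a loop around that idea could look like this:

```shell
# Assumption: list.txt holds one IP address per line (sample data created here).
workdir=$(mktemp -d)
printf '%s\n' 192.168.0.1 10.0.0.7 > "$workdir/list.txt"

# Create <ip>.txt for every line, each file containing the address itself.
while IFS= read -r ip; do
    [ -n "$ip" ] || continue                 # skip empty lines
    printf '%s\n' "$ip" > "$workdir/$ip.txt"
done < "$workdir/list.txt"

ls "$workdir"
```

Quoting "$ip" keeps the shell from splitting or globbing the value if a line ever contains something unexpected.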

What is the command line you mentioned? If using awk it can be even more straightforward.

zer0signal · 01-26-2011, 11:43 AM

Quote:

What is the command line you mentioned? If using awk it can be even more straightforward.

I am using awk to specify my separator.

cat /var/log/secure |grep "Failed password" | awk -F'from' '{ print $2 } ' | cut -d" " -f2 | sort -n -u > list.txt

colucix · 01-26-2011, 12:26 PM

awk has all the grep and cut functionality, so that your line can be condensed into:

Code:

awk '
/Failed password/ {
  ip = gensub(/.*from ([[:digit:].]+) .*/,"\\1","g")
  if ( ! _[ip]++ ) {  
    print ip > (ip ".txt")
    print ip
  }
}' /var/log/secure | sort -n > list.txt

This will create single IP files and will write the whole list of IPs (without duplicates) into list.txt. Feel free to ask for explanation if something is not clear.
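Note that gensub is a gawk extension, so it may be missing from other awk implementations. A possible sketch using only match() and substr(), which are part of standard awk, is shown here. The sample log lines are invented (a real /var/log/secure may look different), and secure.sample, seen and file are illustrative names:

```shell
# Hypothetical sample log lines; a real /var/log/secure may look different.
workdir=$(mktemp -d)
cat > "$workdir/secure.sample" <<'EOF'
Jan 26 11:00:01 host sshd[101]: Failed password for root from 192.168.0.1 port 4242 ssh2
Jan 26 11:00:05 host sshd[102]: Accepted password for bob from 10.0.0.5 port 4243 ssh2
Jan 26 11:00:09 host sshd[103]: Failed password for invalid user x from 10.0.0.7 port 4244 ssh2
Jan 26 11:00:12 host sshd[104]: Failed password for root from 192.168.0.1 port 4245 ssh2
EOF

cd "$workdir" || exit 1
awk '
/Failed password/ && match($0, /from [0-9.]+ /) {
  # RSTART and RLENGTH cover "from <ip> "; skip "from " (5 chars) and the trailing space
  ip = substr($0, RSTART + 5, RLENGTH - 6)
  if ( ! seen[ip]++ ) {
    file = ip ".txt"
    print ip > file
    close(file)        # each file is written once, so it can be closed right away
    print ip
  }
}' secure.sample | sort -n > list.txt
cat list.txt
```

Closing each file after writing avoids holding one open file per address, which may matter when the log contains many different IPs.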

zer0signal · 01-26-2011, 12:46 PM

uh.... ok? lol I really don't know much awk, and I'm just learning to script. If you would not mind breaking that down so I can understand the logic and flow, that would be great. If not, I can research the web on what is actually being stated there. I mean, I am able to break down some of it myself. =)

awk ' <-- that seems to start the awk statement

/Failed password/ { <-- parsing ID

ip = gensub(/.*from ([[:digit:].]+) .*/,"\\1","g") <-- specifies the variable and the next delimiter... don't know what the rest of the line is.

if ( ! _[ip]++ ) { <-- if statement for the variable ip

print ip > (ip ".txt") <-- print each variable to a txt file with the variable as the file name

print ip <-- ?
}
}' /var/log/secure | sort -n > list.txt <-- dumping secure log to list.txt file?

sorry looks really sloppy

colucix · 01-26-2011, 02:22 PM

Well.. here is my explanation: an awk rule is made of

Code:

pattern { action }

In my example, we have a single rule whose pattern is /Failed password/. This means that the action (that is, the code inside the braces) is executed only for those lines matching the regular expression. This accomplishes the task of grep in your code.
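A tiny sketch of a single pattern { action } rule, using two made-up input lines (the text "match:" is only there to make the effect visible):

```shell
# Two made-up lines; only the one matching the pattern reaches the action.
matched=$(printf '%s\n' 'Failed password for root' 'Accepted password for bob' |
    awk '/Failed password/ { print "match:", $0 }')
echo "$matched"
```

The second line does not match the pattern, so the action never runs for it and nothing is printed.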

First we have to extract the IP address from the line. I don't know what the line exactly is in your secure file, but I can guess based on your code. The gensub function can do substitutions in a string. Here we want to ignore all the parts of the string but the IP address:

Code:

ip = gensub(/.*from ([[:digit:].]+) .*/,"\\1","g")

The regular expression matches any string followed by from and a space, then any string made of digits and dots (a rough expression to match IP addresses), and finally a space followed by any other string. Note the parentheses around the part matching the IP address: their purpose is to keep the matched string in memory. Now the whole line can be replaced by the matched IP address using "\\1" as the substitution string, where 1 refers to the first parenthesized part kept in memory. Indeed, the regular expression might have multiple parentheses to retain different parts of the string, so that we can use \\2 or \\3 as well. Please refer to the GNU awk manual for more details.

Now we have extracted the IP address with reasonable confidence, and we want both to write it into a file (named after the IP address itself) and to add it to a complete list. Since we'll use shell redirection later, we send the list entry to standard output. The first task is accomplished by:
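The capture-group idea can also be tried on its own. This sketch uses sed -E instead of gensub, so that it does not depend on gawk being installed; the log line is invented for illustration. In sed the back-reference is written with a single backslash, since it is not inside an awk string:

```shell
# Made-up log line; the capture group keeps only the address after "from ".
line='Failed password for root from 192.168.0.1 port 4242 ssh2'
ip=$(printf '%s\n' "$line" | sed -E 's/.*from ([[:digit:].]+) .*/\1/')
echo "$ip"
```

Everything the regular expression matches (here the whole line) is replaced by what the first pair of parentheses captured.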

Code:

print ip > (ip ".txt")

where the file name is simply the concatenation, inside parentheses, of the content of the ip variable and the string .txt. The second task is even simpler:

Code:

print ip
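Both print statements can be tried in isolation. In this sketch every input line plays the role of the ip variable; the addresses and the temporary directory are placeholders:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Each input line plays the role of the ip variable (sample addresses).
listing=$(printf '%s\n' 1.2.3.4 5.6.7.8 |
    awk '{ ip = $0; print ip > (ip ".txt"); print ip }')
echo "$listing"     # what went to standard output
cat 1.2.3.4.txt     # what went to the file named after the address
```

The parentheses around ip ".txt" make sure the concatenation is done first and the result is used as the file name.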

However, we want to avoid duplicates, i.e. the complete list should not contain the same address twice or more, and the .txt files should not be written multiple times. We want awk to print the ip variable only the first time it contains that particular IP address.

First, keep in mind that in awk true is any number different from 0 or any non-empty string, whereas false is 0 or the null string. Here

Code:

_[ip]++

an array element (the name of the array is an underscore for brevity) whose index is the current ip address, is incremented by one. Note the C notation ++ after the variable name. It means that the variable is evaluated as is and then incremented by one. The opposite would have been

Code:

++_[ip]

where first the variable is incremented and then evaluated. This subtle difference lets us evaluate 0 (false) the first time we reference the ip-th element of the array, and a nonzero number (true) every subsequent time. It's difficult to explain this piece of code, but I hope it's a little clearer now.

However, we want a true condition only the first time the IP is encountered, in order to print it. Hence we have to invert the logical expression using the not operator (in awk it is an exclamation mark):
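A short sketch of the difference between the two notations; the variable names a, b, first and second are only illustrative:

```shell
result=$(awk 'BEGIN {
  a = 0; b = 0
  first  = a++     # post-increment: yields the old value, then adds one
  second = ++b     # pre-increment: adds one, then yields the new value
  print first, a, second, b
}')
echo "$result"     # prints: 0 1 1 1
```

Both variables end up as 1, but only the post-increment form hands back the 0 that the duplicate check relies on.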

Code:

if ( ! _[ip]++ )

and the trick is done!
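The whole condition can be tried on a short list with repeated entries (the addresses are sample values):

```shell
unique=$(printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.2 |
    awk '{ ip = $0; if ( ! _[ip]++ ) print ip }')
echo "$unique"
```

Each address is printed only on its first appearance, and the original order of first appearances is preserved.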

Finally, following your code, we want to sort the output numerically (note that we have already taken care of duplicates) and write it to the list.txt file:

Code:

... | sort -n > list.txt
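One caveat worth checking on your own data: sort -n reads each line as a single number, so it does not compare every octet of an address separately. If a strict per-octet order is wanted, a possible sketch is to split the fields on the dot and sort each one numerically (the addresses are sample values):

```shell
sorted=$(printf '%s\n' 10.0.0.10 9.1.1.1 10.0.0.2 |
    sort -t . -k1,1n -k2,2n -k3,3n -k4,4n)
echo "$sorted"
```

Here 10.0.0.2 comes before 10.0.0.10 because the fourth field is compared as the numbers 2 and 10.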

zer0signal · 01-26-2011, 03:16 PM

Quote:

_[ip]++

So this is creating a scaling variable based on how many IPs it finds... So variable would be ip1,ip2,ip3,ip4,ip5,ip6 etc. Until there is no more data for the variables?

corp769 · 01-26-2011, 03:31 PM

Quote:

Originally Posted by zer0signal

So this is creating a scaling variable based on how many IPs it finds... So variable would be ip1,ip2,ip3,ip4,ip5,ip6 etc. Until there is no more data for the variables?

Yes. Kind of like in C, var1 = var1++

colucix · 01-26-2011, 03:41 PM

Quote:

Originally Posted by zer0signal

So this is creating a scaling variable based on how many IPs it finds... So variable would be ip1,ip2,ip3,ip4,ip5,ip6 etc. Until there is no more data for the variables?

Nope. It will assign 1, 2, 3 and so on to the array element with index ip

Code:

_[ip]

An uninitialized variable in awk has the value 0, so the first time the element is evaluated it returns 0. The second time it is evaluated, it returns 1, since it was incremented in the previous pass.

Code:

_[ip] = 0
_[ip] = 1
_[ip] = 2

Keep in mind that in awk the array index can be any string (not only a number). Suppose you read the IP 192.168.0.1. The first time you evaluate the element indexed by that IP you have:

Code:

_["192.168.0.1"] = 0

After that the ++ notation increments its value by one. The next time you encounter the same IP address you have

Code:

_["192.168.0.1"] = 1

and then again it's incremented by one, and so on.
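To watch the counter at work, this sketch prints the value that _[ip]++ returns on each line (the addresses and the label text are only sample values):

```shell
trace=$(printf '%s\n' 192.168.0.1 10.0.0.1 192.168.0.1 192.168.0.1 |
    awk '{ ip = $0; print ip, "value before increment:", _[ip]++ }')
echo "$trace"
```

Each address has its own element in the array, so 10.0.0.1 starts from 0 even though 192.168.0.1 has already been counted.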

zer0signal · 01-26-2011, 04:27 PM

Ok, that makes it a lot clearer. I appreciate your advice and explanation on this subject! =) I have been going over the GNU awk page. Definitely something I'm going to dig deeper into, because of its abilities with text parsing and output!

Thanks again! =)

corp769 · 01-26-2011, 04:30 PM

Quote:

Originally Posted by colucix

Nope. It will assign 1, 2, 3 and so on to the array element with index ip

My fault, I knew what to say, just didn't say it correctly. I do that a lot :P

zer0signal · 01-26-2011, 04:33 PM

Ha! =) It's cool, just trying to grasp the concept =)

corp769 · 01-26-2011, 04:37 PM

Do you fully understand it now?

zer0signal · 01-26-2011, 06:12 PM

Quote:

Originally Posted by corp769

Do you fully understand it now?

For the most part I can decipher what is actually going on. Will I be able to pull this out of thin air the next time I have to code something like this? No, lol, but that is to be expected when starting off. But I understand what is actually going on. =)

corp769 · 01-26-2011, 06:43 PM

That's good. It took me a while to get awk and gawk down to a science....