LinuxQuestions.org

Programming > Bash Scripting - Output as Multiple Files (https://www.linuxquestions.org/questions/programming-9/bash-scripting-output-as-multiple-files-858814/)

zer0signal 01-26-2011 11:24 AM

Bash Scripting - Output as Multiple Files
 
OK, here is what I am trying to do. I have written a one-line command that parses a file, locates the IP addresses in it, trims the output the way I want it, sorts it numerically and removes duplicates, and then appends (>>) the result to output.txt.

I can get all the IPs into one file, "output.txt", but what I am really looking for is a way to create a text file for each IP it finds, named xxx.xxx.xxx.xxx.txt, and also put that IP address into that file.

xxx.xxx.xxx.xxx = the ip address it finds

Can anyone offer suggestions on the best approach for this...?

Thanks

colucix 01-26-2011 11:38 AM

If you have the IP stored in a variable (for example, using command substitution), you can simply do:
Code:

echo "$ip" > "$ip.txt"
What is the command line you mentioned? If using awk it can be even more straightforward.
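A minimal sketch of that suggestion, assuming the addresses are already in list.txt, one per line (the sample addresses and the scratch directory are made up for the demo). The loop reads each line into ip and writes it to a file named after it:

```shell
# Work in a scratch directory so the demo does not litter the current one.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: one IP address per line.
printf '%s\n' 192.168.0.1 10.0.0.7 > list.txt

# Read each line into $ip and write it to a file named after it.
# Quoting "$ip" protects against stray spaces or glob characters.
while IFS= read -r ip; do
    [ -n "$ip" ] || continue          # skip blank lines
    printf '%s\n' "$ip" > "$ip.txt"
done < list.txt

ls
```

printf is used instead of echo because it prints its argument literally on every shell.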

zer0signal 01-26-2011 11:43 AM

Quote:

What is the command line you mentioned? If using awk it can be even more straightforward.
I am using awk to specify my separator.

cat /var/log/secure | grep "Failed password" | awk -F'from' '{ print $2 }' | cut -d" " -f2 | sort -n -u > list.txt

colucix 01-26-2011 12:26 PM

awk has all the grep and cut functionality you need here, so your line can be condensed into:
Code:

awk '
/Failed password/ {
  ip = gensub(/.*from ([[:digit:].]+) .*/,"\\1","g")
  if ( ! _[ip]++ ) { 
    print ip > (ip ".txt")
    print ip
  }
}' /var/log/secure | sort -n > list.txt

This will create one file per IP address and write the whole list of IPs (without duplicates) into list.txt. Feel free to ask for an explanation if something is not clear.
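Since gensub is a GNU awk (gawk) extension, it is not available in every awk (mawk, for example, lacks it). Here is a hedged sketch of the same idea in POSIX awk, using match() with RSTART/RLENGTH instead of a back-reference. The sample log lines and the scratch directory are invented for the demo; real /var/log/secure lines may differ.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical sample of sshd lines in the style of /var/log/secure.
cat > secure.sample <<'EOF'
Jan 26 11:00:01 host sshd[101]: Failed password for root from 192.168.0.2 port 4022 ssh2
Jan 26 11:00:05 host sshd[102]: Failed password for invalid user bob from 10.0.0.7 port 4023 ssh2
Jan 26 11:00:09 host sshd[103]: Accepted password for alice from 10.0.0.9 port 4024 ssh2
Jan 26 11:00:12 host sshd[104]: Failed password for root from 192.168.0.2 port 4025 ssh2
EOF

awk '
/Failed password/ {
    # Find "from <digits-and-dots> " and keep only the address part.
    if (match($0, /from [0-9.]+ /)) {
        ip = substr($0, RSTART + 5, RLENGTH - 6)   # drop "from " and the trailing space
        if (!seen[ip]++) {                         # true only the first time this IP appears
            file = ip ".txt"
            print ip > file
            close(file)        # release the handle; some awks limit open files
            print ip           # also send it to standard output for the list
        }
    }
}' secure.sample | sort -n > list.txt
```

On a real system you would point it at /var/log/secure (usually readable only by root) instead of secure.sample.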

zer0signal 01-26-2011 12:46 PM

Uh... OK? lol. I really don't know much awk, and I'm just learning to script. If you wouldn't mind, could you break that down so I can understand the logic and flow? If not, I can research on the web what is actually being done there. I am able to understand some of it. =)


awk ' <-- that seems to start the awk statement

/Failed password/ { <-- parsing ID

ip = gensub(/.*from ([[:digit:].]+) .*/,"\\1","g") <-- specify the variable and next delimiter... don't know what the rest of the line is.

if ( ! _[ip]++ ) { <- if statement for the variable ip

print ip > (ip ".txt") <-- print each variable to a txt file with the variable as the file name

print ip <-- ?
}
}' /var/log/secure | sort -n > list.txt <-- dumping secure log to list.txt file?


Sorry, it looks really sloppy.

colucix 01-26-2011 02:22 PM

Well... here is my explanation. An awk rule is made of
Code:

pattern { action }
In my example, we have a single rule whose pattern is /Failed password/. This means that the action (the code inside the braces) is executed only for the lines matching the regular expression. This does the job of grep in your code.
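To see the pattern/action split in isolation, here is a small sketch with two invented log lines. A rule with only a pattern prints the matching lines, exactly like grep, and an action can then pick out a field. Printing $7 works for these sample lines only: the field number shifts on lines such as "invalid user bob", which is why the rule above matches the text around the address instead of counting fields.

```shell
cd "$(mktemp -d)" || exit 1

# Two hypothetical log lines; only the first one contains "Failed password".
printf '%s\n' \
    'sshd[1]: Failed password for root from 192.168.0.1 port 22 ssh2' \
    'sshd[2]: Accepted password for alice from 10.0.0.9 port 22 ssh2' > sample.log

# Pattern without an action: awk prints every matching line, like grep.
awk '/Failed password/' sample.log > awk.out
grep 'Failed password' sample.log > grep.out
cmp awk.out grep.out && echo "same output"

# Pattern with an action: the code in braces runs only for matching lines.
awk '/Failed password/ { print $7 }' sample.log > field.out
cat field.out
```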

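First we have to extract the IP address from the line. I don't know exactly what the lines in your secure file look like, but I can guess based on your code. The gensub function performs substitutions on a string; since no target string is given here, it works on the whole current line ($0) and returns the result without modifying the line itself. Here we want to discard every part of the line except the IP address: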
Code:

ip = gensub(/.*from ([[:digit:].]+) .*/,"\\1","g")
The regular expression matches any string followed by "from" and a space, then any sequence of digits and dots (a rough pattern for IP addresses), and finally a space followed by anything else. Note the parentheses around the part matching the IP address: they capture (remember) the matched text. The whole line can then be replaced by the captured IP address by using "\\1" as the replacement string, where 1 refers to the first parenthesized group. A regular expression may contain several parenthesized groups to capture different parts of the string, so \\2, \\3 and so on can be used as well. The backslash is doubled because the replacement is an awk string constant, in which "\\" stands for a single backslash. The third argument, "g", asks for every match to be replaced; since this regular expression spans the whole line, there is only one match anyway. Please refer to the GNU awk manual for more details.
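The same capture-group idea can be tried outside awk with sed -E, which GNU and BSD sed both support. Here \1, \2 and \3 refer to the first, second and third parenthesized groups; a single backslash is enough because the replacement is not inside an awk string constant. The sample line is invented, and the optional first group (invalid user )? is only there to show a second and third group in use:

```shell
line='Jan 26 11:00:05 host sshd[102]: Failed password for invalid user bob from 10.0.0.7 port 4023 ssh2'

# One group: replace the whole line with just the address, like the gensub call.
ip=$(printf '%s\n' "$line" | sed -E 's/.*from ([[:digit:].]+) .*/\1/')

# Three groups: \1 is the optional "invalid user " text (unused here),
# \2 is the user name and \3 is the address.
pair=$(printf '%s\n' "$line" | sed -E 's/.*for (invalid user )?([^ ]+) from ([[:digit:].]+) .*/\2 \3/')

printf '%s\n' "$ip" "$pair"
```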

Now we have extracted the IP address with reasonable confidence, and we want both to write it into a file named after the IP address itself and to add it to a complete list. For the list we simply send it to standard output, since we'll use shell redirection later. The first task is accomplished by:
Code:

print ip > (ip ".txt")
where the file name is simply the concatenation, inside parentheses, of the content of the ip variable and the string ".txt". The parentheses make sure the whole concatenation is taken as the file name. The second task is even simpler:
Code:

print ip
However, we want to avoid duplicates: the complete list should not contain the same address more than once, and each .txt file should not be written multiple times. So we want awk to print the ip variable only the first time it holds a particular IP address.

First, keep in mind that in awk a value is true if it is a nonzero number or a non-empty string, and false if it is 0 or the null (empty) string. Here
Code:

_[ip]++
an array element whose index is the current IP address is incremented by one (the array is simply named _, an underscore, for brevity). Note the C-style ++ after the variable name: the expression yields the current value of the element, and only then is the element incremented by one. The opposite would have been
Code:

++_[ip]
where the element is incremented first and the expression yields the new value. This subtle difference is what makes _[ip]++ evaluate to 0 (false) the first time a given IP is seen, and to a nonzero number (true) every subsequent time. It's difficult to explain this piece of code, but I hope it's a little clearer now.
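A quick way to check the difference between the two forms, and how they behave in a condition, is a BEGIN-only awk program that needs no input. The variable names x, y, z, a and b are arbitrary:

```shell
out=$(awk 'BEGIN {
    # Uninitialized variables count as 0 in a numeric context.
    a = x++            # a gets the old value of x (0); then x becomes 1
    b = ++y            # y becomes 1 first; then b gets the new value (1)
    print a, x
    print b, y
    if (x++) print "x++ is true: x was 1 before the increment"
    if (z++) print "never printed: z was 0 before the increment"
}')
printf '%s\n' "$out"
```

This prints "0 1", then "1 1", then only the first of the two if messages.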

But we want the condition to be true only the first time the IP is encountered, so that it is printed then. Hence we have to invert the logical expression using the not operator (in awk, an exclamation mark):
Code:

if ( ! _[ip]++ )
and the trick is done! :)
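The same condition also works as a complete awk program: a pattern with no action prints the line, so !seen[$0]++ prints each distinct line once, keeping the order of first appearance (unlike sort -u, which reorders). A small sketch with made-up addresses; the array name seen is arbitrary:

```shell
out=$(printf '%s\n' 10.0.0.7 192.168.0.2 10.0.0.7 10.0.0.7 192.168.0.2 |
      awk '!seen[$0]++')
printf '%s\n' "$out"
```

Only 10.0.0.7 and 192.168.0.2 are printed, in that order.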

Finally, following your code, we want to sort the output numerically (duplicates have already been removed) and write it to the list.txt file:
Code:

... | sort -n > list.txt
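One caveat on that last step: sort -n reads only the leading number of each line, so 192.168.0.2 and 192.168.0.10 both compare as 192.168, and their relative order falls back to a plain character comparison. Combined with -u, as in the original one-liner, GNU sort treats such lines as duplicates and keeps only one of them, which is one more reason to deduplicate in awk first. A hedged sketch of an octet-by-octet sort, using the standard -t and -k options of sort on made-up addresses:

```shell
ips='192.168.0.10
10.0.0.7
192.168.0.2
192.168.0.2'

# Split on "." and compare each of the four octets as a number; -u then
# drops only lines whose four octets are all equal.
sorted=$(printf '%s\n' "$ips" | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n -u)
printf '%s\n' "$sorted"
```

This gives 10.0.0.7, 192.168.0.2, 192.168.0.10. In the pipeline above it would replace sort -n; with GNU sort, sort -V (version sort) is a shorter alternative.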

zer0signal 01-26-2011 03:16 PM

Quote:

_[ip]++
So this is creating a scaling variable based on how many IPs it finds... So variable would be ip1,ip2,ip3,ip4,ip5,ip6 etc. Until there is no more data for the variables?

corp769 01-26-2011 03:31 PM

Quote:

Originally Posted by zer0signal (Post 4238672)
So this is creating a scaling variable based on how many IPs it finds... So variable would be ip1,ip2,ip3,ip4,ip5,ip6 etc. Until there is no more data for the variables?

Yes. Kind of like in C, var1 = var1++

colucix 01-26-2011 03:41 PM

Quote:

Originally Posted by zer0signal (Post 4238672)
So this is creating a scaling variable based on how many IPs it finds... So variable would be ip1,ip2,ip3,ip4,ip5,ip6 etc. Until there is no more data for the variables?

Nope. It will assign 1, 2, 3 and so on to the array element with index ip
Code:

_[ip]
An uninitialized variable in awk has the value 0 in a numeric context (or the empty string in a string context), so the first time the element is evaluated it returns 0. The second time it is evaluated, it returns 1, since it was incremented in the previous pass.
Code:

_[ip] = 0
_[ip] = 1
_[ip] = 2

Keep in mind that in awk an array index can be any string, not only a number. Suppose you read the IP 192.168.0.1. The first time you evaluate that element you have:
Code:

_["192.168.0.1"] = 0
After that the ++ notation increments its value by one. The next time you encounter the same IP address you have
Code:

_["192.168.0.1"] = 1
and then again it's incremented by one, and so on.
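A small sketch that makes those values visible: for each input line it prints the element's value before the increment, and at the end it checks which strings exist as indices (the in operator tests for an index without creating it). The addresses are made up; _ is the same array name used above:

```shell
out=$(printf '%s\n' 192.168.0.1 10.0.0.7 192.168.0.1 192.168.0.1 |
      awk '{ printf "%s seen before: %d\n", $0, _[$0]++ }
           END { print ("192.168.0.1" in _), ("10.0.0.9" in _) }')
printf '%s\n' "$out"
```

192.168.0.1 reports 0, 1 and 2 on its three appearances, 10.0.0.7 reports 0, and the last line is "1 0": there is one element per distinct address, not one variable per occurrence.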

zer0signal 01-26-2011 04:27 PM

OK, that makes it a lot clearer. I appreciate your advice and explanation on this subject! =) I have been going over the GNU awk page. Definitely something I'm going to dig deeper into, because of its text-parsing and output abilities!

Thanks again! =)

corp769 01-26-2011 04:30 PM

Quote:

Originally Posted by colucix (Post 4238699)
Nope. It will assign 1, 2, 3 and so on to the array element with index ip

My fault, I knew what to say, just didn't say it correctly. I do that a lot :P

zer0signal 01-26-2011 04:33 PM

Ha! =) It's cool, just trying to grasp the concept =)

corp769 01-26-2011 04:37 PM

Do you fully understand it now?

zer0signal 01-26-2011 06:12 PM

Quote:

Originally Posted by corp769 (Post 4238757)
Do you fully understand it now?

For the most part I can decipher what is actually going on. Will I be able to pull this out of thin air the next time I have to code something like this? No, lol, but that's to be expected when starting off. But I understand what is actually going on. =)

corp769 01-26-2011 06:43 PM

That's good. It took me a while to get awk and gawk down to a science....


All times are GMT -5.