[SOLVED] Bash Scripting - Output as Multiple Files
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
OK, here is what I am trying to do. I have written a one-line command that parses a file, locates the IP addresses in it, trims the output the way I want it, sorts it numerically, removes duplicates, and then appends (>>) the result to output.txt.
I can get all the IPs into one file, "output.txt", but what I am really looking for is a way to create a text file for each IP it finds, named xxx.xxx.xxx.xxx.txt, and also put that IP address into that file.
xxx.xxx.xxx.xxx = the IP address it finds
Can anyone offer suggestions on the best approach for this?
Thanks
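If output.txt already holds one address per line, a plain shell loop is one way to get the per-IP files without touching the existing one-liner. This is a sketch, not the approach taken later in the thread; it assumes one IPv4 address per line, and make_ip_files is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: create one <ip>.txt file per line of an existing list such as output.txt.
# Assumes one IPv4 address per line; make_ip_files is a hypothetical helper name.
make_ip_files() {
    # $1 = list file, $2 = directory that receives the <ip>.txt files
    while IFS= read -r ip || [ -n "$ip" ]; do    # the "|| [ -n ... ]" part also handles a missing final newline
        [ -n "$ip" ] || continue                 # skip blank lines
        printf '%s\n' "$ip" > "$2/$ip.txt"       # the file name and its content are both the address
    done < "$1"
}
```

For example, make_ip_files output.txt . would create the files in the current directory. Because > overwrites, running it twice simply rewrites the same files.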
awk can do everything grep and cut do here, so your line can be condensed into:
Code:
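awk '
/Failed password/ {                                        # act only on lines containing "Failed password"
    ip = gensub(/.*from ([[:digit:].]+) .*/, "\\1", "g")  # keep only the address after "from " (gensub is a gawk extension)
    if ( ! _[ip]++ ) {                                     # true only the first time this address is seen
        print ip > (ip ".txt")                             # write the address into its own file, e.g. 10.0.0.5.txt
        print ip                                           # also send it to stdout for the complete list
    }
}' /var/log/secure | sort -n > list.txt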
This creates one file per IP address and writes the whole list of IPs (without duplicates) into list.txt. Feel free to ask for an explanation if something is not clear.
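gensub() is a GNU awk extension and is not part of POSIX awk. A sketch of the same one-liner using only POSIX functions, match() and substr(), with close() added after each file is written, could look like this. It is an alternative, not the code above: match() finds the leftmost "from <address> " rather than the last one selected by the greedy .*, which makes no difference for typical sshd lines. split_ips is a hypothetical function name.

```shell
#!/bin/sh
# Portable sketch (POSIX awk, no gensub): same idea as the gawk one-liner.
# Assumption: failed-login lines contain "from <address> ", as in typical sshd logs.
# split_ips is a hypothetical helper name.
split_ips() {
    # $1 = log file to read, $2 = directory that receives one <ip>.txt file per address
    awk -v dir="$2" '
    /Failed password/ {
        if (!match($0, /from [0-9.]+ /)) next      # skip lines without "from <address> "
        ip = substr($0, RSTART + 5, RLENGTH - 6)    # strip the leading "from " and the trailing space
        if (!seen[ip]++) {                          # first time this address is seen
            f = dir "/" ip ".txt"
            print ip > f                            # one file per address
            close(f)                                # release the descriptor; each file is written once
            print ip                                # complete list goes to stdout
        }
    }' "$1" | sort -n
}
```

For example, split_ips /var/log/secure . > list.txt would mirror the original command in the current directory.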
Uh... OK? lol. I really don't know much awk, and I'm just learning to script. If you wouldn't mind, could you break that down so I can understand the logic and flow? If not, I can research on the web what is actually being stated there. I mean, I can already understand some of it. =)
awk ' <-- that seems to start the awk statement
/Failed password/ { <-- parsing ID
ip = gensub(/.*from ([[:digit:].]+) .*/,"\\1","g") <-- specify the variable and next delimiter... don't know what the rest of the line is
if ( ! _[ip]++ ) { <-- if statement for the variable ip
print ip > (ip ".txt") <-- print each variable to a .txt file with the variable as the file name
print ip <-- ?
} }' /var/log/secure | sort -n > list.txt <-- dumping secure log to list.txt file?
Sorry, it looks really sloppy.
Last edited by zer0signal; 01-26-2011 at 01:06 PM.
Well, here is my explanation. An awk rule is made of
Code:
pattern { action }
In my example, we have a single rule whose pattern is /Failed password/. This means that the action (the code inside the braces) is executed only for the lines matching the regular expression. This accomplishes the task of grep in your code.
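To see that a pattern alone does the job of grep, the three commands below should print the same lines. A rule with a pattern but no action uses the default action { print }. The sample lines are made up for illustration.

```shell
#!/bin/sh
# A pattern with an action runs the action only on matching lines;
# a pattern with no action at all defaults to { print }, which is what grep does.
# The sample lines are made up for illustration.
sample='Failed password for root from 10.0.0.5 port 22 ssh2
Accepted password for alice from 172.16.0.9 port 22 ssh2
Failed password for bob from 192.168.1.20 port 22 ssh2'

via_grep=$(printf '%s\n' "$sample" | grep 'Failed password')
via_awk=$(printf '%s\n' "$sample" | awk '/Failed password/ { print }')
via_awk_default=$(printf '%s\n' "$sample" | awk '/Failed password/')

printf '%s\n' "$via_awk"
```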
First we have to extract the IP address from the line. I don't know exactly what the lines in your secure file look like, but I can guess based on your code. The gensub function performs substitutions in a string. Here we want to discard every part of the line except the IP address:
Code:
ip = gensub(/.*from ([[:digit:].]+) .*/,"\\1","g")
The regular expression matches any string followed by "from" and a space, then any string made of digits and dots (a rough expression for IP addresses), and finally a space followed by any other string. Note the parentheses around the part matching the IP address: they capture the matched text. The whole line is then replaced by the captured IP address using "\\1" as the replacement string, where 1 refers to the first captured group (the backslash is doubled because awk first turns "\\" in a string constant into a single backslash). A regular expression can contain several parenthesized groups, so \\2, \\3 and so on can be used as well. The third argument, "g", requests a global replacement; since the expression spans the whole line, there is only one match anyway. Note that gensub is a GNU awk (gawk) extension. Please refer to the GNU awk manual for more details.
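The capture-and-backreference idea is easy to try outside awk. The sketch below uses sed -E as a stand-in for gensub (a different tool, same regex concept): the parentheses capture the address and \1 puts it back. In a single-quoted sed script there is no awk string layer, so the backslash is written only once. The log line is a made-up example.

```shell
#!/bin/sh
# Same capture-and-backreference idea as gensub(/.*from ([[:digit:].]+) .*/, "\\1", "g"),
# tried with sed -E: the parentheses capture the address and \1 puts it back.
# In a single-quoted sed script there is no awk string layer, so the backslash is not doubled.
# [0-9.] is used here instead of [[:digit:].]; both match digits and dots.
line='Jan 26 10:00:01 host sshd[101]: Failed password for root from 192.168.1.20 port 22 ssh2'
ip=$(printf '%s\n' "$line" | sed -E 's/.*from ([0-9.]+) .*/\1/')
printf '%s\n' "$ip"
```

This should print 192.168.1.20.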
Now we have extracted the IP address with reasonable confidence, and we want both to write it into a file (named after the IP address itself) and to add it to a complete list. Since we'll use shell redirection later, we send the latter to standard output. The first task is accomplished by:
Code:
print ip > (ip ".txt")
where the file name is simply the concatenation, inside parentheses, of the content of the ip variable and the string .txt. The second task is even simpler:
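Two details of print ip > (ip ".txt") are worth knowing. Wrapping the concatenation in parentheses makes it unambiguous that the whole expression is the file name, and awk truncates a file only when it first opens it: later prints to the same name in the same run are appended until close() is called. A small sketch, with an illustrative address and a temporary directory:

```shell
#!/bin/sh
# Sketch of awk output redirection: the file is truncated only when it is first opened,
# and later prints to the same name in the same run append until close() is called.
# The address and the temporary directory are illustrative only.
dir=$(mktemp -d)
awk -v dir="$dir" 'BEGIN {
    ip = "10.0.0.5"
    print ip > (dir "/" ip ".txt")              # parentheses: the whole concatenation is the file name
    print "second line" > (dir "/" ip ".txt")   # same file, still open: appended, not overwritten
    close(dir "/" ip ".txt")
}'
cat "$dir/10.0.0.5.txt"
```

The file should end up containing both lines, 10.0.0.5 and second line.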
Code:
print ip
However, we want to avoid duplicates, i.e. the complete list should not contain the same address twice, and the .txt files should not be written multiple times. We want awk to print the ip variable only the first time it contains a particular IP address.
First, keep in mind that in awk true is any number different from 0 or any non-empty string, whereas false is 0 or the null string. Here
Code:
_[ip]++
an array element (the array is named with an underscore for brevity) whose index is the current IP address is incremented by one. Note the C-style ++ notation after the variable name: it means that the variable is evaluated as is and then incremented by one. The opposite would have been
Code:
++_[ip]
where the variable is first incremented and then evaluated. This subtle difference lets the expression evaluate to 0 (false) the first time we access the ip-th element of the array, and to a non-zero number (true) on every subsequent access. It's difficult to explain this piece of code, but I hope it's a little clearer now.
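The difference between post-increment and pre-increment can be checked directly on fresh array elements. In this sketch, a and b are throwaway array names used only for the demonstration.

```shell
#!/bin/sh
# Post-increment returns the old value; pre-increment returns the new one.
# "a" and "b" are throwaway array names used only for this demonstration.
out=$(awk 'BEGIN {
    r1 = a["x"]++; r2 = a["x"]++      # fresh element: returns 0, then 1
    print "post:", r1, r2, "now", a["x"]
    s1 = ++b["x"]; s2 = ++b["x"]      # fresh element: returns 1, then 2
    print "pre:", s1, s2, "now", b["x"]
}')
printf '%s\n' "$out"
```

This should print post: 0 1 now 2 followed by pre: 1 2 now 2. Only the post-increment form yields 0 (false) on the first access, which is what the duplicate check relies on.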
However, we want a true condition only the first time the IP is encountered, so that we print it. Hence we have to invert the logical expression using the not operator (in awk, an exclamation mark):
Code:
if ( ! _[ip]++ )
and the trick is done!
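Because a pattern with no action prints the line, the same test is often written as a bare pattern. This sketch (seen is an arbitrary array name) removes duplicate lines while keeping the order in which they first appear, which sort -u does not do.

```shell
#!/bin/sh
# The same test as if ( ! _[ip]++ ), written as a bare pattern:
# a line is printed only the first time its content is seen, and input order is kept.
out=$(printf '%s\n' 10.0.0.5 192.168.1.20 10.0.0.5 172.16.0.9 192.168.1.20 | awk '!seen[$0]++')
printf '%s\n' "$out"
```

This should print 10.0.0.5, 192.168.1.20 and 172.16.0.9, each once, in that order.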
Finally, following your code, we sort the output numerically (duplicates have already been removed) and write it to the list.txt file:
Code:
... | sort -n > list.txt
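sort -n compares only the leading numeric part of each line, so addresses that differ only in later octets may not come out in numeric order (10.0.0.10 can be listed before 10.0.0.2, for instance). If that matters, one option is to sort octet by octet, using the dot as field separator; GNU sort also offers -V. sort_ips is a hypothetical helper name.

```shell
#!/bin/sh
# Sort dotted IPv4 addresses octet by octet: "." is the field separator
# and each of the four fields is compared numerically.
sort_ips() {
    sort -t . -k1,1n -k2,2n -k3,3n -k4,4n "$@"
}
out=$(printf '%s\n' 10.0.0.10 192.168.1.20 10.0.0.2 9.1.1.1 | sort_ips)
printf '%s\n' "$out"
```

This should print 9.1.1.1, 10.0.0.2, 10.0.0.10, 192.168.1.20. In the original pipeline, sort_ips could replace sort -n before > list.txt.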
Last edited by colucix; 01-26-2011 at 02:27 PM.
Reason: spelling corrected (hopefully)
So this is creating a scaling variable based on how many IPs it finds... So the variables would be ip1, ip2, ip3, ip4, ip5, ip6, etc., until there is no more data for the variables?
Nope. There is a single array, and it will assign 1, 2, 3 and so on to the element whose index is ip:
Code:
_[ip]
An uninitialized variable in awk has the value 0, so the first time the element is evaluated it returns 0. The second time it is evaluated, it returns 1, since it was incremented in the previous pass.
Code:
_[ip] = 0
_[ip] = 1
_[ip] = 2
Keep in mind that in awk the array index can be any string (not only a number). Suppose you read the IP 192.168.0.1. The first time you evaluate the element with that index you have:
Code:
_["192.168.0.1"] = 0
After that, the ++ notation increments its value by one. The next time you encounter the same IP address you have
Code:
_["192.168.0.1"] = 1
and then again it's incremented by one, and so on.
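Since the index is the address string itself, no ip1, ip2, ... variables are ever created. As a hypothetical extension of the same idea, one array with string keys can count how often each address appears; the addresses below are sample data.

```shell
#!/bin/sh
# One array, string keys: count how many times each address appears.
# The input addresses are made-up sample data.
# "for (ip in count)" visits keys in no particular order, so the result is piped through sort.
out=$(printf '%s\n' 192.168.0.1 10.0.0.5 192.168.0.1 192.168.0.1 |
    awk '{ count[$0]++ } END { for (ip in count) print ip, count[ip] }' | sort)
printf '%s\n' "$out"
```

This should print 10.0.0.5 1 and 192.168.0.1 3.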
OK, that makes it a lot clearer. I appreciate your advice and explanation on this subject! =) I have been going over the GNU awk page. Definitely something I'm going to dig deeper into, because of its abilities with text parsing and output!
My fault, I knew what I meant, I just didn't say it correctly. I do that a lot :P
For the most part I can decipher what is actually going on. Will I be able to pull this out of thin air the next time I have to code something like this? No, lol, but that's to be expected when starting off. But I understand what is actually going on. =)