Programming
This forum is for all programming questions. The question does not have to be directly related to Linux and any language is fair game.
Distribution: RH 7.3/8.0/9.0, Debian Stable 3.0, FreeBSD 5.2, Solaris 8/9/10, HP-UX
Posts: 340
Rep:
extracting more than one value from a string
Hello coders,
I often write scripts (normally in bash) which I use to extract particular patterns from a string... In my applications I've always been extracting ONE value from a string, or multiple SIMPLE values. Say for example I have this string:
hello:xxx:world:123
and I want to extract "hello world". What I would do is:
echo "hello:xxx:world:123" | awk -F: '{print $1, $3}'
The problem is: what do I have to do to extract "hello world" from:
hello:xxx:"world":123
(Note that if I use the previous expression I would obtain hello "world" instead of hello world.)
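One way around the quotes (a sketch, not from the original posts) is to let awk treat both the colon and the double quote as field separators, so the quote characters never end up inside a field:

```shell
# ':' and '"' are both separators; a run of them counts as one,
# so "world" lands in $3 with its quotes already stripped.
echo 'hello:xxx:"world":123' | awk -F '[:"]+' '{print $1, $3}'
# prints: hello world
```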
Of course I'm only depicting a simple example. I shall be using this for a log file which generates a line such as:
As you can see, I have extracted the username part by piping two consecutive "awk"s, which is quite cumbersome; also, the other two parameters (Start/Stop and IP address) cannot be obtained this way. Is this an impossible problem to solve, or does it require some rocket science?
Why do you have to get both at once? Why don't you get the user name first and then get the IP address?
If you really do need to do both at once, then pass the username followed by the whole input line again to another awk; you'll know that the username is $1 in the second awk, so you just need to find the IP address.
e.g. the file TEST contains hello:xxx:"world":123
awk -F: '{printf "%s\"%s\n", $1,$0}' TEST | awk -F\" '{print $1,$3}'
The reason I need to get everything at once is that this is a logfile which grows and which I need to process in a quasi-real-time environment. Let me describe the whole problem so that anyone with a similar problem will also know what I'm talking about. First of all, I'm dealing with a RADIUS server... My aim is to experiment a bit with its start/stop records. When a user connects to the radius server, a "start" record is generated. When a user disconnects, a "stop" record is generated. (This is the local campus radius server, by the way.) These start/stop records are written to the same logfile - radacct.log, and a typical log is this:
This logfile, like all other logfiles, grows - as soon as a user connects/disconnects to the server, another entry is appended (an entry is a paragraph-sized chunk). My aim is that, in as close to a real-time scenario as possible, as soon as another entry is appended, I will know the username, the IP address and whether the user connected (denoted by Start) or disconnected (denoted by Stop). I will then produce a message on the screen or in some file - "Username xyz with ip 1.2.3.4 has connected/disconnected".
My first attempt was to parse the so-mentioned logfile in such a way that each paragraph is converted into a one-line string (refer to the thread http://www.linuxquestions.org/questi...hreadid=122211). From there I tried to obtain my needed parameters.
The bad news is that I cannot obtain the parameters needed!
Am I taking the wrong approach to solving this problem? At first glance I thought this was going to be a very easy thing to do!
To continue with the previous post: I only managed to get the parameters for the last entry... I used a flow like this:
./parse.bash | tail -n 1 | ./process.bash
where parse.bash converts the paragraphs into one-line strings,
and process.bash extracts the necessary parameters.
But this is not really what I want, since the program quits immediately once I get the parameters of the last start/stop record - I want something like "tail -f", which waits until a new entry is available! But using "tail -f" instead of "tail -n 1" didn't work, giving me no output!
But I guess the problem can still be solved. I'm just not thinking in the right way, but the solution exists!
If I remember from your previous thread, you want to control the size of the log file.
There's an odd little command called "tac", which is "cat" backwards and does what you'd probably expect it to - concatenates files in reverse. You could "tac" your logfile and count back a certain number of paragraphs, save the output to a temporary file and "tac" that again to overwrite your logfile. Though I guess you might be worried about a new entry being written in the time it took to do that. Do you administer this RADIUS server? There's bound to be something in the documentation about limiting the size of the logs.
I've no experience in that kind of thing, so I'm just throwing out suggestions.
Did you try my suggestion about the awk command? You can also use multiple field delimiters. For your specific case it would be something like this (but check for syntax!)
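The multi-delimiter idea can be sketched like this; the sample line below is hypothetical (the real radacct format isn't reproduced here), so the field numbers would need adjusting for the actual log:

```shell
# A single -F regex can split on any run of '"', ',', '=' or space.
# With this made-up key = "value" layout, the username is $2 and the
# address is $4 -- for this sample only.
echo 'User-Name = "lungaro", NAS-IP-Address = 17.15.3.139' \
    | awk -F '[",= ]+' '{print $2, $4}'
# prints: lungaro 17.15.3.139
```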
Hi jkobrien, thanks for your help. I have managed to get as far as reading the last line in the log file, but when I try to replace tail -1 with tail -f, it outputs nothing... that was my real concern. As regards "tac", I'll try it out as well...
OK, so the problem is actually how to keep an eye on the logfile and, as soon as a new entry appears, spit out some data.
I think we've sorted out the question of returning the values, so it's probably worth your while posting a new thread on how to monitor logfiles. That should attract more knowledgeable replies.
It seems as if your problem is not now with the extraction, but rather with interacting with a growing logfile. I can see several strategies.
1) If output is through the standard syslogd, there is an option to send output to a named pipe (from man syslogd):
OUTPUT TO NAMED PIPES (FIFOs)
This version of syslogd has support for logging output to named pipes
(fifos). A fifo or named pipe can be used as a destination for log
messages by prepending a pipe symbol (``|'') to the name of the file.
This is handy for debugging. Note that the fifo must be created with
the mkfifo command before syslogd is started.
The following configuration file routes debug messages from the
kernel to a fifo:
# Sample configuration to route kernel debugging
# messages ONLY to /usr/adm/debug which is a
# named pipe.
kern.=debug |/usr/adm/debug
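The consumer side of that configuration might look like this sketch (the fifo path is just the one from the man page sample above):

```shell
FIFO=/usr/adm/debug                # path from the sample config above
[ -p "$FIFO" ] || mkfifo "$FIFO"   # fifo must exist before syslogd starts

# read blocks until syslogd writes a line -- no polling needed.
while read -r line; do
    printf 'kernel debug: %s\n' "$line"
done < "$FIFO"
```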
However, from the logfile format, this does not appear to be the case.
2) I've seen some programs whose log is sent to standard output. Log files are created with a pipe. In other words: "someprog" sends output to the terminal, "someprog > logfile" sends output to a logfile, "someprog > /dev/null" suppresses the log. Therefore, you could use something like "someprog | tee logfile | parse.bash" to pipe the output both to the logfile and to your parsing program.
3) A daemon. Simply run your script in the background and have it check periodically (every second, every 5 minutes, whatever suits your fancy) for a change in the log file; if it has changed, begin processing from the saved offset.
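That third option could be sketched as a byte-offset poller (the log name and parse.bash come from the thread; the one-second interval is a placeholder):

```shell
LOG=radacct.log      # the accounting log named earlier in the thread
OFFSET=0             # bytes already handed to the parser

while sleep 1; do                       # poll once per second
    SIZE=$(wc -c < "$LOG")
    if [ "$SIZE" -gt "$OFFSET" ]; then
        # feed only the unseen tail of the file to the parser
        tail -c +"$((OFFSET + 1))" "$LOG" | ./parse.bash
        OFFSET=$SIZE
    fi
done
```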
The whole problem arises because in Linux, a command like the following:
tail -f file1.txt | ./parse.bash
will produce some output to the screen,
whereas, strangely enough, modifying the command to:
tail -f file1.txt | ./parse.bash > buffer
will not write any data to the file "buffer"! This means that the flow of data stops at the script.
parse.bash contains nothing more than this command:
awk '$0 != "" {printf "%s, ",$0} $0 == "" {printf "\n"}'
Can someone explain to me what's going on and why the output is not piped to a file, or to another script?
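A plausible culprit (a guess; the thread doesn't confirm it) is output buffering: when awk's stdout is a pipe or file rather than a terminal, it is block-buffered, so with the slow trickle from tail -f nothing reaches "buffer" until the buffer fills. Calling fflush() after each completed record would force the data through:

```shell
# parse.bash with an explicit flush per record, so each flattened
# paragraph is written out immediately even when stdout is a file.
awk '$0 != "" {printf "%s, ", $0}
     $0 == "" {printf "\n"; fflush()}'
```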
Hi jkobrien,
As already stated, your line:
awk '$0 != "" {printf "%s, ",$0} $0 == "" {printf "\n"}' <detail | tail -1 | awk -F '[\",=]' '{print $4 ,$23 }'
works perfectly well. The only downside is that I cannot redirect the output to a text file...
I need to redirect the output so that I can use another script to read from this file and do the final processing... The last thing that pops to my mind is to try to do the final processing on the same line, but I need to know a technique:
the command above displays two values on the screen, denoted by $4 and $23. Can I store these two values in variables on the same command line? This is what I really need, because my ultimate processing must be done on these variables. As explained, I tried to redirect the output of these variables to another file so I could awk them and process them from there, but this was not possible.
I didn't know that this little application could be so hard!
I'm afraid this works perfectly for me. Using the text from one of your mails above, I get "lungaro 17.15.3.139" in the output or redirected to file. I tried it in both tcsh and bash. Do you use a different shell?
Are you trying this command in isolation on the command line or as part of a longer shell script? Could there be something further down the script that overwrites your log file?
One minor thing (unconnected to your problem), you don't need the input redirect, "<", before "detail".
Yeah, wasn't that explained in the thread on monitoring logfiles that you started?
tail -f seems to be a non-starter for what you want to do.
It seems to me that the 3rd suggestion above from fsbooks is the way to go.
1) Reformat your logfile with
awk '$0 != "" {printf "%s, ",$0} $0 == "" {printf "\n"}' <logfile>
2) Save it somewhere.
3) After some interval, repeat 1) and 2).
4) Compare the last lines of the current logfile with the last lines of the previous logfile.
5) If they're the same, do nothing, i.e. return to step 3).
6) If they're different, process the new line with
awk -F '[\",=]' '{print $4 ,$23 }'
7) Overwrite the previous logfile with the current one.
8) Return to step 3).
Obviously if there were more than one new entry, your script would have to be sophisticated enough to read back far enough to get them all, but that shouldn't present major problems.
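Steps 1) to 8) above can be sketched as a line-count poller (the file names and the 5-second interval are placeholders):

```shell
LOG=radacct.log
COUNT=0                      # flattened records already reported

while sleep 5; do
    TMP=$(mktemp)
    # step 1: flatten paragraphs into one-line records
    awk '$0 != "" {printf "%s, ", $0} $0 == "" {printf "\n"}' "$LOG" > "$TMP"
    NOW=$(wc -l < "$TMP")
    if [ "$NOW" -gt "$COUNT" ]; then
        # steps 5-6: only lines past the old count are new entries
        tail -n +"$((COUNT + 1))" "$TMP" | awk -F '[",=]' '{print $4, $23}'
        COUNT=$NOW
    fi
    rm -f "$TMP"             # steps 7-8: remember the count, loop again
done
```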
Come to think of it, you could just use
ls -l <logfile> | awk '{print $6, $7, $8}'
to get the last modification time of the logfile. If that's newer than the last time you checked you have new entries. Though you still have to check back to the last "old" entry, so maybe that approach doesn't save you much.
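Rather than parsing the output of ls -l, the shell's -nt ("newer than") test can compare the log against a marker file; a sketch (the marker path is made up):

```shell
STAMP=/tmp/radacct.stamp     # hypothetical marker file
if [ ! -e "$STAMP" ] || [ radacct.log -nt "$STAMP" ]; then
    echo "log has new entries since the last check"
    touch "$STAMP"           # record this check for next time
fi
```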
Or, you could forget about the above approach altogether and just read down through the file checking the first entry of each paragraph (the time of the log entry). If that's less than the time of your last check, continue down towards the end. If you hit an entry that's more recent than your previous check, stop and start looking for the user name (is it always the second entry? Can you rely on the string "User-Name" being in it?). When you've found that, start looking for the IP address (again, is that always the 6th entry? Can you rely on the text "NAS-IP-Address"? Is it always after the user-name entry?). Once you've found that, start looking for new time entries, or the end of the file, again.
Your main issue with this will be how quick it is. To get quasi-real-time response, you need to have this script running more often than you're likely to get new entries. Once you've got the basic script working, you could possibly speed it up by just looking at the end of the logfile with tac or tail.
Sorry, I've gone on for longer than I intended. I hope the above isn't too garbled and confusing.