Programming
This forum is for all programming questions. The question does not have to be directly related to Linux and any language is fair game.
Distribution: RH 7.3/8.0/9.0, Debian Stable 3.0, FreeBSD 5.2, Solaris 8/9/10, HP-UX
Posts: 340
Rep:
extracting more than one value from a string
Hello coders,
I often write scripts (normally in bash) which I use to extract particular patterns from a string... In my applications I've always been extracting ONE value from a string, or multiple SIMPLE values. Say for example I have this string:
hello:xxx:world:123
and I want to extract "hello world". What I would do is:
echo "hello:xxx:world:123" | awk -F: '{print $1, $3}'
The problem is: what do I have to do to extract "hello world" from:
hello:xxx:"world":123
(Note that if I use the previous expression I would obtain hello "world" instead of hello world.)
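One way around the quotes (a sketch, not from the original posts) is to let awk treat both the colon and the double quote as field separators, so the quote characters never end up inside a field:

```shell
# ':' and '"' are both separators; a run of them counts as one,
# so "world" lands in $3 with its quotes already stripped.
echo 'hello:xxx:"world":123' | awk -F '[:"]+' '{print $1, $3}'
# prints: hello world
```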
Of course I'm only depicting a simple example. I shall be using this for a log file which generates a line such as:
As you can see, I have extracted the username part by piping two consecutive "awk"s, which is quite cumbersome; also, the other two parameters (Start/Stop and IP address) cannot be obtained this way. Is this an impossible problem to solve, or does it require some rocket science?
Why do you have to get both at once? Why don't you get the user name first and then get the IP address?
If you really do need to do both at once, then pass the username followed by the whole input line again to another awk; you'll know that the username is $1 in the second awk, so you just need to find the IP address.
e.g. the file TEST contains hello:xxx:"world":123
awk -F: '{printf "%s\"%s\n", $1,$0}' TEST | awk -F\" '{print $1,$3}'
The reason I need to get everything at once is that this is a logfile which grows and which I need to process in a quasi-real-time environment. Let me describe the whole problem so that anyone with a similar problem will also know what I'm talking about. First of all, I'm dealing with a RADIUS server... My aim is to experiment a bit with its start/stop records. When a user connects to the radius server, a "start" record is generated. When a user disconnects, a "stop" record is generated. (This is the local campus radius server, by the way.) These start/stop records are written to the same logfile - radacct.log, and a typical log is this:
This logfile, like all other logfiles, grows - as soon as a user connects/disconnects to the server, another entry is appended (an entry is a paragraph-sized chunk). My aim is that, in as close to a real-time scenario as possible, as soon as another entry is appended, I will know the username, the IP address and whether the user connected (denoted by Start) or disconnected (denoted by Stop). I will then produce a message on the screen or in some file - "Username xyz with ip 1.2.3.4 has connected/disconnected".
My first attempt was to parse the so-mentioned logfile in such a way that each paragraph is converted into a one-line string (refer to the thread http://www.linuxquestions.org/questi...hreadid=122211). From there I tried to obtain my needed parameters.
The bad news is that I cannot obtain the parameters needed!
Am I taking the wrong approach to solving this problem? At first glance I thought this was going to be a very easy thing to do!
To continue with the previous post: I only managed to get the parameters for the last entry... I used a flow like this:
./parse.bash | tail -n 1 | ./process.bash
where parse.bash converts the paragraphs into one-line strings,
and process.bash extracts the necessary parameters.
But this is not really what I want, since the program quits immediately once I get the parameters of the last start/stop record - I want something like "tail -f", which waits until a new entry is available! But using "tail -f" instead of "tail -n 1" didn't work, giving me no output!
But I guess the problem can still be solved. I'm just not thinking in the right way, but the solution exists!
If I remember from your previous thread, you want to control the size of the log file.
There's an odd little command called "tac", which is "cat" backwards and does what you'd probably expect it to - concatenates files in reverse. You could "tac" your logfile and count back a certain number of paragraphs, save the output to a temporary file and "tac" that again to overwrite your logfile. Though I guess you might be worried about a new entry being written in the time it took to do that. Do you administer this RADIUS server? There's bound to be something in the documentation about limiting the size of the logs.
I've no experience in that kind of thing, so I'm just throwing out suggestions.
Did you try my suggestion about the awk command? You can also use multiple field delimiters. For your specific case it would be something like this (but check for syntax!)
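The multi-delimiter idea can be sketched like this; the sample line below is hypothetical (the real radacct format isn't reproduced here), so the field numbers would need adjusting for the actual log:

```shell
# A single -F regex can split on any run of '"', ',', '=' or space.
# With this made-up key = "value" layout, the username is $2 and the
# address is $4 -- for this sample only.
echo 'User-Name = "lungaro", NAS-IP-Address = 17.15.3.139' \
    | awk -F '[",= ]+' '{print $2, $4}'
# prints: lungaro 17.15.3.139
```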
Hi jkobrien, thanks for your help. I have managed to get as far as reading the last line in the log file, but when I try to replace tail -1 with tail -f, it outputs nothing... that was my real concern. As regards "tac", I'll try it out as well...
OK, so the problem is actually how to keep an eye on the logfile and, as soon as a new entry appears, spit out some data.
I think we've sorted out the question of returning the values, so it's probably worth your while posting a new thread on how to monitor logfiles. That should attract more knowledgeable replies.
It seems as if your problem is not now with the extraction, but rather with interacting with a growing logfile. I can see several strategies.
1) If output is through the standard syslogd, there is an option to send output to a named pipe (from man syslogd):
OUTPUT TO NAMED PIPES (FIFOs)
This version of syslogd has support for logging output to named pipes
(fifos). A fifo or named pipe can be used as a destination for log
messages by prepending a pipe symbol (``|'') to the name of the file.
This is handy for debugging. Note that the fifo must be created with
the mkfifo command before syslogd is started.
The following configuration file routes debug messages from the
kernel to a fifo:
# Sample configuration to route kernel debugging
# messages ONLY to /usr/adm/debug which is a
# named pipe.
kern.=debug |/usr/adm/debug
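The consumer side of that configuration might look like this sketch (the fifo path is just the one from the man page sample above):

```shell
FIFO=/usr/adm/debug                # path from the sample config above
[ -p "$FIFO" ] || mkfifo "$FIFO"   # fifo must exist before syslogd starts

# read blocks until syslogd writes a line -- no polling needed.
while read -r line; do
    printf 'kernel debug: %s\n' "$line"
done < "$FIFO"
```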
However, from the logfile format, this does not appear to be the case.
2) I've seen some programs whose log is sent to standard output. Log files are created with a pipe. In other words: "someprog" sends output to the terminal, "someprog > logfile" sends output to a logfile, "someprog > /dev/null" suppresses the log. Therefore, you could use something like "someprog | tee logfile | parse.bash" to pipe the output both to the logfile and to your parsing program.
3) A daemon. Simply run your script in the background and have it check periodically (every second, every 5 minutes, whatever suits your fancy) for a change in the log file; if it has changed, begin processing from the saved offset.
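That third option could be sketched as a byte-offset poller (the log name and parse.bash come from the thread; the one-second interval is a placeholder):

```shell
LOG=radacct.log      # the accounting log named earlier in the thread
OFFSET=0             # bytes already handed to the parser

while sleep 1; do                       # poll once per second
    SIZE=$(wc -c < "$LOG")
    if [ "$SIZE" -gt "$OFFSET" ]; then
        # feed only the unseen tail of the file to the parser
        tail -c +"$((OFFSET + 1))" "$LOG" | ./parse.bash
        OFFSET=$SIZE
    fi
done
```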
The whole problem arises because in Linux, a command like the following:
tail -f file1.txt | ./parse.bash
will produce some output to the screen,
whereas, strangely enough, modifying the command to:
tail -f file1.txt | ./parse.bash > buffer
will not write any data to the file "buffer"! This means that the flow of data stops at the script.
parse.bash contains nothing more than this command:
awk '$0 != "" {printf "%s, ",$0} $0 == "" {printf "\n"}'
Can someone explain to me what's going on and why the output is not piped to a file, or to another script?
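A plausible culprit (a guess; the thread doesn't confirm it) is output buffering: when awk's stdout is a pipe or file rather than a terminal, it is block-buffered, so with the slow trickle from tail -f nothing reaches "buffer" until the buffer fills. Calling fflush() after each completed record would force the data through:

```shell
# parse.bash with an explicit flush per record, so each flattened
# paragraph is written out immediately even when stdout is a file.
awk '$0 != "" {printf "%s, ", $0}
     $0 == "" {printf "\n"; fflush()}'
```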
Hi jkobrien,
As already stated, your line:
awk '$0 != "" {printf "%s, ",$0} $0 == "" {printf "\n"}' <detail | tail -1 | awk -F '[\",=]' '{print $4 ,$23 }'
works perfectly well. The only downside is that I cannot redirect the output to a text file...
I need to redirect the output so that I can use another script to read from this file and do the final processing... The last thing that pops to my mind is to try to do the final processing on the same line, but I need to know a technique:
the command above displays two values on the screen, denoted by $4 and $23. Can I store these two values in variables on the same command line? This is what I really need, because my ultimate processing must be done on these variables. As explained, I tried to redirect the output of these variables to another file so I could awk them and process them from there, but this was not possible.
I didn't know that this little application could be so hard!
I'm afraid this works perfectly for me. Using the text from one of your mails above, I get "lungaro 17.15.3.139" in the output or redirected to file. I tried it in both tcsh and bash. Do you use a different shell?
Are you trying this command in isolation on the command line or as part of a longer shell script? Could there be something further down the script that overwrites your log file?
One minor thing (unconnected to your problem), you don't need the input redirect, "<", before "detail".
Yeah, wasn't that explained in the thread on monitoring logfiles that you started?
tail -f seems to be a non-starter for what you want to do.
It seems to me that the 3rd suggestion above from fsbooks is the way to go.
1) Reformat your logfile with
awk '$0 != "" {printf "%s, ",$0} $0 == "" {printf "\n"}' <logfile>
2) Save it somewhere.
3) After some interval, repeat 1) and 2).
4) Compare the last lines of the current logfile with the last lines of the previous logfile.
5) If they're the same, do nothing, i.e. return to step 3).
6) If they're different, process the new line with
awk -F '[\",=]' '{print $4 ,$23 }'
7) Overwrite the previous logfile with the current one.
8) Return to step 3).
Obviously if there were more than one new entry, your script would have to be sophisticated enough to read back far enough to get them all, but that shouldn't present major problems.
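Steps 1) to 8) above can be sketched as a line-count poller (the file names and the 5-second interval are placeholders):

```shell
LOG=radacct.log
COUNT=0                      # flattened records already reported

while sleep 5; do
    TMP=$(mktemp)
    # step 1: flatten paragraphs into one-line records
    awk '$0 != "" {printf "%s, ", $0} $0 == "" {printf "\n"}' "$LOG" > "$TMP"
    NOW=$(wc -l < "$TMP")
    if [ "$NOW" -gt "$COUNT" ]; then
        # steps 5-6: only lines past the old count are new entries
        tail -n +"$((COUNT + 1))" "$TMP" | awk -F '[",=]' '{print $4, $23}'
        COUNT=$NOW
    fi
    rm -f "$TMP"             # steps 7-8: remember the count, loop again
done
```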
Come to think of it, you could just use
ls -l <logfile> | awk '{print $6, $7, $8}'
to get the last modification time of the logfile. If that's newer than the last time you checked you have new entries. Though you still have to check back to the last "old" entry, so maybe that approach doesn't save you much.
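Rather than parsing the output of ls -l, the shell's -nt ("newer than") test can compare the log against a marker file; a sketch (the marker path is made up):

```shell
STAMP=/tmp/radacct.stamp     # hypothetical marker file
if [ ! -e "$STAMP" ] || [ radacct.log -nt "$STAMP" ]; then
    echo "log has new entries since the last check"
    touch "$STAMP"           # record this check for next time
fi
```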
Or, you could forget about the above approach altogether and just read down through the file checking the first entry of each paragraph (the time of the log entry). If that's less than the time of your last check, continue down towards the end. If you hit an entry that's more recent than your previous check, stop and start looking for the user name (is it always the second entry? Can you rely on the string "User-Name" being in it?). When you've found that, start looking for the IP address (again, is that always the 6th entry? Can you rely on the text "NAS-IP-Address"? Is it always after the user-name entry?). Once you've found that, start looking for new time entries, or the end of the file, again.
Your main issue with this will be how quick it is. To get quasi-real-time response, you need to have this script running more often than you're likely to get new entries. Once you've got the basic script working, you could possibly speed it up by just looking at the end of the logfile with tac or tail.
Sorry, I've gone on for longer than I intended. I hope the above isn't too garbled and confusing.