LinuxQuestions.org
Old 01-30-2012, 03:04 PM   #1
JamesOwen
LQ Newbie
 
Registered: Jan 2012
Posts: 8

Rep: Reputation: Disabled
Please Help with AWK code to parse XML messages


Hi guys,

Can I please get some help with this code?

I have an XML feed file, a rapidly changing temporary file, and I need to capture its content as soon as data arrives.

Example of the data
Quote:
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="John Smith"><Age="23"><D.O.B="11-10-1988"> <Gender="Male">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Emy Williams"><Age="23"><D.O.B="01-05-1988"> <Gender="Female">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Jack Adam"><Age="66"><D.O.B="24-07-1945"> <Gender="Male">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Charlie Daniel"><Age="38"><D.O.B="15-08-1973"> <Gender="Male">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Ruby James"><Age="38"><D.O.B="11-03-1973"> <Gender="Female">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Sophie Thomas"><Age="20"><D.O.B="12-09-1991"><Gender="Female">"
Required data output
Quote:
8:30,Male,23,1
8:31,Female,23,1
8:32,Female,30,4
8:33,Male,50,10
Time is current time.

This is the awk code I have so far, but it doesn't do what I need. Can I please get help with it?

All I want the code to do is run for 2 minutes, process the counts, write them to the output, and then repeat the same process again and again.

Code:
awk 'BEGIN { INTERVAL=120;    "date +%s"|getline sec;
    NEXT=sec+120;}

    {
        if(sec >= NEXT)
        {
           printf( "\nSummary\n" );
           for( x in agcount )
              printf( "%s,%d\n", x, agcount[x] ) | "sort";

           NEXT=sec+120;
        }

        gsub( ">", "" );        # strip unneeded junk and make "foo bar" easy to capture
        gsub( " ", "~" );
        gsub( "<", " " );

        for( i = 1; i <= NF; i++ )          # snarf up each name=value pair
        {
            if( split( $(i), a, "=" ) == 2 )
            {
                gsub(  "\"", "", a[2] );
                gsub(  "~", " ", a[2] );
                values[a[1]] = a[2];
            }
        }

        #gcount[values["Gender"]]++;         # collect counts
        #acount[values["Age"]]++;
        agcount[values["Gender"]","values["Age"]]++;

        printf( "%s %s %s %s\n", values["NAME"], values["Age"], values["D.O.B"], values["Gender"] );
    }' input-file
I can't use gawk or the cron scheduler.

Will anyone be able to help me with this?

Any help would be greatly appreciated.

James
 
Old 01-30-2012, 05:29 PM   #2
cbtshare
Member
 
Registered: Jul 2009
Posts: 610

Rep: Reputation: 42
Can it be a shell script?

Also, what are the numbers at the end of the output?
 
Old 01-30-2012, 05:42 PM   #3
JamesOwen
LQ Newbie
 
Registered: Jan 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
Yes, it can be a shell script.

The numbers at the end are counts for the age, so if there are 2 males of age 34, then instead of writing male,34,1 twice, it's easier to have male,34,2.

Thanks
 
Old 01-30-2012, 10:12 PM   #4
cbtshare
Member
 
Registered: Jul 2009
Posts: 610

Rep: Reputation: 42
Below is a shell script to help you out.

data.txt contains:

Code:
 cat data.txt 
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="John Smith"><Age="23"><D.O.B="11-10-1988"><Gender="Male">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Emy Williams"><Age="23"><D.O.B="01-05-1988"><Gender="Female">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Jack Adam"><Age="66"><D.O.B="24-07-1945"><Gender="Male">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Charlie Daniel"><Age="38"><D.O.B="15-08-1973"><Gender="Male">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Ruby James"><Age="38"><D.O.B="11-03-1973"><Gender="Female">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Sophie Thomas"><Age="20"><D.O.B="12-09-1991"><Gender="Female">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Sophie Thomas"><Age="20"><D.O.B="12-09-1991"><Gender="Female">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Sophie Thomas"><Age="20"><D.O.B="12-09-1991"><Gender="Female">"
[date+time], message=[DATA= "<?xml version="1.0?"><data changeMsg><NAME="Sophie Thomas"><Age="20"><D.O.B="12-09-1991"><Gender="Female">"
Code:
#!/bin/bash
# Author: cbtshare
# Purpose: Grab specific fields from each line of a file, format them, and print counts to the screen.

FILELOCATION=/var/www/data.txt

# Start with an empty result file
> result.txt

# Read the file line by line; -r keeps backslashes literal
while IFS= read -r line
do
    # Quote "$line" so that text like [date+time] is not expanded as a glob pattern
    DATE=$(echo "$line" | cut -d "," -f1)
    GENDER=$(echo "$line" | cut -d "=" -f8 | cut -d '"' -f2)
    AGE=$(echo "$line" | cut -d "<" -f5 | cut -d '"' -f2)

    echo "$AGE,$GENDER,$DATE" >> result.txt
done < "$FILELOCATION"

# Count identical lines
sort result.txt | uniq -c
output

Quote:
1 66,Male,[date+time]
1 38,Male,[date+time]
1 38,Female,[date+time]
1 23,Male,[date+time]
1 23,Female,[date+time]
4 20,Female,[date+time]

Last edited by cbtshare; 01-31-2012 at 12:45 AM.
 
Old 01-31-2012, 12:54 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,251

Rep: Reputation: 2684
Please show the exact format for date + time.

Assuming the file is always the same format (and not currently including date + time) the following works:
Code:
awk -F"[<>]+" '{gsub(/^.*="|"$/,"",$(NF-1));gsub(/^.*="|"$/,"",$5);total[$5,$(NF-1)]++}END{for( x in total)print x,total[x]}' file
Of course we can easily tidy the output.
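One possible way to tidy it, as a sketch only: awk joins the two subscripts in total[$5,$(NF-1)] with SUBSEP, a non-printing character by default, which is why the raw keys look odd when printed. The key can be split back apart in the END block and written as comma-separated Gender,Age,count. The function name summarize is made up for this example; the field positions are the ones used in the one-liner above.

```shell
#!/bin/sh
# summarize FILE: print Gender,Age,count for each distinct pair.
# Hypothetical helper name; the parsing is the same as the one-liner above.
summarize() {
    awk -F'[<>]+' '
    {
        gsub(/^.*="|"$/, "", $(NF-1))   # Gender="Male" -> Male
        gsub(/^.*="|"$/, "", $5)        # Age="23"      -> 23
        total[$5, $(NF-1)]++
    }
    END {
        for (x in total) {
            split(x, k, SUBSEP)         # k[1] = age, k[2] = gender
            printf "%s,%s,%d\n", k[2], k[1], total[x]
        }
    }' "$1" | sort
}
```

It would be called as summarize data.txt. The trailing sort is there only because the order of for (x in total) is not defined in awk.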
 
Old 01-31-2012, 04:32 PM   #6
JamesOwen
LQ Newbie
 
Registered: Jan 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
@cbtshare,

This is a good way for me to start, but the only problem is that I am reading the data from a rapidly changing ksh file.

I am using a pipe to read the ksh file, and what I want is to read from the pipe every 2 minutes and write to an output file.

Is this something that can be done using a shell script?

Also, is there a way to add the counts to the loop?

Thank you all again
 
Old 01-31-2012, 08:58 PM   #7
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
I assume you mean you are reading the output of a ksh program, not the ksh program file itself.

Where does the 2 mins thing come from?
Does the ksh prog produce a new file every 2 mins?
Does it output for 2 mins then overwrite the same file?
In either case, synchronisation is key to avoid losing data.

In either case (or even if this is a continuous stream being output, e.g. like a logfile), I would highly recommend http://search.cpan.org/~mgrabnar/Fil...0.99.3/Tail.pm which is designed to handle those situations.
I've used it myself; very handy.
 
Old 01-31-2012, 10:12 PM   #8
cbtshare
Member
 
Registered: Jul 2009
Posts: 610

Rep: Reputation: 42
Quote:
Originally Posted by JamesOwen
@cbtshare,

This is a good way for me to start, but the only problem is that I am reading the data from a rapidly changing ksh file.

I am using a pipe to read the ksh file, and what I want is to read from the pipe every 2 minutes and write to an output file.

Is this something that can be done using a shell script?

Also, is there a way to add the counts to the loop?

Thank you all again
Yes, definitely. You can use cron to run the script at any interval you want. You can add counts as well:

count=0 and, to increase the count, let "count+=1"

You can also use the sleep command anywhere you want to pause the script.
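As a rough sketch of the counter-and-pause idea in a plain loop, with no cron involved: sleep is the command that pauses for a number of seconds, whereas wait waits for background jobs to finish. The function name run_rounds and its arguments are invented for this example, and the loop is bounded only so that the sketch terminates.

```shell
#!/bin/sh
# run_rounds ROUNDS SECONDS COMMAND...: run COMMAND, pause, repeat.
# Hypothetical helper; ROUNDS bounds the loop so the sketch terminates.
run_rounds() {
    rounds=$1
    pause=$2
    shift 2
    count=0
    while [ "$count" -lt "$rounds" ]; do
        "$@"                      # the work for this round
        count=$((count + 1))      # portable increment for sh, ksh and bash
        sleep "$pause"            # sleep pauses; wait is for background jobs
    done
    echo "rounds completed: $count"
}
```

For example, run_rounds 30 120 ./summary.sh would run a (hypothetical) summary.sh thirty times, two minutes apart.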
 
Old 02-02-2012, 02:56 PM   #9
JamesOwen
LQ Newbie
 
Registered: Jan 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
@chrism01,

Quote:
Where does the 2 mins thing come from?
Does the ksh prog produce a new file every 2 mins?
Does it output for 2 mins then overwrite the same file?
In either case, synchronisation is key to avoid losing data.
The 2 minute thing came about because I want the script to loop for 2 minutes and not until the end of the file. This will help me to log the messages that come in every 2 minutes.

The ksh file produces a new message every couple of seconds, and each new message overwrites the previous message.

And yes, you are right, I want to avoid losing data.

Quote:
In either case (or even if this is a continuous stream being output, e.g. like a logfile), I would highly recommend http://search.cpan.org/~mgrabnar/Fil...0.99.3/Tail.pm which is designed to handle those situations.
This link is Perl code and I am not familiar with Perl; I have never used it. Also, wouldn't File::Tail just get the end of the file? In my file, each new message overwrites the previous one as soon as it arrives, so I am not sure this will do what I want.


@cbtshare,

I can't use the cron scheduler, and this is why I am not sure how I could solve this issue.

All, please help.

Thank you all again

James
 
Old 02-02-2012, 07:25 PM   #10
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
Quote:
The ksh file produces new message every couple seconds and each new message overwrites the previous message.
...
my file as soon as new message arrives it overwrite the previous message
Taking this to mean what it says, you are saying that the output_file (using that term loosely) only ever actually contains one msg (the latest), which is overwritten by the next/new msg approx(!) every 1 or 2 secs.
In that case, I don't get the 2 mins thing at all. You've got to grab each msg immediately or you will lose it...
So, you do need to use something like that Perl module or, e.g.,

Code:
tail -f output_file | your post-processing prog
In fact, you could use the first example on that Perl page pretty much as is.

Going back to a bash solution, maybe instead of having the ksh program write to the ever-changing file, just pipe the output directly, thus:
Code:
ksh_prog | post_process_prog

# or output to log and pipe to prog
ksh_prog | tee ksh.log | post_process_prog
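A post-processing stage for such a pipeline might look roughly like the sketch below. It assumes plain POSIX awk (no gawk extensions) and a date command that understands +%s, as the script in post #1 already does. In that script the time is read once in BEGIN, so sec never changes and the test sec >= NEXT can never become true; here the clock is re-read for every message, and close() is what makes the command run again. The function name interval_summary is made up, and the field positions assume the message layout shown earlier in the thread.

```shell
#!/bin/sh
# interval_summary SECONDS: read messages on stdin, print HH:MM,Gender,Age,count
# for each Gender/Age pair once SECONDS have elapsed, then start counting afresh.
# Hypothetical helper name; uses POSIX awk features plus date +%s.
interval_summary() {
    awk -F'[<>]+' -v interval="$1" '
    function now(   t, cmd) {
        cmd = "date +%s"
        cmd | getline t
        close(cmd)               # without close() the command runs only once
        return t + 0
    }
    function report(   x, k, stamp, cmd) {
        cmd = "date +%H:%M"
        cmd | getline stamp
        close(cmd)
        for (x in total) {
            split(x, k, SUBSEP)  # k[1] = gender, k[2] = age
            printf "%s,%s,%s,%d\n", stamp, k[1], k[2], total[x]
        }
        split("", total)         # portable way to empty the array
        next_report = now() + interval
    }
    BEGIN { next_report = now() + interval }
    NF > 2 {
        gender = $(NF-1); gsub(/^.*="|"$/, "", gender)
        age = $5;         gsub(/^.*="|"$/, "", age)
        total[gender, age]++
        if (now() >= next_report) report()
    }
    END { report() }             # do not lose a partial interval at end of input
    '
}
```

Usage could be ksh_prog | interval_summary 120 >> summary.log, where summary.log is just an example name. Two caveats: the clock is only checked when a message arrives, so a quiet stream delays the summary, and running date once per message starts a new process each time, which should be acceptable at one message every second or two.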
 
Old 02-05-2012, 10:55 AM   #11
JamesOwen
LQ Newbie
 
Registered: Jan 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
Thank you all for replying, but I think I haven't explained myself.

I have a .ksh file which contains XML messages.

What I need is to parse and capture these XML messages, then store the output in a log file.

The two-minute thing is something I came up with because the log file could become large if every message is written to it. But if I collect the messages for 2 minutes, then I will be able to get a summary output like this example:
Quote:
1 66,Male,[date+time]
1 38,Male,[date+time]
1 38,Female,[date+time]
1 23,Male,[date+time]
1 23,Female,[date+time]
4 20,Female,[date+time]
[date+time] means when the output was logged in the log file.

Can anyone please help with these questions?

How can I parse the XML messages using this Perl code? (the link)

After this, how can I save the output to a log file?

Any advice will be appreciated.

Thank you all again,
James
 
Old 02-06-2012, 01:57 PM   #12
JamesOwen
LQ Newbie
 
Registered: Jan 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hi guys,

Can someone please help with this issue?

Thank you all

Last edited by JamesOwen; 02-07-2012 at 02:36 PM.
 
Old 02-07-2012, 02:43 PM   #13
JamesOwen
LQ Newbie
 
Registered: Jan 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
Guys,

I don't want to bump my thread, but can I please get help with this problem?

Thanks

James
 
Old 02-08-2012, 03:11 AM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,251

Rep: Reputation: 2684
I am not sure I understand your current issue. You have been presented with code to parse the XML and retrieve data. Redirecting this into a new file should be trivial.

Are you able to explain where you are now stuck?
 
Old 02-08-2012, 03:29 PM   #15
JamesOwen
LQ Newbie
 
Registered: Jan 2012
Posts: 8

Original Poster
Rep: Reputation: Disabled
@grail,

Which code are you referring to?

If you are referring to the bash code, it does almost what my AWK code does.

I could parse and retrieve the data, then redirect it to an output file. But the only problem with AWK is that it reads the whole file at once, and what I want is to read part of the file each minute or so.

If this is not possible, then I would parse the messages and log the output to a file, but this should include the current time and the count of the messages.

For the Perl link, I am getting this error, and I am not sure how this should parse XML messages.

Quote:
Can't locate File/Tail.pm in @INC (@INC contains:
Any help with this will be appreciated.

Thanks again
James
 
All times are GMT -5. The time now is 06:41 AM.