LinuxQuestions.org

Stuart07 05-04-2011 02:09 PM

Best way to Parse a file
 
Alright, I'm gonna try to lay this one out, as it has been stumping me for quite some time now. I'm looking for a language to try, and possibly some examples. I'm comfortable with Tcl/Expect because that's what was needed for router interaction, but there doesn't seem to be a way to do this efficiently with it. So here it is:

I'm trying to have the script parse through a router config file and look for anything that matches this pattern: 12/ABCD/123456/AB


After it finds the said pattern, the line directly after contains another pattern I need to match on that looks like this: AB_ABCD

Both of these patterns could be any combination of letters and numbers. Here is an example of the config:

Code:

subscriber name 10/ARDA/123456//1
  bridge-group BG_B500
  bridge-group BG_B500 access-group B500_acl01 in
  bridge-group BG_B500 access-group B500_acl02 out
  bridge-group BG_B500 aging-time 21600
  bridge-group BG_B500 spanning-disabled

Essentially, I want the output to look something like this:

Code:

10/ARDA/123456//1, BG_B500, ROUTER5

I didn't post any of the code I've tried so far, because I really don't think it's the right way to do it.

Any tips, tries, or directions would be greatly appreciated.
Thanks

smallpond 05-04-2011 03:40 PM

Your best bet is Perl, which is designed to scan text files and has regular expressions for matching. Here's some code to get you started:

Code:

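#!/usr/bin/perl

use strict;
use warnings;

# "perl -n" runs this whole file once per input line, so a "my" variable
# would be reset on every line; "our" keeps its value between lines.
our $first;

# Save the part in parentheses as $1 and remember it for the next line.
m'(\d\d/\w+/\d+/..)' && do { print "Matched $1\n"; $first = $1 };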

Put this in a file named mm, for example. Run with perl as:

perl -n mm <your_input

This part: m'(\d\d/\w+/\d+/..)' is the match for your first line:
m - match
'' - quotes around the regular expression
() - indicates the part you want to save in $1
\d\d - matches two digits
/ - matches '/'
\w+ - matches one or more alphanumerics
'/' - another slash
\d+ - one or more digits
'/' - 3rd slash
.. - any two characters
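
For anyone who would rather try this outside Perl, here is a rough Python sketch of the same pattern, written out piece by piece so it lines up with the breakdown above. Python is my substitution, not something used in this thread, and the function name find_circuit_id is made up for illustration.

```python
import re

# The pattern from the post, one piece per line (re.VERBOSE ignores the
# whitespace and treats everything after '#' as a comment).
CIRCUIT_ID = re.compile(r"""
    (           # start of the part to save (Perl's $1)
      \d\d      # two digits
      /         # first slash
      \w+       # one or more alphanumerics (underscore counts too)
      /         # second slash
      \d+       # one or more digits
      /         # third slash
      ..        # any two characters
    )
""", re.VERBOSE)


def find_circuit_id(line):
    """Return the saved part of the first match in line, or None.

    The name find_circuit_id is hypothetical, chosen for this sketch.
    """
    match = CIRCUIT_ID.search(line)
    return match.group(1) if match else None


print(find_circuit_id("subscriber name 10/ARDA/123456//1"))
```

On the sample line from the first post this prints 10/ARDA/123456//1, and it returns None for a line such as the bridge-group one that has no slashes in it.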

Not sure from your description if this is exactly what you want, and not sure
where 'ROUTER5' comes from in your output, but this should get you started.

Note that I set the variable $first to the first match, so once it's set you can
do the second match and then print both.

If this looks like what you want, you can read through a tutorial.

David the H. 05-04-2011 03:42 PM

I guess you just want to add the string "ROUTERS" to the output? Would this do for you?

Code:

name='10/ARDA/123456//1'
sed -rn "\|$name| { n ; s|.* (.+)$|$name, \1, ROUTERS|p}" filename

Putting the search string into a shell variable first is just for convenience, of course.

This assumes that the "BG_B500" is always the last word on the line following the match. If not, you'll have to change the sed pattern to something like this:
Code:


sed -rn "\|$name| { n ; s|.* ([[:alnum:]]{2}_[[:alnum:]]{4}).*|$name, \1, ROUTERS|p}" filename


Stuart07 05-04-2011 07:55 PM

Thanks for the replies. I've started reading up on Perl and it seems a lot more useful for parsing.

I wanted to clarify what exactly it is I'm doing, and where the ROUTER5 comes from.

Basically, the example config I posted above, with the subscriber and bridge group info, repeats about 3-5 thousand times, each time with a different subscriber name and bridge group (however many subscribers are on the system). What I've done with my first trials of the script is have expect pull the hostname from the file, with regexp's and then just print it to the end of the line after the circuit id (12/ABCD/123456) and bridge group (BG_ABCD).

So basically I want to be able to take this data and put it into a database format (CSV) like so:

10/ARDA/123456//1, BG_B500, ROUTER5

I understand the regular expressions needed to pull out the actual lines that I want; I just need a little insight into the logic to get it to do what I want.
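
To make the logic concrete, here is one possible sketch in Python (my choice of language, not one suggested in the thread). It remembers each subscriber name, looks only at the line directly after it for the bridge group, and tacks the hostname on the end. Big assumption: the thread never shows what the hostname line looks like, so the "hostname ROUTER5" format, the UNKNOWN placeholder, and the function name are all invented for illustration.

```python
import re

SUBSCRIBER = re.compile(r"^subscriber name (\S+)")
# Only a bare "bridge-group NAME" line counts, not the access-group ones.
BRIDGE = re.compile(r"^\s*bridge-group (\S+)\s*$")
# ASSUMPTION: the config holds a line like "hostname ROUTER5".
HOSTNAME = re.compile(r"^hostname (\S+)")


def config_to_csv_lines(lines):
    """Return 'circuit, bridge-group, hostname' strings (hypothetical helper)."""
    hostname = None
    pending = None   # subscriber name seen on the previous line, if any
    pairs = []
    for line in lines:
        m = HOSTNAME.match(line)
        if m:
            hostname = m.group(1)
            continue
        m = SUBSCRIBER.match(line)
        if m:
            pending = m.group(1)
            continue
        if pending is not None:
            m = BRIDGE.match(line)
            if m:
                pairs.append((pending, m.group(1)))
            pending = None   # only the line directly after is considered
    # "UNKNOWN" is a made-up placeholder for a config with no hostname line.
    host = hostname or "UNKNOWN"
    return [", ".join((circuit, group, host)) for circuit, group in pairs]


sample = """hostname ROUTER5
subscriber name 10/ARDA/123456//1
  bridge-group BG_B500
  bridge-group BG_B500 access-group B500_acl01 in
""".splitlines()

for row in config_to_csv_lines(sample):
    print(row)
```

The pairs are collected first and the hostname is attached at the end, so it would not matter whether the hostname line comes before or after the subscriber blocks in the file.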

Thanks,

grail 05-05-2011 02:54 AM

So after all that explanation, I still don't see where the string ROUTER5 came from? Is this perhaps a file name?

David the H. 05-05-2011 07:33 AM

The last word is the hostname, apparently:
Quote:

What I've done with my first trials of the script is have expect pull the hostname from the file, with regexp's and then just print it to the end of the line after the circuit id (12/ABCD/123456) and bridge group (BG_ABCD)

If you need help understanding my sed expression I'll break it down for you:
Code:


sed -rn "\|searchterm| { n ; s|(regex)|replacement \1|p }" filename

".."              : double quotes are needed around the expression when using
                  :   shell variables. Otherwise single quotes would work too.
-r                : enables extended regular expressions.
-n                : turns off printing by default.
\|searchterm|     : search for lines that contain searchterm (can be a regex).
                  :   Usually /../ is used for the delimiter, but we're using
                  :   a different one here because the text being processed
                  :   contains forward slashes.
{..}              : run this code block if the searchterm is found.
n                 : stop processing this line and read the next line of input
                  :   in its place.
;                 : command separator.
s|x|y|            : the standard sed "substitute" function. Again, the
                  :   delimiter has been altered from the traditional s/x/y/.
(regex)           : the matching regex for the second line, including
                  :   parentheses for capturing the part you want.
replacement \1    : the output string. Includes \1 to substitute the
                  :   captured part of the matching regex.
p                 : print the modified line (and only the modified line, since
                  :   -n is being used).

sed is very convenient for relatively simple extractions and substitutions like this. But with multiple input values and files you'd have to wrap this up inside a shell script. As a complete language in itself, Perl is more flexible overall, and most certainly faster when processing thousands of lines. I'm going to have to sit down and learn it one of these days. :)
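
If it helps to see the sed steps in a general-purpose language, the following is a loose Python rendering of the same flow, not a replacement for the command. Calling next() on the iterator plays the role of sed's n. Assumptions worth flagging: the search term is matched as a plain substring here (sed treats it as a regex), and the function name and the host argument are hypothetical.

```python
import re

# Same idea as the second-line substitution: keep the last word on the line.
LAST_WORD = re.compile(r".* (.+)$")


def rows_via_next_line(lines, searchterm, host):
    """Find searchterm, step to the next line, keep its last word.

    Hypothetical helper; searchterm is a literal substring in this sketch.
    """
    rows = []
    it = iter(lines)
    for line in it:
        if searchterm not in line:
            continue
        following = next(it, None)   # like sed's "n": move on to the next line
        if following is None:        # search term was on the very last line
            break
        m = LAST_WORD.search(following.rstrip("\n"))
        if m:
            rows.append(f"{searchterm}, {m.group(1)}, {host}")
    return rows


config = [
    "subscriber name 10/ARDA/123456//1",
    "  bridge-group BG_B500",
    "  bridge-group BG_B500 aging-time 21600",
]
print(rows_via_next_line(config, "10/ARDA/123456//1", "ROUTER5"))
```

As with the first sed command, this relies on the bridge group being the last word on the line that follows the match.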

grail 05-05-2011 07:56 AM

Maybe something like:
Code:

awk -v host="$(hostname)" 'x{printf ", %s, %s\n", $NF, host; x=0} /^subscriber/{printf "%s", $NF; x=1}' file


All times are GMT -5.