LinuxQuestions.org
Old 05-04-2011, 02:09 PM   #1
Stuart07
LQ Newbie
 
Registered: May 2006
Location: Manchester, NH
Distribution: CentOS5.5
Posts: 13

Rep: Reputation: 0
Best way to Parse a file


Alright, I'm gonna try to lay this one out, as it has been stumping me for quite some time now. I'm looking for a language to try, and possibly some examples. I'm comfortable with tcl/expect because that's what was needed for router interaction, but there doesn't seem to be a way to do this efficiently with it. So here it is:

I'm trying to have the script parse through a router config file and look for anything that matches this pattern: 12/ABCD/123456/AB


After it finds the said pattern, the line directly after contains another pattern I need to match on that looks like this: AB_ABCD

Both of these patterns could be any combination of letters and numbers. Here is an example of the config:

Code:
subscriber name 10/ARDA/123456//1
  bridge-group BG_B500
  bridge-group BG_B500 access-group B500_acl01 in
  bridge-group BG_B500 access-group B500_acl02 out
  bridge-group BG_B500 aging-time 21600
  bridge-group BG_B500 spanning-disabled
Essentially, I want the output to look something like this:

Code:
10/ARDA/123456//1, BG_B500, ROUTER5

I didn't post any of the code I've tried so far, because I really don't think it's the right way to do it.

Any tips, tries, or directions would be greatly appreciated.
Thanks
 
Old 05-04-2011, 03:40 PM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 1,482

Rep: Reputation: 369
Your best bet is Perl, which is designed to scan text files and has regular expressions for matching. Here's some code to get you started:

Code:
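#!/usr/bin/perl

use strict;
use warnings;

# Run with "perl -n", which wraps this file in a loop over the input lines.
# "our" (not "my") keeps $first alive from one line to the next.
our $first;

# Save the circuit ID in $first when the current line contains one.
m'(\d\d/\w+/\d+/..)' && do { print "Matched $1\n"; $first = $1 };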
Put this in a file named mm, for example. Run with perl as:

perl -n mm <your_input

This part: m'(\d\d/\w+/\d+/..)' is the match for your first line:
m - match
'' - quotes around the regular expression
() - marks the part you want to save in $1
\d\d - matches two digits
/ - matches '/'
\w+ - matches one or more word characters (letters, digits, underscore)
'/' - another slash
\d+ - one or more digits
'/' - 3rd slash
.. - any two characters
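
If you want to try the pattern from the breakdown above before writing any Perl, here is a rough sketch with grep. POSIX extended regexes have no \d or \w, so bracket expressions stand in for them. The function name find_ids is made up for illustration, and -o (print only the matched text) is a GNU/BSD grep extension rather than POSIX, so treat this as an assumption about your grep.

```shell
# find_ids FILE: print every circuit ID found in FILE, one per line.
# Hypothetical helper; relies on grep -o, which GNU and BSD grep provide.
find_ids() {
    # [0-9][0-9]     two digits            (\d\d in the Perl version)
    # [[:alnum:]_]+  word characters       (\w+)
    # [0-9]+         one or more digits    (\d+)
    # ..             any two characters
    grep -Eo '[0-9][0-9]/[[:alnum:]_]+/[0-9]+/..' "$1"
}
```

Run it as: find_ids your_input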

Not sure from your description if this is exactly what you want, and not sure
where 'ROUTER5' comes from in your output, but this should get you started.

Note that I set the variable $first to the first match, so once it's set you can
do the second match and then print both.

If this looks like what you want, you can read through a tutorial
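
The "remember the first match, then handle the next line" logic is not specific to Perl. Below is a minimal sketch of the same idea in plain POSIX shell, assuming the circuit ID and the bridge group are each the last word on their line. The function name pair_lines is hypothetical, and the case pattern is a shell glob that is looser than the regex above, so it may accept lines the regex would reject.

```shell
# pair_lines FILE: print "circuit_id, bridge_group" for each pair of lines.
# Hypothetical helper; assumes both values are the last word on their line.
pair_lines() {
    first=
    while IFS= read -r line; do
        if [ -n "$first" ]; then
            # Second step: this is the line right after a circuit ID.
            printf '%s, %s\n' "$first" "${line##* }"
            first=
            continue
        fi
        case $line in
            # First step: rough glob for NN/xxxx/NNN/xx (looser than the regex).
            *[0-9][0-9]/*/[0-9]*/*) first=${line##* } ;;
        esac
    done < "$1"
}
```

A read loop like this is slow compared to Perl, sed or awk, but it shows the state handling: set a variable on the first match, consume it on the following line.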
 
Old 05-04-2011, 03:42 PM   #3
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1947
I guess you just want to add the string "ROUTER5" to the output? Would this do for you?

Code:
name='10/ARDA/123456//1'
sed -rn "\|$name| { n ; s|.* (.+)$|$name, \1, ROUTER5|p}" filename
Putting the search string into a shell variable first is just for convenience, of course.

This assumes that the "BG_B500" is always the last word on the line following the match. If not, you'll have to change the sed pattern to something like this:
Code:
sed -rn "\|$name| { n ; s|.* ([[:alnum:]]{2}_[[:alnum:]]{4}).*|$name, \1, ROUTER5|p}" filename

Last edited by David the H.; 05-04-2011 at 03:45 PM. Reason: small adjustment
 
Old 05-04-2011, 07:55 PM   #4
Stuart07
LQ Newbie
 
Registered: May 2006
Location: Manchester, NH
Distribution: CentOS5.5
Posts: 13

Original Poster
Rep: Reputation: 0
Thanks for the replies. I've started reading up on Perl, and it seems a lot more useful for parsing.

I wanted to clarify exactly what I'm doing, and where the ROUTER5 comes from.

Basically, the example config I posted above, with the subscriber and bridge group info, repeats about 3-5 thousand times, each time with a different subscriber name and bridge group (however many subscribers are on the system). In my first trials of the script, I had expect pull the hostname from the file with regexps and then print it at the end of the line, after the circuit ID (12/ABCD/123456) and bridge group (BG_ABCD).

So basically I want to be able to take this data and put it into a database format (CSV) like so:

10/ARDA/123456//1, BG_B500, ROUTER5


I understand the regular expressions needed to pull out the lines that I want; I just need a little insight into the logic to get it to do what I want.

Thanks,
 
Old 05-05-2011, 02:54 AM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,485

Rep: Reputation: 1890
So after all that explanation, I still don't see where the string ROUTER5 came from? Is this perhaps a file name?
 
Old 05-05-2011, 07:33 AM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1947
The last word is the hostname, apparently:
Quote:
What I've done with my first trials of the script is have expect pull the hostname from the file, with regexp's and then just print it to the end of the line after the circuit id (12/ABCD/123456) and bridge group (BG_ABCD)
If you need help understanding my sed expression I'll break it down for you:
Code:
sed -rn "\|searchterm| { n ; s|(regex)|replacement \1|p }" filename

".."		:double-quotes are needed around the expression when using
		:  shell variables.  Otherwise single quotes would work too.
-r		:enables extended regular expressions (GNU sed; -E also works).
-n		:turns off printing by default
\|searchterm|	:search for lines that contain searchterm (can be a regex).
		:  usually /../ is used for the delimiter, but we're using
		:  a different one here because the text being processed
		:  contains forward slashes.
{..}		:run this code block if the searchterm is found.
n		:stop processing the matched line and load the next input
		:  line in its place (with -n the matched line is not printed).
;		:command separator
s|x|y|		:the standard sed "substitute" function.  Again, the
		:  delimiter has been altered from the traditional s/x/y/.
(regex)		:the matching regex for the second line, including
		:  parentheses for capturing the text you want.
replacement \1	:the output string.  Includes \1 to substitute the
		:captured part of the matching regex.
p		:print the modified line (and only the modified line, since
		:  -n is being used).
sed is very convenient for relatively simple extractions and substitutions like this. But with multiple input values and files you'd have to wrap this up inside a shell script. As a complete language in itself, Perl is more flexible overall, and most certainly faster when processing thousands of lines. I'm going to have to sit down and learn it one of these days.
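
For what it's worth, here is a rough sketch of what such a wrapper script might look like. It matches every "subscriber name" line instead of one fixed name, and loops over any number of config files. The function name configs_to_csv is made up, and using the file name (minus extension) as the third column is purely an assumption; swap in whatever really identifies the router.

```shell
# configs_to_csv FILE...: print "circuit_id, bridge_group, label" rows.
# Hypothetical wrapper; the label is the file name minus its extension,
# which is an assumption about how the router configs are stored.
configs_to_csv() {
    for cfg in "$@"; do
        label=$(basename "$cfg")
        label=${label%.*}
        # sed: keep the circuit ID in the hold space, read the next line,
        # strip it down to its last word, then join the two with ", ".
        sed -n '/^subscriber name /{
            s/^subscriber name //
            h
            n
            s/.* //
            H
            x
            s/\n/, /
            p
        }' "$cfg" |
        while IFS= read -r row; do
            printf '%s, %s\n' "$row" "$label"
        done
    done
}
```

Called as configs_to_csv ROUTER5.cfg ROUTER6.cfg > subscribers.csv (file names here are only examples), it should give one CSV row per subscriber.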
 
Old 05-05-2011, 07:56 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,485

Rep: Reputation: 1890
Maybe something like:
Code:
awk 'BEGIN{"hostname" | getline host; close("hostname")} x{printf ", %s, %s\n",$NF,host;x=0} /^subscriber/{printf "%s",$NF;x=1}' file
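
If the router name has to come from the config itself rather than from the machine running the script, something along these lines might work. This is only a sketch: it assumes the config contains a line of the form "hostname ROUTER5", which nobody in the thread has confirmed, and the function name subscribers_csv is made up.

```shell
# subscribers_csv FILE: print "circuit_id, bridge_group, hostname" rows.
# Assumes (not confirmed in the thread) that the config holds a line of
# the form "hostname ROUTER5"; rows are buffered until the end so that
# line may appear anywhere in the file.
subscribers_csv() {
    awk '
        $1 == "hostname"     { host = $2 }
        want                 { rows[++n] = id ", " $NF; want = 0 }
        /^subscriber name /  { id = $NF; want = 1 }
        END { for (i = 1; i <= n; i++) print rows[i] ", " host }
    ' "$1"
}
```

Buffering the rows costs a little memory, but a few thousand subscribers is small, and it avoids reading the file twice.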
 
  

