05-04-2011, 02:09 PM
|
#1
|
|
LQ Newbie
Registered: May 2006
Location: Manchester, NH
Distribution: CentOS5.5
Posts: 13
Rep:
|
Best way to Parse a file
Alright, I'm going to try to lay this one out, as it has been stumping me for quite some time now. I'm looking for a language to try, and possibly some examples. I'm comfortable with Tcl/Expect because that's what was needed for router interaction, but there doesn't seem to be a way to do this efficiently with it. So here it is:
I'm trying to have the script parse through a router config file and look for anything that matches this pattern: 12/ABCD/123456/AB
After it finds that pattern, the line directly after it contains another pattern I need to match, which looks like this: AB_ABCD
Both of these patterns could be any combination of letters and numbers. Here is an example of the config:
Code:
subscriber name 10/ARDA/123456//1
bridge-group BG_B500
bridge-group BG_B500 access-group B500_acl01 in
bridge-group BG_B500 access-group B500_acl02 out
bridge-group BG_B500 aging-time 21600
bridge-group BG_B500 spanning-disabled
Essentially, I want the output to look something like this:
Code:
10/ARDA/123456//1, BG_B500, ROUTER5
I didn't post any of the code I've tried so far, because I really don't think it's the right way to do it.
Any tips, suggestions, or directions would be greatly appreciated.
Thanks
|
|
|
|
05-04-2011, 03:40 PM
|
#2
|
|
Member
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 699
Rep: 
|
Quote:
Originally Posted by Stuart07
|
Your best bet is Perl, which is designed to scan text files and has regular expressions for matching. Here's some code to get you started:
Code:
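#!/usr/bin/perl
use strict;
use warnings;
# Under "perl -n" this file becomes the body of a per-line loop, so a
# package variable ("our") is used to keep the match across input lines.
our $first;
# Capture the subscriber ID, e.g. 10/ARDA/123456//1, into $1.
m'(\d\d/\w+/\d+/..)' && do { print "Matched $1\n"; $first = $1 };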
Put this in a file named mm, for example. Run it with perl as:
perl -n mm <your_input
This part: m'(\d\d/\w+/\d+/..)' is the matching for your first line:
m - match
'' - quotes around the regular expression
() - indicates the part you want to save in $1
\d\d - matches two digits
/ - matches '/'
\w+ - matches one or more alphanumerics (or underscores)
/ - another slash
\d+ - one or more digits
/ - 3rd slash
.. - any two characters
Not sure from your description if this is exactly what you want, and not sure
where 'ROUTER5' comes from in your output, but this should get you started.
Note that I set the variable $first to the first match, so once it's set you can
do the second match and then print both.
If this looks like what you want, you can read through a tutorial
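The two-step logic described above (remember the first match, then use the line that follows and print both) might be sketched like this. It uses awk inside a shell function rather than Perl, purely as an illustration; the name `pair_lines` is made up, and taking the last word of the following line as the bridge group is an assumption based on the sample config.

```shell
#!/bin/sh
# pair_lines: hypothetical helper name, not from the thread.
# Reads a config on stdin. For each line holding a subscriber ID it prints
# "ID, GROUP", where GROUP is assumed to be the last word of the next line.
pair_lines() {
    awk '
        have_id {                       # the previous line held a subscriber ID
            print id ", " $NF
            have_id = 0
        }
        # Same shape as the Perl pattern: two digits, /, word chars, /, digits, /, any two chars
        match($0, /[0-9][0-9]\/[A-Za-z0-9_]+\/[0-9]+\/../) {
            id = substr($0, RSTART, RLENGTH)
            have_id = 1
        }
    '
}
```

Run it as `pair_lines < your_input`. The flag `have_id` plays the same role as `$first` in the Perl snippet: it carries state from one line to the next.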
|
|
|
|
05-04-2011, 03:42 PM
|
#3
|
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,589
|
I guess you just want to add the string "ROUTERS" to the output? Would this do for you?
Code:
name='10/ARDA/123456//1'
sed -rn "\|$name| { n ; s|.* (.+)$|$name, \1, ROUTERS|p}" filename
Putting the search string into a shell variable first is just for convenience, of course.
This assumes that the "BG_B500" is always the last word on the line following the match. If not, you'll have to change the sed pattern to something like this:
Code:
sed -rn "\|$name| { n ; s|.* ([[:alnum:]]{2}_[[:alnum:]]{4}).*|$name, \1, ROUTERS|p}" filename
Last edited by David the H.; 05-04-2011 at 03:45 PM.
Reason: small adjustment
|
|
|
|
05-04-2011, 07:55 PM
|
#4
|
|
LQ Newbie
Registered: May 2006
Location: Manchester, NH
Distribution: CentOS5.5
Posts: 13
Original Poster
Rep:
|
Thanks for the replies. I've started reading up on Perl, and it seems a lot more useful for parsing.
I wanted to clarify exactly what I'm doing, and where the ROUTER5 comes from.
Basically, the example config I posted above, with the subscriber and bridge group info, repeats about 3-5 thousand times, each time with a different subscriber name and bridge group (however many subscribers are on the system). What I've done in my first trials of the script is have Expect pull the hostname from the file with regexps and then print it at the end of the line, after the circuit ID (12/ABCD/123456) and bridge group (BG_ABCD).
So basically I want to be able to take this data and put it into a database format (CSV), like so:
10/ARDA/123456//1, BG_B500, ROUTER5
I understand the regular expressions needed to pull out the actual lines that I want; I just need a little insight into the logic to get it to do what I want.
Thanks,
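One possible shape for that logic, as a rough sketch only: walk the file once, remember the hostname, pair each subscriber line with the bridge-group line right after it, and print the CSV rows at the end. The function name `config_to_csv` is invented here, and the assumption that the hostname sits on a line of the form `hostname ROUTER5` is a guess, since the thread never shows that part of the config.

```shell
#!/bin/sh
# config_to_csv: hypothetical name. Pass a single config file, or pipe one on stdin.
# ASSUMPTION: the hostname is on a line like "hostname ROUTER5"; adjust to the real config.
config_to_csv() {
    awk '
        $1 == "hostname" { host = $2 }
        # Only the line directly after a subscriber line is considered.
        want_group && $1 == "bridge-group" { rows[++n] = id ", " $2 }
        { want_group = 0 }
        $1 == "subscriber" && $2 == "name" { id = $3; want_group = 1 }
        # Rows are printed at the end so the hostname may appear anywhere in the file.
        END { for (i = 1; i <= n; i++) print rows[i] ", " host }
    ' "$@"
}
```

Usage would be something like `config_to_csv router.cfg > subscribers.csv`. Holding a few thousand rows in memory until the end is cheap, and it avoids depending on where the hostname line sits.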
|
|
|
|
05-05-2011, 02:54 AM
|
#5
|
|
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,328
|
So after all that explanation, I still don't see where the string ROUTER5 came from? Is this perhaps a file name?
|
|
|
|
05-05-2011, 07:33 AM
|
#6
|
|
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,589
|
The last word is the hostname, apparently:
Quote:
|
What I've done with my first trials of the script is have expect pull the hostname from the file, with regexp's and then just print it to the end of the line after the circuit id (12/ABCD/123456) and bridge group (BG_ABCD)
|
If you need help understanding my sed expression I'll break it down for you:
Code:
sed -rn "\|searchterm| { n ; s|(regex)|replacement \1|p }" filename
".."            : double quotes are needed around the expression when using
                : shell variables. Otherwise single quotes would work too.
-r              : enables extended regular expressions
-n              : turns off printing by default
\|searchterm|   : search for lines that contain searchterm (can be a regex).
                : Usually /../ is used for the delimiter, but we're using
                : a different one here because the text being processed
                : contains forward slashes.
{..}            : run this code block if the searchterm is found.
n               : stop processing this line and load the next one.
;               : command separator
s|x|y|          : the standard sed "substitute" function. Again, the
                : delimiter has been altered from the traditional s/x/y/.
(regex)         : the matching regex for the second line, including
                : parentheses for capturing the part you want.
replacement \1  : the output string. Includes \1 to substitute the
                : captured part of the matching regex.
p               : print the modified line (and only the modified line, since
                : -n is being used).
sed is very convenient for relatively simple extractions and substitutions like this. But with multiple input values and files you'd have to wrap this up inside a shell script. As a complete language in itself, Perl is more flexible overall, and most certainly faster when processing thousands of lines. I'm going to have to sit down and learn it one of these days. 
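As a rough idea of what such a wrapper could look like, here is a sketch that loops over several config files and handles every subscriber line instead of one fixed search string. The name `extract_all` is invented, and it assumes each file is named after its router (for example ROUTER5.cfg) and that config lines are not indented; neither assumption comes from the thread.

```shell
#!/bin/sh
# extract_all: hypothetical wrapper around sed for several config files.
# ASSUMPTION: each file is named after its router, e.g. ROUTER5.cfg.
extract_all() {
    for cfg in "$@"; do
        router=$(basename "$cfg" .cfg)
        # For each "subscriber name ID" line: strip the prefix, append the
        # next line with N, and keep only the last word of that next line.
        sed -n '/^subscriber name /{
            s/^subscriber name //
            N
            s/\n.* \([^ ]*\)$/, \1/
            p
        }' "$cfg" |
        while IFS= read -r row; do
            printf '%s, %s\n' "$row" "$router"
        done
    done
}
```

Calling `extract_all *.cfg > all_routers.csv` would then produce one CSV covering every file. The router name is appended by the shell loop rather than inside the sed script, so odd characters in a file name cannot break the sed expression.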
|
|
|
|
05-05-2011, 07:56 AM
|
#7
|
|
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,328
|
Maybe something like:
Code:
awk 'BEGIN{"hostname" | getline host} x{printf ", %s, %s\n",$NF,host;x=0} /^subscriber/{printf "%s",$NF;x=1}' file
|
|
|
|
All times are GMT -5.