perl script to parse following format
I am a newbie to Linux and Perl. I want to parse a file in the following format:
***A***
a#
b#
c#
a#
c#
***B***
a#
b#
a#
***C***
c#
b#
c#
a#
I want to know how to grep the lines in section ***A*** that start with a#. I am not sure whether we can first extract the paragraph that starts with "***A***" and ends with "***B***", and then extract the a# lines with the grep command. Could anyone please help me with this? Thanks. |
Could anyone help me with this? Thanks.
|
If you promise this is not homework you were supposed to do ;).
The man pages for grep have the answer you're looking for: Code:
man grep |
If you want a Perl solution, first show what you've tried so far.
|
Sure. It's not a homework assignment; someone shared his file, and I want to extract some statistical information from it and cross-correlate it with my data. I am not very familiar with Linux, but I did try grep with -A, -B and -C. The problem with these options is that the target line is not always a fixed number of lines (the 2nd, 3rd, or even 100th) before or after the ***A*** header, so I don't think I can use grep. Maybe I am wrong. I also looked into Perl, which I know nothing about:
open(FH, "filename");
$flag = 0;
while(<FH>) {
    if( /^\***A\***/ ) {
        $flag = ("***A***");
    }
    elsif( $flag && /^a#/) {
        print $_;
    }
}
close FH;
Poor programming and it doesn't work..... |
You have to escape all the *s I think
or use something like: /^\*+A\*+/ |
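To illustrate the escaping advice above, here is a minimal sketch using grep. The file name sample.txt and its contents are assumptions that mirror the layout in the question. An unescaped asterisk is a quantifier in a regular expression, so it has to be escaped, or the pattern has to be treated as a fixed string.

```shell
# Hypothetical sample file mirroring the layout described in the question.
F=sample.txt
printf '%s\n' '***A***' 'a#' 'b#' 'c#' 'a#' 'c#' \
              '***B***' 'a#' 'b#' 'a#' \
              '***C***' 'c#' 'b#' 'c#' 'a#' > "$F"

# Every asterisk escaped: matches the literal header text.
escaped=$(grep -c '^\*\*\*A\*\*\*' "$F")

# The looser pattern suggested above: one or more asterisks on each side.
loose=$(grep -cE '^\*+A\*+' "$F")

# Fixed-string mode (-F) takes the pattern literally, so nothing needs escaping.
fixed=$(grep -cF '***A***' "$F")

echo "escaped=$escaped loose=$loose fixed=$fixed"
```

All three counts should come out the same for this sample, since each pattern matches only the single ***A*** header line.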
Ok good start. If the file is laid out as you say and you only want the matches between ***A*** and ***B***, I'd use a string comparison for that bit.
Always use the warnings and strict options as exemplified by the first 2 lines here. They'll save you a lot of stress. Code:
#!/usr/bin/perl -w
http://perldoc.perl.org/
http://www.perlmonks.org/?node=Tutorials |
Quote:
Code:
for lines in open("file"): |
Another handy tool for cases like this is awk, which was certainly one of the inspirations for Perl.
awk "programs" are very simple: a regular expression (or "string pattern") followed by a block of statements to be executed whenever that pattern matches. There's also a special pattern, BEGIN, that "matches" before the first line is read, another, END, that "matches" after the last one, and a block with no pattern at all runs for every line. "And that's it." :) So how can we use "it"? Well, there are basically three types of lines in your file:
... A "funny chicken-scratch" like /^\*{3}([A-Z])\*{3}$/ is actually an incredibly powerful thing, because it can not only match a particular string but also extract information from it, which you can then use in your awk program. Let me break down this regular expression...
The expression /^([a-z]+)\#$/ uses "+", which means one or more (that is, at least one) occurrence of the preceding element. It uses parentheses to capture the text that precedes the '#' character. awk provides a simple but very serviceable programming language that you can use within the blocks that are executed when the various patterns match. You can define variables to hold, for example, the "captured" parts of one string so that you can include them inside another. You can use "if" statements to "do things only when it makes sense to do so." |
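The pattern-and-block idea described above can be sketched as a two-rule awk program. The file name sample.txt and the variable name section are assumptions. Portable awk has no capture groups in patterns, so this sketch pulls the section letter out with substr instead, and it spells out the three asterisks rather than relying on {3}, which some older awk versions do not accept.

```shell
# Hypothetical sample file mirroring the layout described in the question.
F=sample.txt
printf '%s\n' '***A***' 'a#' 'b#' 'c#' 'a#' 'c#' \
              '***B***' 'a#' 'b#' 'a#' \
              '***C***' 'c#' 'b#' 'c#' 'a#' > "$F"

awk '
  # Header line such as ***A***: remember its letter, then skip to the next line.
  /^\*\*\*[A-Z]\*\*\*$/ { section = substr($0, 4, 1); next }
  # Data line: print it only while inside section A and only if it starts with a#.
  section == "A" && /^a#/ { print }
' "$F"
```

Because the section letter is stored in a variable, asking for a different section only means changing the "A" in the second rule.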
Nice exposition, sundialsvcs.
Quote:
Since you mention grep, here are some grep-sed ideas. (Note, F is the name of your file in all of the following): Code:
cat $F |\
The problem is that the asterisk is a special character in both shell globbing (wildcards) and regular expressions (regexes). I see 2 ways to clarify the code: dump the asterisks with tr, or put them in short variables: Code:
## Using tr: Code:
## Using variables: Code:
for LL in {A..C} # LL==LabelLetter |
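The code in the post above is cut off, so here is a minimal sketch of the same three ideas (plain escaping, tr, and a short variable) rather than a reconstruction of it. The file name sample.txt and the variable name S are assumptions. Each pipeline prints the range from the ***A*** header to the ***B*** header with sed, then keeps the a# lines with grep.

```shell
# Hypothetical sample file mirroring the layout described in the question.
F=sample.txt
printf '%s\n' '***A***' 'a#' 'b#' 'c#' 'a#' 'c#' \
              '***B***' 'a#' 'b#' 'a#' \
              '***C***' 'c#' 'b#' 'c#' 'a#' > "$F"

# Plain version: every asterisk escaped inside single quotes.
sed -n '/^\*\*\*A\*\*\*/,/^\*\*\*B\*\*\*/p' "$F" | grep '^a#'

## Using tr: delete the asterisks first so the headers become bare letters.
tr -d '*' < "$F" | sed -n '/^A$/,/^B$/p' | grep '^a#'

## Using variables: keep the escaped asterisks in one short variable.
S='\*\*\*'
sed -n "/^${S}A${S}/,/^${S}B${S}/p" "$F" | grep '^a#'
```

The tr variant assumes no data line consists of a single capital letter, since such a line would then look like a header. For the last section there is no closing header, but sed simply prints through to the end of the file when the end pattern never matches.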
The Perl Haters Community grows beyond the limits of imagination :)
I mean, each time there is an OP who wants to parse a file with Perl, there is always someone who suggests using another tool like awk or Python. That must stop now! Sorry, just kidding :p |
Given that the file is divided into sections by clearly distinguishable delimiters, it seems prudent to use these as the default record separators, and let Perl do the work of separating records:
Code:
$\ = ""; Code:
while(<>){ Code:
#! /usr/bin/perl -w --- rod. |