LinuxQuestions.org


yjy4321 04-15-2011 03:55 PM

How to delete multiple lines in a file using perl
 
I have a file that looks like the following:

digraph topology
{
"192.168.3.254" -> "10.1.1.11"[label="1.000", style=solid];
"192.168.3.254" -> "10.1.1.12"[label="1.000", style=solid];
"192.168.3.254" -> "10.1.1.10"[label="1.000", style=solid];
"192.168.3.254" -> "10.1.1.9"[label="1.000", style=solid];
(skip some lines...)
"10.1.1.9" -> "10.1.1.10"[label="1.000"];
"10.1.1.9" -> "10.1.1.11"[label="1.024"];
"10.1.1.9" -> "10.1.1.12"[label="1.076"];
"10.1.1.9" -> "192.168.3.254"[label="1.000"];
"10.1.1.10" -> "10.1.1.9"[label="1.000"];
"10.1.1.10" -> "10.1.1.11"[label="1.020"];
"10.1.1.10" -> "10.1.1.12"[label="1.067"];
"10.1.1.10" -> "192.168.3.254"[label="1.000"];
"10.1.1.11" -> "10.1.1.9"[label="1.028"];
"10.1.1.11" -> "10.1.1.10"[label="1.028"];
"10.1.1.11" -> "10.1.1.12"[label="1.053"];
"10.1.1.11" -> "192.168.3.254"[label="1.000"];
"10.1.1.12" -> "10.1.1.9"[label="1.099"];
"10.1.1.12" -> "10.1.1.10"[label="1.085"];
"10.1.1.12" -> "10.1.1.11"[label="1.057"];
"10.1.1.12" -> "192.168.3.254"[label="1.000"];
"192.168.3.254" -> "10.1.1.9"[label="1.000"];
"192.168.3.254" -> "10.1.1.10"[label="1.000"];
"192.168.3.254" -> "10.1.1.11"[label="1.000"];
"192.168.3.254" -> "10.1.1.12"[label="1.000"];
"192.168.3.254" -> "192.168.3.0/24"[label="HNA"];
"192.168.3.0/24"[shape=diamond];
}

I need to search for particular lines and delete them. For example, I need to delete the following lines:
"10.1.1.9" -> "10.1.1.11"[label="1.024"];
"10.1.1.11" -> "10.1.1.9"[label="1.028"];
"10.1.1.12" -> "10.1.1.11"[label="1.057"];
"10.1.1.11" -> "10.1.1.12"[label="1.053"];
"192.168.3.254" -> "192.168.3.0/24"[label="HNA"];
"192.168.3.0/24"[shape=diamond];

The order of these lines is random, so I cannot simply delete line #19, for example. As you can see, the top four lines I want to delete come in pairs, so there might be some clever way to detect them: if a line has both "1.9" and "1.11", delete the line. I am new to the Perl language.

The following is the code I have now. I think I just need to write some code inside the while loop that checks whether I want to delete the line $dotline before I write it to the NEW file.
Code:

#!/usr/bin/perl -w

$TOPPATH = "/tmp";
$NAME = "topology";
$FILENAME = "$TOPPATH/$NAME.dot";
$CONFFILENAME = "$TOPPATH/$NAME.conf";
$NEWFILENAME = "$TOPPATH/$NAME.new";
$EXT = "png";

`touch $TOPPATH/$NAME.$EXT`;

my $f;

(skip some lines...)

`touch $NEWFILENAME`;

my $newfile;
my $infile;
my $dotfile;
$newfile = $NEWFILENAME;
$infile = $CONFFILENAME;
$dotfile = $FILENAME;
open ( NEW , "> $newfile") or die "Can't open $newfile. $!";
open( IN , "< $infile") or die "Can't open $infile. $!";
open( DOT , "< $dotfile") or die "Can't open $dotfile. $!";
my $newline;
my $line;
my $dotline;

my $i = 0;
while( $dotline = <DOT> ) {
        $i++;
        # I think here should be the extra codes...
        #

        # print, not printf: the line is data, not a format string
        print NEW $dotline;
        if ($i == 3) {
                (skip some lines...)
        }
}

close(IN);
close(DOT);
close(NEW);

`cp $NEWFILENAME $FILENAME`;

`neato -Tpng -Gbgcolor=grey -Nfontsize=15 -Ncolor=black -Nfillcolor=green -Ecolor=blue -Earrowsize=2 $FILENAME -o $TOPPATH/$NAME.new`;

`mv $TOPPATH/$NAME.new $TOPPATH/$NAME.$EXT`;
`cp $TOPPATH/$NAME.dot $TOPPATH/$NAME-\$(date +'%Y-%m-%d-%H-%M-%S').dot`;


Tinkster 04-15-2011 06:52 PM

I'm not sure I understood your criteria fully ... ?!

Getting rid of (not printing) lines that have both 1.9 and 1.11 in them:

Code:

print NEW $dotline if($dotline !~ /\.1\.9/ && $dotline !~ /\.1\.11/);
Untested, should work.



Cheers,
Tink

grail 04-16-2011 02:37 AM

It is not in your list of lines to be removed, but by the logic you have explained and Tink's example you would also be removing:
Quote:

"192.168.3.254" -> "10.1.1.11"[label="1.000", style=solid];
ie. the very first entry in your example ... is this correct??

yjy4321 04-18-2011 04:20 PM

Thank you for the reply, Tinkster.
But that deletes all lines with ".1.9" OR ".1.11".

grail,
It did not remove the very first entry in my example. It removed any line with ".1.9" OR ".1.11".

Code:

#!/usr/bin/perl -w

(skip some lines...)

while( $dotline = <DOT> ) {
        # I think here should be the extra codes...
        #

        print NEW $dotline if($dotline !~ /\.1\.9/ && $dotline !~ /\.1\.11/);
        (skip some lines...)

}

(skip some lines...)


Tinkster 04-18-2011 05:24 PM

My bad - my predicate logic went bad once again; replace the && w/ ||.


Cheers,
Tink
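
The swap from && to || follows from De Morgan's law: "delete when both patterns match" means "keep unless (A and B)", which is the same as "keep if (not A) or (not B)". A minimal sketch of that check, where keep_line is a hypothetical helper name and the sample lines come from the first post:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the line should be written out.
# "Drop when BOTH match" == "keep unless (A && B)" == "keep if (!A || !B)".
sub keep_line {
    my ($line) = @_;
    return ($line !~ /\.1\.9/ || $line !~ /\.1\.11/) ? 1 : 0;
}

my @samples = (
    qq{"10.1.1.9" -> "10.1.1.11"[label="1.024"];\n},       # has both: dropped
    qq{"10.1.1.9" -> "10.1.1.10"[label="1.000"];\n},       # only .1.9: kept
    qq{"192.168.3.254" -> "10.1.1.11"[label="1.000"];\n},  # only .1.11: kept
);

print for grep { keep_line($_) } @samples;
```

With && in place of ||, the second and third sample lines would be dropped as well, which is the behaviour reported above.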

yjy4321 04-18-2011 06:45 PM

Yes, it works with "||".
Is there a way to make the code simpler or cleaner?
Now I have code that looks like the following:
Code:

#!/usr/bin/perl -w

(skip some lines...)

while( $dotline = <DOT> ) {
        print NEW $dotline if( ($dotline !~ /\.9/ || $dotline !~ /\.10/) &&
                               ($dotline !~ /\.9/ || $dotline !~ /\.11/) &&
                               ($dotline !~ /\.9/ || $dotline !~ /\.12/) &&
                               ($dotline !~ /\.10/ || $dotline !~ /\.11/) &&
                               ($dotline !~ /\.10/ || $dotline !~ /\.12/) &&
                               ($dotline !~ /\.11/ || $dotline !~ /\.12/) );
        (skip some lines...)

}

(skip some lines...)


Tinkster 04-18-2011 08:49 PM

Code:

while( $dotline = <DOT> ) {
        print NEW $dotline if( $dotline !~ /\.9|\.1[10]/ || $dotline !~ /\.1[012]/ );
}


Should do the same job ...
But beware: your simplification (omitting the leading \.1) may
inadvertently remove more than you expected.

Cheers,
Tink
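
To illustrate the warning about dropping the leading \.1, here is a small sketch. The sample line is hypothetical: the label value 1.100 does not appear in the original file and is chosen only to show how a short pattern can match inside the label.

```perl
use strict;
use warnings;

# Hypothetical line: an edge from .9 to the gateway whose label is 1.100.
my $line = qq{"10.1.1.9" -> "192.168.3.254"[label="1.100"];};

# Without the leading \.1, the ".10" pattern matches inside the label.
my $loose  = ($line =~ /\.9/    && $line =~ /\.10/)    ? 1 : 0;
# With the leading \.1, the label no longer matches.
my $strict = ($line =~ /\.1\.9/ && $line =~ /\.1\.10/) ? 1 : 0;

print "short patterns flag the line for deletion:  $loose\n";
print "longer patterns flag the line for deletion: $strict\n";
```

The short patterns treat this line as a 9/10 pair even though 10.1.1.10 never appears in it.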

grail 04-18-2011 10:58 PM

I know I am a relative noob here, but did I miss a split or something?
I am struggling to follow why we are testing $dotline twice?
Code:

while( $dotline = <DOT> ) {
        print NEW $dotline if( $dotline !~ /\.1\.(9|1[0-2])"/ );
}

I included the quotes (") as I am guessing we don't want to get rid of - 10.1.9.123
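
The effect of the closing quote can be checked on two quoted addresses. A small sketch, where 10.1.9.123 is the hypothetical address mentioned above rather than one from the original file:

```perl
use strict;
use warnings;

my $loose    = qr/\.1\.(9|1[0-2])/;     # no closing quote
my $anchored = qr/\.1\.(9|1[0-2])"/;    # must end a quoted address

for my $addr ('"10.1.1.9"', '"10.1.9.123"') {
    printf "%-14s loose=%-5s anchored=%s\n", $addr,
        ($addr =~ $loose    ? 'match' : 'no'),
        ($addr =~ $anchored ? 'match' : 'no');
}
```

Only the anchored pattern tells the two addresses apart: in 10.1.9.123 the ".1.9" is followed by a dot, not by the closing quote.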

chrism01 04-19-2011 01:23 AM

I'm guessing you don't use a lot of Perl? (Assuming I've understood your question correctly.)
The
Code:

while() {}
construct is actually reading the next record from the input file on each pass. The read returns undef when no record is left, which ends the loop and moves on to the end-of-file processing.
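
A minimal sketch of such a read loop, using an in-memory string in place of a real file so that it is self-contained; the variable names are illustrative only:

```perl
use strict;
use warnings;

my $data = "first\nsecond\nthird\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

my $count = 0;
# <$fh> returns the next line, or undef at end of file, which ends the loop.
while (defined(my $line = <$fh>)) {
    $count++;
    print "$count: $line";
}
close($fh);
```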

grail 04-19-2011 01:30 AM

Hi Chris

No I was more asking why the others seem to be testing $dotline twice, ie looking at posts #6 and #7?

But thanks, I did know that :)

cheers
grail

yjy4321 04-19-2011 11:41 AM

Quote:

Originally Posted by Tinkster (Post 4328762)
Code:

while( $dotline = <DOT> ) {
        print NEW $dotline if( $dotline !~ /\.9|\.1[10]/ || $dotline !~ /\.1[012]/ );
}

Thanks, but this code deletes all ".10" or ".11" lines. Following your suggestion, I added the leading "1", but without the "\.".

Here is my updated code, which does exactly what I want it to do. I colored the code so that it is easier to read. It is not many lines, but I want to know whether I could use other variables and while loops to reduce the number of lines. I know that once I have working code I am not supposed to edit it until it stops working, but there is a possibility that these lines could grow much larger.
Code:

while( $dotline = <DOT> ) {
        print NEW $dotline if( ($dotline !~ /1\.1/ || $dotline !~ /1\.[234]/) &&
                               ($dotline !~ /1\.2/ || $dotline !~ /1\.[34]/) &&
                               ($dotline !~ /1\.3/ || $dotline !~ /1\.4/) &&

                               ($dotline !~ /1\.5/ || $dotline !~ /1\.[678]/) &&
                               ($dotline !~ /1\.6/ || $dotline !~ /1\.[78]/) &&
                               ($dotline !~ /1\.7/ || $dotline !~ /1\.8/) &&

                               ($dotline !~ /1\.9/ || $dotline !~ /1\.1[012]/) &&
                               ($dotline !~ /1\.10/ || $dotline !~ /1\.1[12]/) &&
                               ($dotline !~ /1\.11/ || $dotline !~ /1\.12/) &&

                               ($dotline !~ /1\.13/ || $dotline !~ /1\.1[456]/) &&
                               ($dotline !~ /1\.14/ || $dotline !~ /1\.1[56]/) &&
                               ($dotline !~ /1\.15/ || $dotline !~ /1\.16/) &&

                               ($dotline !~ /0\/24/) );
}
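
One way the growing list of conditions might be replaced is to parse both endpoints of an edge and compare which group they fall in. This is only a sketch under stated assumptions: that the nodes are 10.1.1.N with N from 1 to 16, that the groups are 1-4, 5-8, 9-12 and 13-16, and that an edge is dropped when both ends are in the same group. The helper names group_of and keep_line are made up for illustration.

```perl
use strict;
use warnings;

# ASSUMPTION: nodes are 10.1.1.N, N in 1..16, in groups 1-4, 5-8, 9-12, 13-16.
sub group_of {
    my ($addr) = @_;
    return unless $addr =~ /^10\.1\.1\.(\d+)$/;
    my $n = $1;
    return if $n < 1 || $n > 16;
    return int( ($n - 1) / 4 );
}

# Hypothetical helper: returns 1 to keep the line, 0 to drop it.
sub keep_line {
    my ($line) = @_;
    return 0 if $line =~ m{0/24};    # the HNA edge and the diamond node
    my ($from, $to) = $line =~ /^\s*"([^"]+)"\s*->\s*"([^"]+)"/;
    return 1 unless defined $to;     # not an edge (header, braces): keep
    my $from_group = group_of($from);
    my $to_group   = group_of($to);
    return 1 unless defined $from_group && defined $to_group;
    return $from_group == $to_group ? 0 : 1;
}

my @samples = (
    qq{digraph topology\n},
    qq{"10.1.1.9" -> "10.1.1.11"[label="1.024"];\n},
    qq{"10.1.1.9" -> "192.168.3.254"[label="1.000"];\n},
    qq{"192.168.3.0/24"[shape=diamond];\n},
);
print for grep { keep_line($_) } @samples;
```

Because the addresses are compared as whole strings, label values and partial matches such as "1.1" inside "10.1.1.9" cannot trigger a deletion. Adding more nodes only means changing the range check and the group size.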


grail 04-19-2011 12:27 PM

Well I am not 100% sure I am on the right path, but how about:
Code:

while( $dotline = <DOT> ) {
        print NEW $dotline if($dotline !~ /^\s*"[^"]+\.1\.([1-9]|1[0-6])"[^"]+"[^"]+\.1\.([1-9]|1[0-6])".*/ && $dotline !~ /0\/24/);
}

yjy4321 04-20-2011 01:56 PM

Quote:

Originally Posted by grail (Post 4329591)
Well I am not 100% sure I am on the right path, but how about:
Code:

while( $dotline = <DOT> ) {
        print NEW $dotline if($dotline !~ /^\s*"[^"]+\.1\.([1-9]|1[0-6])"[^"]+"[^"]+\.1\.([1-9]|1[0-6])".*/ && $dotline !~ /0\/24/);
}

It does not work: it deletes some lines I do not want to delete. Anyway, I have working code now, so I will not try to change it. Thank you all.

chrism01 04-21-2011 01:04 AM

@grail ;)
Well I'm a bit confused because the OP said
Quote:

if a line has both "1.9" and "1.11",
which in perl is && not || ....

Maybe I'm misunderstanding post #1, but it seems to me he's got a (possibly) large file of records and wants to remove a smaller subset, contained in another file.
Assuming (as per the example) that the records in both files are exact matches (for those in both files), I'd create a hash using the smaller set as hash keys, then read through the large file once and, for each record in the large file, check whether it's in the hash of records to be deleted.
If so, get the next record from the large file; otherwise write the record to the new file.
This will produce a file of records that excludes those in the to-be-deleted list.
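
The hash approach described above might look like the following. It is a sketch only: both sets are inlined as arrays where the post assumes two files, and the variable names are invented for illustration.

```perl
use strict;
use warnings;

# Records to delete; in practice these would be read from the smaller file.
my @unwanted = (
    qq{"10.1.1.9" -> "10.1.1.11"[label="1.024"];},
    qq{"192.168.3.0/24"[shape=diamond];},
);
my %delete = map { $_ => 1 } @unwanted;

# Stand-in for the large file.
my @input = (
    qq{"10.1.1.9" -> "10.1.1.10"[label="1.000"];},
    qq{"10.1.1.9" -> "10.1.1.11"[label="1.024"];},
    qq{"192.168.3.0/24"[shape=diamond];},
);

# One pass over the large set, one hash lookup per record.
my @kept = grep { !exists $delete{$_} } @input;
print "$_\n" for @kept;
```

Exact matching only works while the label values in the delete list are identical to those in the file; if the labels change between runs, the lookup key would need to be reduced to the two addresses.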

grail 04-21-2011 10:15 AM

Quote:

It deletes some lines I do not want to delete
My bad .. I did not notice that you removed numbers from either side, i.e. the left-hand side is missing 4, 8, 12.

I would mention that even with what you have, you could substitute a single line for each coloured group of conditions:
Code:

while( $dotline = <DOT> ) {
        print NEW $dotline if( ($dotline !~ /1\.[1-3]/ || $dotline !~ /1\.[2-4]/) &&
                               ($dotline !~ /1\.[5-7]/ || $dotline !~ /1\.[6-8]/) &&
                               ($dotline !~ /1\.(9|1[01])/ || $dotline !~ /1\.1[0-2]/) &&
                               ($dotline !~ /1\.1[3-5]/ || $dotline !~ /1\.1[4-6]/) &&
                               ($dotline !~ /0\/24/) );
}

I also agree with chris though, using || instead of && seems odd when you are using an exclusion.


All times are GMT -5.