LinuxQuestions.org - [SOLVED] How to delete multiple lines in a file using perl

I have a file that looks like the following:

digraph topology
{
"192.168.3.254" -> "10.1.1.11"[label="1.000", style=solid];
"192.168.3.254" -> "10.1.1.12"[label="1.000", style=solid];
"192.168.3.254" -> "10.1.1.10"[label="1.000", style=solid];
"192.168.3.254" -> "10.1.1.9"[label="1.000", style=solid];
(skip some lines...)
"10.1.1.9" -> "10.1.1.10"[label="1.000"];
"10.1.1.9" -> "10.1.1.11"[label="1.024"];
"10.1.1.9" -> "10.1.1.12"[label="1.076"];
"10.1.1.9" -> "192.168.3.254"[label="1.000"];
"10.1.1.10" -> "10.1.1.9"[label="1.000"];
"10.1.1.10" -> "10.1.1.11"[label="1.020"];
"10.1.1.10" -> "10.1.1.12"[label="1.067"];
"10.1.1.10" -> "192.168.3.254"[label="1.000"];
"10.1.1.11" -> "10.1.1.9"[label="1.028"];
"10.1.1.11" -> "10.1.1.10"[label="1.028"];
"10.1.1.11" -> "10.1.1.12"[label="1.053"];
"10.1.1.11" -> "192.168.3.254"[label="1.000"];
"10.1.1.12" -> "10.1.1.9"[label="1.099"];
"10.1.1.12" -> "10.1.1.10"[label="1.085"];
"10.1.1.12" -> "10.1.1.11"[label="1.057"];
"10.1.1.12" -> "192.168.3.254"[label="1.000"];
"192.168.3.254" -> "10.1.1.9"[label="1.000"];
"192.168.3.254" -> "10.1.1.10"[label="1.000"];
"192.168.3.254" -> "10.1.1.11"[label="1.000"];
"192.168.3.254" -> "10.1.1.12"[label="1.000"];
"192.168.3.254" -> "192.168.3.0/24"[label="HNA"];
"192.168.3.0/24"[shape=diamond];
}

I need to search for some particular lines and delete them. For example, I need to delete the following lines:
"10.1.1.9" -> "10.1.1.11"[label="1.024"];
"10.1.1.11" -> "10.1.1.9"[label="1.028"];
"10.1.1.12" -> "10.1.1.11"[label="1.057"];
"10.1.1.11" -> "10.1.1.12"[label="1.053"];
"192.168.3.254" -> "192.168.3.0/24"[label="HNA"];
"192.168.3.0/24"[shape=diamond];

The order of these lines is random, so I cannot simply delete line #19, for example. You can see that the top four lines I want to delete come in pairs, so there might be a clever way to detect them: if a line has both "1.9" and "1.11", delete it. I am new to Perl.

The following is the code I have now. I think I just need to add some code inside the while loop that checks whether I want to delete the line $dotline before I write it to the NEW file.

Code:

#!/usr/bin/perl -w

$TOPPATH = "/tmp";
$NAME = "topology";
$FILENAME = "$TOPPATH/$NAME.dot";
$CONFFILENAME = "$TOPPATH/$NAME.conf";
$NEWFILENAME = "$TOPPATH/$NAME.new";
$EXT = "png";

`touch $TOPPATH/$NAME.$EXT`;

my $f;

(skip some lines...)

`touch $NEWFILENAME`;

my $newfile;
my $infile;
my $dotfile;
$newfile = $NEWFILENAME;
$infile = $CONFFILENAME;
$dotfile = $FILENAME;
open ( NEW , "> $newfile") or die "Can't open $newfile. $!";
open( IN , "< $infile") or die "Can't open $infile. $!";
open( DOT , "< $dotfile") or die "Can't open $dotfile. $!";
my $newline;
my $line;
my $dotline;

my $i = 0;
while( $dotline = <DOT> ) {
        $i++;
        # I think here should be the extra codes...
        #
        printf NEW "$dotline";
        if ($i == 3) {
                (skip some lines...)
        }
}

close(IN);
close(DOT);
close(NEW);

`cp $NEWFILENAME $FILENAME`;

`neato -Tpng -Gbgcolor=grey -Nfontsize=15 -Ncolor=black -Nfillcolor=green -Ecolor=blue -Earrowsize=2 $FILENAME -o $TOPPATH/$NAME.new`;

`mv $TOPPATH/$NAME.new $TOPPATH/$NAME.$EXT`;
`cp $TOPPATH/$NAME.dot $TOPPATH/$NAME-\$(date +'%Y-%m-%d-%H-%M-%S').dot`;

I'm not sure I understood your criteria fully ... ?!

Getting rid of (not printing) lines that have both 1.9 and 1.11 in them:

Code:

printf NEW "$dotline" if($dotline !~ /\.1\.9/ && $dotline !~ /\.1\.11/);

Untested, should work.

Cheers,
Tink

It is not in your list of lines to remove, but going by the logic you explained and Tink's example, you would also be removing:

Quote:

"192.168.3.254" -> "10.1.1.11"[label="1.000", style=solid];

i.e. the very first entry in your example ... is this correct?

Thank you for the reply, Tinkster.
But that deletes all lines with ".1.9" OR ".1.11".

grail
It did not remove the very first entry in my example. It removed any lines with ".1.9" OR ".1.11"

Code:

#!/usr/bin/perl -w

(skip some lines...)

while( $dotline = <DOT> ) {
        # I think here should be the extra codes...
        #
        printf NEW "$dotline" if($dotline !~ /\.1\.9/ && $dotline !~ /\.1\.11/);
        (skip some lines...)
}

(skip some lines...)

My bad - my predicate logic went bad once again; replace the && w/ ||.

Cheers,
Tink
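
To spell out the && vs || point: a line should be skipped only when both patterns match, so it should be printed when at least one of them fails to match (De Morgan's law). A minimal sketch, assuming a helper named keep_line (hypothetical, not part of the original script) and three sample lines taken from the file in post #1:

```perl
use strict;
use warnings;

# keep_line() is a hypothetical helper, not part of the original script.
# A line is deleted only if it contains BOTH .1.9 and .1.11.
sub keep_line {
    my ($line) = @_;
    # not (A and B)  is the same as  (not A) or (not B)
    return ( $line !~ /\.1\.9/ || $line !~ /\.1\.11/ ) ? 1 : 0;
}

my @samples = (
    qq{"10.1.1.9" -> "10.1.1.11"[label="1.024"];\n},         # both match: dropped
    qq{"10.1.1.9" -> "10.1.1.10"[label="1.000"];\n},         # only .1.9: kept
    qq{"192.168.3.254" -> "10.1.1.11"[label="1.000"];\n},    # only .1.11: kept
);

for my $line (@samples) {
    print $line if keep_line($line);
}
```

With && in place of ||, the second and third sample lines would be dropped as well, which is the behaviour reported above.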

Yes, it works with "||".
Is there a way to make the code simpler or cleaner?
Now I have code that looks like the following:

Code:

#!/usr/bin/perl -w

(skip some lines...)

while( $dotline = <DOT> ) {
        printf NEW "$dotline" if( ($dotline !~ /\.9/ || $dotline !~ /\.10/) &&
                                  ($dotline !~ /\.9/ || $dotline !~ /\.11/) &&
                                  ($dotline !~ /\.9/ || $dotline !~ /\.12/) &&
                                  ($dotline !~ /\.10/ || $dotline !~ /\.11/) &&
                                  ($dotline !~ /\.10/ || $dotline !~ /\.12/) &&
                                  ($dotline !~ /\.11/ || $dotline !~ /\.12/) );
        (skip some lines...)
}

(skip some lines...)

Code:

while( $dotline = <DOT> ) {
        printf NEW "$dotline" if( $dotline !~ /\.9|\.1[10]/ || $dotline !~ /\.1[012]/ );

Should do the same job ...
But beware: your simplification (omitting the leading \.1) may inadvertently remove more than you expected.

Cheers,
Tink
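
To illustrate the warning about omitting the leading \.1: the shorter patterns can also match inside the label value. A small sketch, using a made-up line (the label 1.100 is hypothetical and does not appear in the sample file):

```perl
use strict;
use warnings;

# Hypothetical line: the only 10.1.1.x host on it is .9,
# but the label value happens to contain ".10".
my $line = qq{"192.168.3.254" -> "10.1.1.9"[label="1.100"];\n};

# Short patterns: both match (".10" is found inside the label),
# so the line would be deleted.
my $short = ( $line =~ /\.9/ && $line =~ /\.10/ ) ? "deleted" : "kept";

# Patterns with the leading \.1 : ".1.10" is absent, so the line survives.
my $long = ( $line =~ /\.1\.9/ && $line =~ /\.1\.10/ ) ? "deleted" : "kept";

print "short patterns: $short\n";
print "long patterns:  $long\n";
```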

I know I am a relative noob here, but did I miss a split or something?
I am struggling to follow why we are testing $dotline twice?

Code:

while( $dotline = <DOT> ) {
        printf NEW "$dotline" if( $dotline !~ /\.1\.(9|1[0-2])"/ );

I included the quotes (") as I am guessing we don't want to get rid of - 10.1.9.123
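
A quick check of what the trailing quote buys, as a sketch only (10.1.9.123 is the hypothetical address from the sentence above, not one from the sample file):

```perl
use strict;
use warnings;

my $with_quote    = qr/\.1\.(9|1[0-2])"/;
my $without_quote = qr/\.1\.(9|1[0-2])/;

# The first two addresses occur in the sample file; the third is hypothetical.
for my $addr ( '"10.1.1.9"', '"10.1.1.12"', '"10.1.9.123"' ) {
    printf "%-14s with quote: %-3s  without quote: %s\n",
        $addr,
        ( $addr =~ $with_quote    ? "yes" : "no" ),
        ( $addr =~ $without_quote ? "yes" : "no" );
}
```

Only the pattern without the closing quote matches "10.1.9.123", because there ".1.9" is followed by another dot rather than by the end of the quoted address.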

I'm guessing you don't use a lot of Perl? (Assuming I've understood your question correctly.)
The

Code:

while() {}

construct actually reads the next record from the input file on each pass. At end of file the read returns undef, the loop ends, and execution continues with the code after the loop.
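
A minimal sketch of that read loop, using an in-memory filehandle so it runs without any files (the variable names and sample records are made up for illustration):

```perl
use strict;
use warnings;

# Read from an in-memory string so the sketch runs without any files.
my $text = "first\nsecond\nthird\n";
open( my $fh, '<', \$text ) or die "Can't open in-memory file: $!";

my $count = 0;

# Perl treats this condition as: defined( my $line = <$fh> ),
# so the loop stops when the read returns undef at end of file.
while ( my $line = <$fh> ) {
    $count++;
    print "record $count: $line";
}
close($fh);

print "read $count records\n";
```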

Hi Chris

No I was more asking why the others seem to be testing $dotline twice, ie looking at posts #6 and #7?

But thanks, I did know that :)

cheers
grail

Quote:

Originally Posted by Tinkster (Post 4328762)

Code:

while( $dotline = <DOT> ) {
        printf NEW "$dotline" if( $dotline !~ /\.9|\.1[10]/ || $dotline !~ /\.1[012]/ );

Thanks, but this code deletes all ".10" or ".11" lines. Following your suggestion, I added the leading "1", but without the "\.".

Here is my updated code, which does exactly what I want. I colored the code so that it is easier to read. The number of lines is not too large, but I want to know if I could use other variables and while loops to reduce it... I know that when code works, I am not supposed to edit it until it stops working. But there is a possibility that the number of these lines could grow much larger.

Code:

while( $dotline = <DOT> ) {
        printf NEW "$dotline" if( ($dotline !~ /1\.1/ || $dotline !~ /1\.[234]/) &&
                                  ($dotline !~ /1\.2/ || $dotline !~ /1\.[34]/) &&
                                  ($dotline !~ /1\.3/ || $dotline !~ /1\.4/) &&
                                  ($dotline !~ /1\.5/ || $dotline !~ /1\.[678]/) &&
                                  ($dotline !~ /1\.6/ || $dotline !~ /1\.[78]/) &&
                                  ($dotline !~ /1\.7/ || $dotline !~ /1\.8/) &&
                                  ($dotline !~ /1\.9/ || $dotline !~ /1\.1[012]/) &&
                                  ($dotline !~ /1\.10/ || $dotline !~ /1\.1[12]/) &&
                                  ($dotline !~ /1\.11/ || $dotline !~ /1\.12/) &&
                                  ($dotline !~ /1\.13/ || $dotline !~ /1\.1[456]/) &&
                                  ($dotline !~ /1\.14/ || $dotline !~ /1\.1[56]/) &&
                                  ($dotline !~ /1\.15/ || $dotline !~ /1\.16/) &&
                                  ($dotline !~ /0\/24/) );
}
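
On the question of using variables and loops to shorten this: one possible direction is to describe the groups as data and compare the two endpoints of each line. This is only a sketch under assumptions that the thread does not confirm: that all hosts are 10.1.1.x as in the sample file, that last octets 1-4, 5-8, 9-12 and 13-16 form four groups, and that a line is dropped when both endpoints fall in the same group. The helper should_delete and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Assumption: the last octets 1-4, 5-8, 9-12 and 13-16 of 10.1.1.x form
# four groups, and a line is dropped when both endpoints share a group.
my @groups = ( [ 1 .. 4 ], [ 5 .. 8 ], [ 9 .. 12 ], [ 13 .. 16 ] );

# Map each last octet to the index of its group.
my %group_of;
for my $g ( 0 .. $#groups ) {
    $group_of{$_} = $g for @{ $groups[$g] };
}

# should_delete() is a hypothetical helper, not part of the original script.
sub should_delete {
    my ($line) = @_;
    return 1 if $line =~ m{0/24};    # the HNA and subnet lines

    # Collect the last octet of every quoted 10.1.1.x address on the line.
    my @octets = $line =~ /"10\.1\.1\.(\d+)"/g;
    return 0 unless @octets == 2;

    my ( $from, $to ) = @group_of{@octets};
    return ( defined $from && defined $to && $from == $to ) ? 1 : 0;
}

my @sample = (
    qq{"10.1.1.9" -> "10.1.1.11"[label="1.024"];\n},           # same group
    qq{"10.1.1.9" -> "192.168.3.254"[label="1.000"];\n},       # one endpoint only
    qq{"192.168.3.254" -> "192.168.3.0/24"[label="HNA"];\n},   # subnet line
);

for my $dotline (@sample) {
    print $dotline unless should_delete($dotline);
}
```

Adding a group then means adding one entry to @groups instead of several more conditions. The sketch uses print rather than printf, since printf would treat any % in a line as a format directive.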

Well I am not 100% sure I am on the right path, but how about:

Code:

while( $dotline = <DOT> ) {

        printf NEW "$dotline" if($dotline !~ /^\s*"[^"]+\.1\.([1-9]|1[0-6])"[^"]+"[^"]+\.1\.([1-9]|1[0-6])".*/ && $dotline !~ /0\/24/);

Quote:

Originally Posted by grail (Post 4329591)

Well I am not 100% sure I am on the right path, but how about:

Code:

while( $dotline = <DOT> ) {

        printf NEW "$dotline" if($dotline !~ /^\s*"[^"]+\.1\.([1-9]|1[0-6])"[^"]+"[^"]+\.1\.([1-9]|1[0-6])".*/ && $dotline !~ /0\/24/);

It does not work... It deletes some lines I do not want to delete... Anyway, I now have working code, so I will not try to change it. Thank you all.

@grail ;)
Well I'm a bit confused because the OP said

Quote:

if a line has both "1.9" and "1.11",

which in perl is && not || ....

Maybe I'm misunderstanding post #1, but it seems to me he's got a (possibly) large file of records and wants to remove a smaller subset, contained in another file.
Assuming (as per the example) that the records to delete match the corresponding records in the large file exactly, I'd create a hash using the smaller set as hash keys, then read through the large file once and, for each record, check whether it's in the hash of records to be deleted.
If so, skip to the next record; otherwise write the record to the new file.
This will produce a file of records that does not include those in the to-be-deleted list.
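
The hash approach described above could be sketched roughly as follows. The two lists are inlined stand-ins for the two files so the example runs on its own, and all variable names are made up for illustration:

```perl
use strict;
use warnings;

# Records to delete. In practice these would be read from the second file;
# they are inlined here so the sketch runs on its own.
my @to_delete = (
    qq{"10.1.1.9" -> "10.1.1.11"[label="1.024"];},
    qq{"192.168.3.0/24"[shape=diamond];},
);

# Build the lookup hash once: every unwanted record becomes a key.
my %delete = map { $_ => 1 } @to_delete;

# Stand-in for the large file.
my @records = (
    qq{"10.1.1.9" -> "10.1.1.10"[label="1.000"];},
    qq{"10.1.1.9" -> "10.1.1.11"[label="1.024"];},
    qq{"192.168.3.0/24"[shape=diamond];},
);

my @kept;
for my $record (@records) {
    next if exists $delete{$record};    # in the delete list: skip it
    push @kept, $record;                # otherwise keep it for output
}

print "$_\n" for @kept;
```

This relies on the exact-match assumption stated above: a record whose label value differs from the one in the delete list would not be found in the hash and would be kept.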

Quote:

It deletes some lines I do not want to delete

My bad .. I did not notice that you removed numbers from either side, i.e. the left-hand side is missing 4, 8, 12.

I would mention that even with what you have you could submit only a single line for each coloured bracket you have:

Code:

while( $dotline = <DOT> ) {
        printf NEW "$dotline" if( ($dotline !~ /1\.[1-3]/ || $dotline !~ /1\.[2-4]/) &&
                                  ($dotline !~ /1\.[5-7]/ || $dotline !~ /1\.[6-8]/) &&
                                  ($dotline !~ /1\.(9|1[01])/ || $dotline !~ /1\.1[0-2]/) &&
                                  ($dotline !~ /1\.1[3-5]/ || $dotline !~ /1\.1[4-6]/) &&
                                  ($dotline !~ /0\/24/) );
}

I also agree with chris though, using || instead of && seems odd when you are using an exclusion.