LinuxQuestions.org - [SOLVED] Script to remove lines in a file with more than "x" instances of any character ?

pissed_budgie

10-04-2010 10:55 PM

Script to remove lines in a file with more than "x" instances of any character ?

Hi,

I'm looking for a script (bash, python, perl, etc.) or even a one-liner (sed, awk, etc.) that can take a set of files and remove any line that has more than "x" instances of any one character (case sensitive). I have been doing a lot of searching and can only come up with examples of how to remove blank lines, lines that start with a certain character, or lines that contain a certain string.
This will be used on a system running a Kubuntu derivative.

As a very poor and basic example, I would like to take files that contain lines like:

Code:

ABC123#()
AbAcA123#
#AB32(1)C
AAABC123#
AAaBC123#
Aabcbcb##
#ab##c231

and end up with the files only containing the lines:

Code:

ABC123#()
#AB32(1)C
AAaBC123#

if I tell the script that 2 is the maximum number of times any character can appear in any line.

I hope that makes sense.

I know this must be possible, but for the life of me I cannot find even an example that will lead me in the right direction or better yet a piece of code I can use.

Thank you for taking a look at my post and I hope it's not me missing an obvious way of doing this.

grail

10-04-2010 11:22 PM

So are you telling it which characters to look for or only that it cannot contain more than 2 (for example) of any character?

pissed_budgie

10-04-2010 11:41 PM

it cannot contain more than 2 of any character.

AAab = keep (because case sensitive)
AAAb = delete (because of 3 A's)
AbCb = keep (only 2 chars the same)
#b## = delete (because of 3 #'s)

Thanks for the interest.

Kenhelm

10-05-2010 12:05 AM

Try

Code:

n=2
echo 'ABC123#()
AbAcA123#
#AB32(1)C
AAABC123#
AAaBC123#
Aabcbcb##
#ab##c231' | grep -Ev "(.)(.*\1){$n}"

ABC123#()
#AB32(1)C
AAaBC123#
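
A note on why the pattern works, with a small sketch to check it against the four lines from the earlier post: `(.)` captures one character and `(.*\1){$n}` requires $n more copies of that character later on the line, so the pattern matches when some character appears at least n+1 times, and `-v` then drops those lines. The function name `keep_line` is made up for illustration, and back-references with `-E` are assumed to be available, as they are in GNU grep.

```shell
#!/bin/bash
# keep_line N LINE: succeeds (exit 0) when no character occurs more than N times in LINE.
# Hypothetical helper for illustration, not part of the original post.
keep_line() {
    local n=$1 line=$2
    # -q: print nothing; the regex matches when some character appears n+1 or more times
    ! printf '%s\n' "$line" | grep -Eq "(.)(.*\1){$n}"
}

# The four sample strings from the clarification earlier in the thread
for s in 'AAab' 'AAAb' 'AbCb' '#b##'; do
    if keep_line 2 "$s"; then
        echo "$s = keep"
    else
        echo "$s = delete"
    fi
done
```

With n=2 this should agree with the keep/delete list given above: AAab and AbCb are kept, AAAb and #b## are deleted.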

ghostdog74

10-05-2010 12:10 AM

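Code:

awk -vFS= '{
    # with an empty FS each character is its own field
    for(i=1;i<=NF;i++){
      a[$i]++;
      if(a[$i]>2){ f=1; break }   # a third occurrence marks the line for removal
    }
    delete a                      # reset the counts before the next line
    if(f){f=0;next}               # skip marked lines
}1' file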

grail

10-05-2010 12:42 AM

Nice one Ken :)

pissed_budgie

10-05-2010 01:30 AM

Quote:

Originally Posted by Kenhelm (Post 4117911)

Try

Code:

n=2
echo 'ABC123#()
AbAcA123#
#AB32(1)C
AAABC123#
AAaBC123#
Aabcbcb##
#ab##c231' | grep -Ev "(.)(.*\1){$n}"

ABC123#()
#AB32(1)C
AAaBC123#

This works perfectly for the small example I gave, but the files are too large and too numerous to do by hand like this.
Would it be possible to make it so I can run:

script.sh -n 2 -f *.txt

and have it process all the files given with -f *.txt, take n as an input when the script is run (this number can change depending on the files processed), and either modify the existing files or create new ones with the same name and delete the old ones?

I know I have a real cheek and am probably pushing my luck asking for that, but it is obvious that you could do this far easier than I could.

Really nice simple solution, thank you so much for what you have given me.

pissed_budgie

10-05-2010 01:33 AM

Quote:

Originally Posted by ghostdog74 (Post 4117916)

Code:

awk -vFS= '{
    for(i=1;i<=NF;i++){
      a[$i]++;
      if(a[$i]>2){ f=1; break }
    }
    delete a
    if(f){f=0;next}
}1' file

Thank you.

I tried this, but although I could see it running through the file line by line, it neither changed the file nor created a new file with only the required lines.
Sorry

grail

10-05-2010 02:15 AM

So you asked for a solution and a few were provided and then when needing to have it run on a large amount of data you want someone to do the next step too?

Remember this is supposed to be a learning experience. What have you tried in the way of implementing the grep solution on multiple files?
Not that I would recommend it as it may never finish but grep itself has a -r option for recursive looking.

As for the awk:

Quote:

I could see it running through the file line by line

Yes, it shows the lines to be kept, so redirect the output to a new file and you will have your data.
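
Spelling that hint out as a sketch: awk only prints the kept lines to standard output, so a file changes only if that output is redirected somewhere and then copied back over the original. The function name `filter_file`, the temporary file from `mktemp`, and passing the limit in as `n` are my own assumptions rather than anything from the posts above. Splitting on an empty FS is assumed to behave as it does in gawk and mawk.

```shell
#!/bin/bash
# filter_file N FILE: rewrite FILE, keeping only lines in which
# no character occurs more than N times. Hypothetical helper.
filter_file() {
    local n=$1 file=$2 tmp
    tmp=$(mktemp) || return 1
    # Same counting idea as the awk posted above, with the limit passed in as n.
    # An empty FS makes every character its own field.
    if awk -v FS= -v n="$n" '{
            bad = 0
            for (i = 1; i <= NF; i++) if (++count[$i] > n) { bad = 1; break }
            delete count
        }
        !bad' "$file" > "$tmp"
    then
        # Copy the result back only if awk succeeded; writing through
        # redirection keeps the original file permissions.
        cat "$tmp" > "$file"
        rm -f -- "$tmp"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

It would be called as `filter_file 2 file`. Since it overwrites the file, it is worth trying on a copy first.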

pissed_budgie

10-05-2010 02:37 AM

Quote:

Originally Posted by grail (Post 4118025)

Sorry about that, no harm meant by it.
I see what you are saying and will work out the finer refinements myself.
Thanks for pointing out the obvious that I totally missed.

And a big thank you to the people that provided me with a code snippet to build from.

grail

10-05-2010 04:09 AM

No probs ... just post when you get stuck :)

I would also suggest looking at something like:

Code:

while IFS= read -r line
do
    <your stuff here>
done < <(find <where you're looking> -type f -name "<what you're looking for>")
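
Filled in for this thread's case, the loop might look like the sketch below. The `*.txt` pattern comes from the earlier request and `grep -Ev` is Kenhelm's filter; the `filter_tree` name and the temporary file are hypothetical choices of mine. It assumes bash (for the `< <(...)` process substitution) and file names without embedded newlines.

```shell
#!/bin/bash
# filter_tree N DIR: filter every *.txt file under DIR in place,
# dropping lines where some character occurs more than N times.
filter_tree() {
    local n=$1 dir=$2 file tmp
    while IFS= read -r file
    do
        tmp=$(mktemp) || return 1
        grep -Ev "(.)(.*\1){$n}" "$file" > "$tmp"
        # grep exits 1 when every line was dropped, which is not an error here;
        # only a status above 1 means something actually went wrong
        if [ $? -le 1 ]; then
            cat "$tmp" > "$file"
        fi
        rm -f -- "$tmp"
    done < <(find "$dir" -type f -name '*.txt')
}
```

Calling `filter_tree 2 .` would process every .txt file below the current directory, so a trial run on a copy of the data is a sensible first step.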

vinaytp

10-05-2010 10:46 AM

Hi pissed_budgie,

In perl

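Code:

#!/usr/bin/perl
use strict;
use warnings;

# Usage: perl test.pl file n
my ($file, $n) = @ARGV;
open(my $fh, '<', $file) or die "Cannot open $file: $!";
while (<$fh>)
{
        chomp;
        # keep the line unless some character occurs more than $n times
        if (!/(.)(.*\1){$n}/)
        {
                print "$_\n";
        }
}
close($fh);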

Run test.pl, passing the file name and the limit as arguments:

Code:

perl test.pl file 2

Warm Regards,

pissed_budgie

10-08-2010 08:16 PM

Thanks for all the replies and code snippets, I can't believe how simple this turned out to be.

All times are GMT -5.