LinuxQuestions.org
Old 10-04-2010, 11:55 PM   #1
pissed_budgie
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Rep: Reputation: 0
Script to remove lines in a file with more than "x" instances of any character?


Hi,

I'm looking for a script (bash, python, perl etc.) or even a one-liner (sed, awk etc.) that can take a set of files and remove any line that has more than "x" instances of any character (case sensitive). I have done a lot of searching, but can only find examples of how to remove blank lines, lines that start with a certain character, or lines that contain a certain string.
This will be used on a system running a Kubuntu derivative.

As a very poor and basic example, I would like to take files that contain lines like:

Code:
ABC123#()
AbAcA123#
#AB32(1)C
AAABC123#
AAaBC123#
Aabcbcb##
#ab##c231
and end up with the files only containing the lines:

Code:
ABC123#()
#AB32(1)C
AAaBC123#
if I tell the script that 2 is the maximum number of times any character can appear in any line.

I hope that makes sense.

I know this must be possible, but for the life of me I cannot find even an example that will lead me in the right direction or better yet a piece of code I can use.

Thank you for taking a look at my post and I hope it's not me missing an obvious way of doing this.
 
Old 10-05-2010, 12:22 AM   #2
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,689

Rep: Reputation: 1987
So are you telling it which characters to look for or only that it cannot contain more than 2 (for example) of any character?
 
Old 10-05-2010, 12:41 AM   #3
pissed_budgie
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Original Poster
Rep: Reputation: 0
it cannot contain more than 2 of any character.

AAab = keep (because case sensitive)
AAAb = delete (because of 3 A's)
AbCb = keep (only 2 chars the same)
#b## = delete (because of 3 #'s)

Thanks for the interest.
 
Old 10-05-2010, 01:05 AM   #4
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 333

Rep: Reputation: 141
Try
Code:
n=2
echo 'ABC123#()
AbAcA123#
#AB32(1)C
AAABC123#
AAaBC123#
Aabcbcb##
#ab##c231' | grep -Ev "(.)(.*\1){$n}"

ABC123#()
#AB32(1)C
AAaBC123#
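
For reference, the one-liner above can be wrapped in a small function so that the limit and the input files become arguments. This is only a sketch: the name filter_lines is made up for illustration and is not from the post. In the pattern, (.) captures one character and (.*\1){n} demands n further occurrences of it, so a matching line holds at least n+1 copies of some character, and -v drops those lines. Backreferences with -E are assumed to be available, as they are in GNU grep.

```shell
#!/bin/bash
# filter_lines is a hypothetical helper name, not part of the original post.
# Usage: filter_lines N [FILE...]   (reads stdin when no FILE is given)
filter_lines() {
    local n=$1
    shift
    # -h suppresses file name prefixes when several files are given.
    # \1 inside -E is an extension (assumed here: GNU grep).
    grep -hEv "(.)(.*\1){$n}" -- "$@"
}
```

Called as filter_lines 2 somefile.txt, it prints only the lines in which no character appears more than twice.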
 
2 members found this post helpful.
Old 10-05-2010, 01:10 AM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
Code:
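# An empty FS makes each character its own field (works in gawk; POSIX leaves an empty FS unspecified).
awk -vFS= '{
    for(i=1;i<=NF;i++){
       a[$i]++;                      # count each character on this line
       if(a[$i]>2){ f=1; break }     # 2 is the maximum allowed count
    }
    delete a                         # reset the counts for the next line
    if(f){f=0;next}                  # skip lines that went over the limit
}1' file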
 
1 member found this post helpful.
Old 10-05-2010, 01:42 AM   #6
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,689

Rep: Reputation: 1987
Nice one Ken
 
Old 10-05-2010, 02:30 AM   #7
pissed_budgie
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by Kenhelm View Post
Try
Code:
n=2
echo 'ABC123#()
AbAcA123#
#AB32(1)C
AAABC123#
AAaBC123#
Aabcbcb##
#ab##c231' | grep -Ev "(.)(.*\1){$n}"

ABC123#()
#AB32(1)C
AAaBC123#
This works perfectly for the small example I gave, but the files are too large and too numerous to do by hand like this.
Would it be possible to make it so I can:

script.sh -n 2 -f *.txt

and have it:
- process all the files matched by -f *.txt
- take n as an argument when the script is run, since this number can change depending on the files processed
- modify the existing files, or create new ones with the same name and delete the old ones?

I know I have a real cheek and am probably pushing my luck asking for that, but it is obvious that you could do this far easier than I could.

Really nice simple solution, thank you so much for what you have given me.
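
One possible shape for such a script is sketched below, under a few assumptions. The function name filter_files is hypothetical. The -f flag is dropped: the shell expands *.txt into many arguments before the script sees them, so a single option cannot hold them all, and the files are taken as plain arguments after -n instead. Each file is filtered into a temporary file beside it, which then replaces the original.

```shell
#!/bin/bash
# Hypothetical sketch of the requested script.sh; filter_files is a made-up name.
# Usage: filter_files -n MAX FILE...
filter_files() {
    local n="" opt f tmp OPTIND=1
    while getopts "n:" opt; do
        case $opt in
            n) n=$OPTARG ;;
            *) echo "usage: filter_files -n MAX FILE..." >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))

    # The limit must be a plain number, and at least one file is required.
    case $n in
        ''|*[!0-9]*) echo "filter_files: -n needs a number" >&2; return 2 ;;
    esac
    [ "$#" -gt 0 ] || { echo "filter_files: no files given" >&2; return 2; }

    for f in "$@"; do
        # Temporary file in the same directory, so mv is a simple rename.
        tmp=$(mktemp "$f.XXXXXX") || return 1
        grep -Ev "(.)(.*\1){$n}" -- "$f" > "$tmp"
        # grep exits 1 when no line survives, which is not an error here.
        if [ "$?" -le 1 ]; then
            mv -- "$tmp" "$f"
        else
            rm -f -- "$tmp"
            return 1
        fi
    done
}
```

In a real script.sh the last line would be filter_files "$@", so that script.sh -n 2 *.txt works. Note that the replaced files get the permissions of the temporary file, and no backup is kept, so it is worth trying this on copies first.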
 
Old 10-05-2010, 02:33 AM   #8
pissed_budgie
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by ghostdog74 View Post
Code:
awk -vFS= '{
    for(i=1;i<=NF;i++){
       a[$i]++;
       if(a[$i]>2){ f=1; break }
    }
    delete a
    if(f){f=0;next}
}1' file
Thank you.

I tried this, but although I could see it running through the file line by line, it neither changed the file nor created a new file with only the required lines.
Sorry
 
Old 10-05-2010, 03:15 AM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,689

Rep: Reputation: 1987
So you asked for a solution and a few were provided and then when needing to have it run on a large amount of data you want someone to do the next step too?

Remember this is supposed to be a learning experience. What have you tried in the way of implementing the grep solution on multiple files?
Not that I would recommend it as it may never finish but grep itself has a -r option for recursive looking.

As for the awk:
Quote:
I could see it running through the file line by line
yes it shows the items to be kept so redirect to a new file and you will have your data.
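
A minimal sketch of that redirection, assuming an awk variant of the same character-counting idea. The function name keep_lines and the max variable are made up for illustration; substr is used instead of an empty FS so the sketch does not depend on one awk implementation.

```shell
#!/bin/bash
# keep_lines is a hypothetical name; max is a parameter added for illustration.
# Usage: keep_lines MAX INFILE OUTFILE
keep_lines() {
    awk -v max="$1" '{
        split("", seen)                 # portable way to empty the array
        n = length($0)
        for (i = 1; i <= n; i++)
            if (++seen[substr($0, i, 1)] > max) next   # drop this line
        print                           # no character exceeded max
    }' "$2" > "$3"
}
```

For example, keep_lines 2 file file.new leaves file untouched and writes the kept lines to file.new.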
 
Old 10-05-2010, 03:37 AM   #10
pissed_budgie
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by grail View Post
So you asked for a solution and a few were provided and then when needing to have it run on a large amount of data you want someone to do the next step too?

Remember this is supposed to be a learning experience. What have you tried in the way of implementing the grep solution on multiple files?
Not that I would recommend it as it may never finish but grep itself has a -r option for recursive looking.

As for the awk:

yes it shows the items to be kept so redirect to a new file and you will have your data.
Sorry about that, no harm meant by it.
I see what you are saying and will work out the finer refinements myself
Thanks for pointing out the obvious that I totally missed.

And a big thank you to the people that provided me with a code snippet to build from.

Last edited by pissed_budgie; 10-05-2010 at 04:42 PM.
 
Old 10-05-2010, 05:09 AM   #11
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,689

Rep: Reputation: 1987
No probs ... just post when you get stuck

I would also suggest looking at something like:
Code:
while read -r line
do
    <your stuff here>
done < <(find <where you're looking> -type f -name "<what you're looking for>")
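
Filled in, that loop might look like the sketch below. The function name filter_tree, its arguments, and the .new suffix are all assumptions made for illustration. It uses -print0 with read -d '' rather than plain read -r, so that file names containing spaces or newlines survive, and it writes the kept lines next to each original instead of overwriting it.

```shell
#!/bin/bash
# Hypothetical sketch: filter_tree, its arguments and the .new suffix are illustrative only.
# Usage: filter_tree N DIR PATTERN
filter_tree() {
    local n=$1 dir=$2 pattern=$3 file
    # -print0 with read -d '' keeps unusual file names intact.
    while IFS= read -r -d '' file
    do
        # Exit status 1 from grep only means no line was kept, so accept it.
        grep -Ev "(.)(.*\1){$n}" -- "$file" > "$file.new" || [ "$?" -eq 1 ]
    done < <(find "$dir" -type f -name "$pattern" -print0)
}
```

A call such as filter_tree 2 . '*.txt' would then create a matching .txt.new file beside every .txt file under the current directory. The pattern is quoted so that find, not the shell, expands it.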
 
Old 10-05-2010, 11:46 AM   #12
vinaytp
Member
 
Registered: Apr 2009
Location: Bengaluru, India
Distribution: RHEL 5.4, 6.0, Ubuntu 10.04
Posts: 704

Rep: Reputation: 55
Hi pissed_budgie,

In perl

Code:
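#!/usr/bin/perl
use strict;
use warnings;

# Usage: perl test.pl FILE N
my ($file, $n) = @ARGV;
open(my $fh, '<', $file) or die "Cannot open $file: $!";
while (<$fh>)
{
        chomp;
        # (.) captures a character; (.*\1){$n} needs $n more copies of it
        if (!/(.)(.*\1){$n}/)
        {
        print "$_\n";
        }
}
close($fh);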
Execute test.pl by passing arguments
Code:
perl test.pl file 2
Warm Regards,

Last edited by vinaytp; 10-05-2010 at 11:49 AM.
 
Old 10-08-2010, 09:16 PM   #13
pissed_budgie
LQ Newbie
 
Registered: Oct 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Thanks for all the replies and code snippets, I can't believe how simple this turned out to be.
 
  

