09-02-2010, 03:19 AM   #1
xsyntax
LQ Newbie
 
Registered: Dec 2009
Posts: 7

Rep: Reputation: 0
Bash script to suppress matches in two-column list


I have been using comm to compare two simple single-column lists and suppress the items contained in the second list (the suppression list). That was simple and basic, but list1 now has two columns, and I must compare the second column of list1 against my suppression list.

Previously my user list and suppression list looked like:
user list:
user1@domain.com
user2@domain.com

suppression list:
user2@domain.com

And the command I used in my bash script:
Code:
comm -3 user_list suppression_list
Now my user list looks like:
user1@domain.com 3bc81bc52e7f209c3455af320abeee00
user2@domain.com ed076488b22b5359d7ffb16b8e30caed

and my suppression list looks like:
ed076488b22b5359d7ffb16b8e30caed

Basically I need to compare my user list and suppression list to suppress any users that exist in the suppression list, then remove the second column (md5).

I wasn't sure of the fastest way to make the comparison: whether there is a command similar to comm, or whether I need to build an array of users and check each one against the suppression list. The latter seems like it would be pretty process-intensive. Does anyone have a less cumbersome idea?

Thanks guys, you are always a great resource.
 
09-02-2010, 05:02 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Try this: grep -vf file2 file1 | sed 's/ .*//'

Example run:
Code:
$ cat file1
user1@domain.com 3bc81bc52e7f209c3455af320abeee00
user2@domain.com ed076488b22b5359d7ffb16b8e30caed
user3@domain.com XXX76488b22b5359d7ffb16b8e30caed
user4@domain.com YYY76488b22b5359d7ffb16b8e30caed

$ cat file2
ed076488b22b5359d7ffb16b8e30caed
YYY76488b22b5359d7ffb16b8e30caed

$ grep -vf file2 file1 | sed 's/ .*//'
user1@domain.com
user3@domain.com
Hope this helps.
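
A note on the command above: grep -f treats every line of file2 as a regular expression. MD5 keys are plain strings, so adding -F (fixed strings) is usually a cheap speed-up when the pattern file is large. A minimal sketch, reusing the sample data from the example run (the temporary directory and the variable names are assumptions, only there to keep the snippet self-contained):

```shell
#!/usr/bin/env bash
# Sketch only: recreate the sample files in a scratch directory.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
user1@domain.com 3bc81bc52e7f209c3455af320abeee00
user2@domain.com ed076488b22b5359d7ffb16b8e30caed
user3@domain.com XXX76488b22b5359d7ffb16b8e30caed
user4@domain.com YYY76488b22b5359d7ffb16b8e30caed
EOF

cat > "$tmp/file2" <<'EOF'
ed076488b22b5359d7ffb16b8e30caed
YYY76488b22b5359d7ffb16b8e30caed
EOF

# -F: treat each line of file2 as a fixed string, not a regular expression
# -v: keep only the lines that match none of the patterns
# -f: read the patterns from file2
# sed then strips everything from the first space on, leaving the address.
result=$(grep -vFf "$tmp/file2" "$tmp/file1" | sed 's/ .*//')
printf '%s\n' "$result"

rm -rf "$tmp"
```

One caveat: an empty line in file2 is a pattern that matches every line, so with -v nothing would be printed. It may be worth stripping blank lines from the suppression list first.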
 
09-02-2010, 11:16 AM   #3
xsyntax
LQ Newbie
 
Registered: Dec 2009
Posts: 7

Original Poster
Rep: Reputation: 0
druuna,

Thank you very much, that works perfectly. I was way overcomplicating things. Your solution is simple and does exactly what I need.

Take it easy
 
09-02-2010, 11:17 AM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
You're welcome
 
09-02-2010, 05:16 PM   #5
xsyntax
LQ Newbie
 
Registered: Dec 2009
Posts: 7

Original Poster
Rep: Reputation: 0
I tested the solution above and it works fine with small files; however, when I start working with large files the script slows to a crawl. Sometimes the suppression file can be up to 2GB.

I think the old script using comm was so much faster because comm requires the lists to be sorted, so it can compare the two lists line by line much faster. I don't know for a fact, but I am assuming grep is storing as much in memory as possible because it has to scroll through the files multiple times.

Does anyone have ideas on how I could make this perform better? Any ideas are welcome!
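
For reference, the sorted-list idea behind comm carries over to the two-column case with join, which compares sorted files on a chosen field. The following is a rough sketch, not a benchmarked solution. It assumes the columns are separated by exactly one space, and the file names and sample data are illustrative:

```shell
#!/usr/bin/env bash
# Sketch only: sort both lists once, then let join walk them in step.
export LC_ALL=C   # sort and join must agree on the collation order

tmp=$(mktemp -d)

cat > "$tmp/user_list" <<'EOF'
user1@domain.com 3bc81bc52e7f209c3455af320abeee00
user2@domain.com ed076488b22b5359d7ffb16b8e30caed
user3@domain.com XXX76488b22b5359d7ffb16b8e30caed
user4@domain.com YYY76488b22b5359d7ffb16b8e30caed
EOF

cat > "$tmp/suppression_list" <<'EOF'
ed076488b22b5359d7ffb16b8e30caed
YYY76488b22b5359d7ffb16b8e30caed
EOF

# Sort the user list on its second field (the MD5 key);
# sort the suppression list on its only field and drop duplicates.
sort -t ' ' -k2,2 "$tmp/user_list" > "$tmp/users.sorted"
sort -u "$tmp/suppression_list" > "$tmp/supp.sorted"

# -1 2 / -2 1: join field 2 of the user list with field 1 of the suppression list
# -v 1:        print only user-list lines that have no partner
# -o 1.1:      output just the address (field 1 of the first file)
result=$(join -t ' ' -1 2 -2 1 -v 1 -o 1.1 "$tmp/users.sorted" "$tmp/supp.sorted")
printf '%s\n' "$result"

rm -rf "$tmp"
```

The sort step is the expensive part, but sort is designed to handle inputs larger than memory by using temporary files, and the join itself is a single pass over both files. The output comes out in MD5 order rather than in the original order of the user list.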
 
09-03-2010, 04:21 AM   #6
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Large files (several GB) do tend to put a strain on your computer (depending on its specs), and not all tools cope well with files that size.

Perl might be a better choice. Give the following a try:
Code:
#!/usr/bin/perl
use strict ;
use warnings ;

# -------------------------------------------------------------------------- #
# Variables
# ------------------------------------------------------------------ #
my $usermd5file = "file1" ;
my $md5suppfile = "file2" ;

my $user ;
my $md5key ;
my %keyuserpair ;

# -------------------------------------------------------------------------- #
# Subroutines
# ------------------------------------------------------------------ #
# fill hash with md5 -> user pairs
sub processUserMd5File {

   # open user/md5 file
   open( USERMD5FILE, '<', $usermd5file )
     or die "Can't open $usermd5file : $!\n" ;

   # parse user/md5 file
   while ( <USERMD5FILE> ) {
      chomp() ;
      ( $user, $md5key ) = split ;
      $keyuserpair{$md5key} = $user ;
   }

   # close file
   close USERMD5FILE ;
}

# ------------------------------------------------------------------ #
# delete hash pair if in suppression list
sub fetchNotSuppressed {

   # open suppression file
   open( MD5SUPP, '<', $md5suppfile )
     or die "Can't open $md5suppfile : $!\n" ;

   # parse suppression file
   while ( <MD5SUPP> ) {
      chomp() ;
      delete $keyuserpair{$_} if exists $keyuserpair{$_} ;
   }

   # close file
   close MD5SUPP ;
}

# -------------------------------------------------------------------------- #
# Main
# ------------------------------------------------------------------ #

processUserMd5File() ;

fetchNotSuppressed() ;

foreach $md5key ( keys( %keyuserpair ) ) {
   print "$keyuserpair{$md5key}\n" ;
}

exit 0 ;

# -------------------------------------------------------------------------- #
# End
You might need to change the following parts to match your setup:
- /usr/bin/perl => path to perl,
- file1 => file with user-md5 combos,
- file2 => suppression list (md5 keys)

Hope this helps.
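
For comparison, the same approach as the Perl script (keep the user list in memory, stream the suppression list past it) can be sketched in awk. This is an illustration only; the sample data and variable names are assumptions:

```shell
#!/usr/bin/env bash
# Sketch only: awk version of "hash the user list, delete suppressed keys".
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
user1@domain.com 3bc81bc52e7f209c3455af320abeee00
user2@domain.com ed076488b22b5359d7ffb16b8e30caed
user3@domain.com XXX76488b22b5359d7ffb16b8e30caed
user4@domain.com YYY76488b22b5359d7ffb16b8e30caed
EOF

cat > "$tmp/file2" <<'EOF'
ed076488b22b5359d7ffb16b8e30caed
YYY76488b22b5359d7ffb16b8e30caed
EOF

result=$(awk '
    # First file (user list): remember the address stored under each MD5 key.
    FNR == NR { user[$2] = $1; next }
    # Second file (suppression list): forget every key that appears in it.
    { delete user[$1] }
    # Afterwards: print the addresses that are left, in no particular order.
    END { for (md5 in user) print user[md5] }
' "$tmp/file1" "$tmp/file2" | sort)   # sort only to get a stable order
printf '%s\n' "$result"

rm -rf "$tmp"
```

Only the user list is held in memory, so a very large suppression file is read once from start to end. As in the Perl script, two users that share the same MD5 key collapse into one entry, because the key is used as the hash index.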
 
09-03-2010, 05:08 AM   #7
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Code:
awk 'FNR==NR{md5[$1];next}(!($2 in md5)){print $1}' suppression userlist
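
The one-liner above is terse, so here is the same program spread out with comments. It is a sketch on sample data; the file names suppression and userlist come from the command above, while the temporary directory is an assumption to keep the snippet self-contained:

```shell
#!/usr/bin/env bash
# Sketch only: the awk one-liner, expanded and run on sample data.
tmp=$(mktemp -d)

cat > "$tmp/userlist" <<'EOF'
user1@domain.com 3bc81bc52e7f209c3455af320abeee00
user2@domain.com ed076488b22b5359d7ffb16b8e30caed
user3@domain.com XXX76488b22b5359d7ffb16b8e30caed
user4@domain.com YYY76488b22b5359d7ffb16b8e30caed
EOF

cat > "$tmp/suppression" <<'EOF'
ed076488b22b5359d7ffb16b8e30caed
YYY76488b22b5359d7ffb16b8e30caed
EOF

result=$(awk '
    # FNR (line number within the current file) equals NR (overall line
    # number) only while the first file, the suppression list, is being read.
    # Referencing md5[$1] creates that key; next skips the rule below.
    FNR == NR { md5[$1]; next }
    # Second file (user list): print the address in field 1 only when
    # the MD5 key in field 2 was not seen in the suppression list.
    !($2 in md5) { print $1 }
' "$tmp/suppression" "$tmp/userlist")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Each file is read once and the output keeps the order of the user list. The trade-off is that the whole suppression list is held in memory as array keys, which may matter when that file approaches 2GB.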
 
  

