LinuxQuestions.org
04-01-2011, 07:14 PM   #1
jv2112
Member

Registered: Jan 2009
Location: New England
Distribution: Arch Linux
Posts: 718

Rep: Reputation: 102
Social Security # Search


I am looking at writing a script that will search the hard drive for certain sequences, to identify files containing sensitive data that should be scrubbed.

I started with the line below but it just runs on and on. I am not sure what I am doing wrong.

Any guidance would be appreciated.


Quote:
sudo find . -type f -exec grep '[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}' {} \;
 
04-01-2011, 08:01 PM   #2
Telengard
Member

Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147
I don't see any need to use find here. grep -R can recurse directories.

I think the repetition operator you are trying to use doesn't work the way you think it does. My lazy fix is to invoke egrep which I believe works the way you want it to. It would be great if someone more experienced with the grep family can expand on this a bit.

You could also explicitly state your character classes the appropriate number of times and leave out the repetition operators.
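
For what it's worth, here is a minimal sketch of the two suggestions above, run against a throwaway directory so it is safe to try. The directory and file names are made up for illustration, and the sample number is not a real SSN. The -l option asks grep to print only the names of matching files, and -E is the option form of egrep.

```shell
# Build a scratch directory with one matching and one non-matching file
# (hypothetical sample data).
scratch=$(mktemp -d)
printf 'name: test\nssn: 123-45-6789\n' > "$scratch/has_ssn.txt"
printf 'phone: 555-1234\n' > "$scratch/no_ssn.txt"

# Option 1: extended regex (what egrep uses), recursing with -r.
with_ere=$(grep -rlE '[0-9]{3}-[0-9]{2}-[0-9]{4}' "$scratch")

# Option 2: spell out each character class, no repetition operators at all.
spelled_out=$(grep -rl '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]' "$scratch")

echo "$with_ere"
echo "$spelled_out"

# Clean up the scratch directory.
rm -rf "$scratch"
```

Both commands should list the same file. Swap "$scratch" for the directory you actually want to search once you are happy with the output.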
 
04-01-2011, 11:59 PM   #3
slimm609
Member
 
Registered: May 2007
Location: Chas, SC
Distribution: slackware, gentoo, fedora, LFS, sidewinder G2, solaris, FreeBSD, RHEL, SUSE, Backtrack
Posts: 428

Rep: Reputation: 65
grep -R '[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}' *

This could take a very long time to run depending on what the system specs are.

Last edited by slimm609; 04-02-2011 at 03:00 AM.
 
04-02-2011, 05:27 AM   #4
jv2112
Member

Registered: Jan 2009
Location: New England
Distribution: Arch Linux
Posts: 718

Original Poster
Rep: Reputation: 102

Thanks for all the replies.


Any suggestions on how to speed this up? Once I have tested this, I was thinking of placing it in a script with additional common sequences (e.g. credit card numbers), then scheduling it in cron to generate a list of files I should review / scrub.


Thoughts?


Running on ->

Netbook Asus 100OE ( Atom processor(2 cores) 2 gig ram)

Desktop Custom build ( AMD 1090T (6 Cores) / 4 Gig RAM / Agility SSD Drive + 2 GBTS SATA drives)
 
04-02-2011, 02:08 PM   #5
Telengard
Member

Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147
Quote:
Originally Posted by jv2112
Any suggestions on how to speed up.
Yes.
  • Don't invoke any more processes than you absolutely must to get the job done. One example is eliminating the unneeded find command you were using.
  • Make full use of the capabilities of each program you invoke. grep -R uses fewer resources than a pipeline with another command. Wasteful constructs to avoid include things like cat somefile | grep something, which starts an extra process when grep something somefile would do.
  • Whenever possible, use smaller/faster programs to get the job done. For example, cut can be much faster than awk; if cut will do the job then use it. Same applies to the grep, egrep, fgrep family; each is optimized to perform better under various conditions.
  • Consider making a C program instead of a Bash script. Languages which compile to native code can be many times faster than shell scripts.
  • Consider upgrading your computer hardware. More RAM, a faster processor, and a faster hard disk will improve performance system wide.
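
To make the first two points concrete, here is a rough sketch assuming GNU grep and find. It runs on a scratch directory with made-up file names, so nothing on your real disk is touched. -I skips binary files and -l prints a file's name at the first match instead of reading on.

```shell
# Scratch tree with hypothetical sample data.
scratch=$(mktemp -d)
mkdir -p "$scratch/docs"
printf 'ssn: 123-45-6789\n' > "$scratch/docs/a.txt"
printf 'nothing here\n' > "$scratch/docs/b.txt"

pattern='[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}'

# One grep process for the whole tree.
fast=$(grep -rIl "$pattern" "$scratch")

# If find is still wanted (say, to filter by name or size), ending -exec
# with + hands many files to each grep instead of starting one grep per file,
# which is what \; does.
batched=$(find "$scratch" -type f -exec grep -Il "$pattern" {} +)

echo "$fast"
echo "$batched"

# Clean up the scratch directory.
rm -rf "$scratch"
```

Both forms should report the same file. How much time this saves on a real disk depends on how many files there are, so treat it as something to measure rather than a promise.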

Quote:
Once I have tested this, I was thinking of placing it in a script with additional common sequences (e.g. credit card numbers), then scheduling it in cron to generate a list of files I should review / scrub.
You can use the alternation operator | (pipe character) to separate multiple regular expressions. Keep in mind the order of precedence when mixing operators.

Quote:
Originally Posted by man grep
Precedence
Repetition takes precedence over concatenation, which in turn takes
precedence over alternation. A whole expression may be enclosed in
parentheses to override these precedence rules and form a
subexpression.
Code:
foo$ echo -e 'feel\nfoal\ntool\nteal\n' | grep 'ee\|oo'
feel
tool
foo$
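
Applied to the original question, the same alternation idea might look like the sketch below. The card layout (four groups of four digits separated by a dash or a space) is only an assumption for illustration, and the sample numbers are made up. With -E the alternation operator is a plain |, and the parentheses keep each alternative together.

```shell
# SSN layout from the original post.
ssn='[0-9]{3}-[0-9]{2}-[0-9]{4}'
# Assumed card layout: four groups of four digits, dash or space between them.
card='[0-9]{4}([- ][0-9]{4}){3}'

# Hypothetical sample lines; only the first two should match.
sample=$(printf 'id 123-45-6789\ncard 4111-1111-1111-1111\nphone 555-1234\n')

hits=$(printf '%s\n' "$sample" | grep -E "($ssn)|($card)")
echo "$hits"
```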
HTH

Edit
My knowledge of the grep family is far from complete. It would be nice if someone with more knowledge would add more here.

Last edited by Telengard; 04-02-2011 at 02:15 PM.
 
04-02-2011, 03:31 PM   #6
jefro
Guru

Registered: Mar 2008
Posts: 11,970

Rep: Reputation: 1485
The problem is that there could be a lot more protected personal data on there, and also SSN data that is not in your format. Depending on the apps or file formats, the numbers could be almost anywhere.

I'd wipe the drive.
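
On the point about SSNs that are not in the dashed format, a looser pattern is one possible sketch. It makes the separators optional and uses -w so that a match cannot be part of a longer run of digits. The sample lines are made up, and a pattern this loose will flag plenty of things that are not SSNs, so it only narrows down what to review.

```shell
# Optional dash or space between the three groups.
loose='[0-9]{3}[- ]?[0-9]{2}[- ]?[0-9]{4}'

# Three hypothetical SSN-like lines plus one longer digit run that should be ignored.
loose_hits=$(printf '123-45-6789\n123 45 6789\n123456789\n1234567890123\n' | grep -cwE "$loose")
echo "$loose_hits"
```

It still only sees plain text; numbers inside compressed or binary file formats will not show up this way.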
 
04-02-2011, 04:36 PM   #7
Telengard
Member

Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147
Quote:
Originally Posted by jefro
The problem is that there could be a lot more protected personal data on there, and also SSN data that is not in your format. Depending on the apps or file formats, the numbers could be almost anywhere.

I'd wipe the drive.
I thought of those things too, but in the spirit of being helpful I decided to go along with OP's premise anyway.

On the other hand, there is always dban.
 
04-02-2011, 05:02 PM   #8
jv2112
Member

Registered: Jan 2009
Location: New England
Distribution: Arch Linux
Posts: 718

Original Poster
Rep: Reputation: 102
Thanks for all the input.
 
  



Tags
grep, regular expressions




All times are GMT -5.
