07-15-2019, 04:31 PM   #1
Scrag
Member
 
Registered: Mar 2004
Location: Wisconsin
Distribution: Kali Linux
Posts: 131

Rep: Reputation: 15
Simple bash script duplicate check


I have a simple one-liner bash script which brute-forces domain names and spits out the host and IP. It works fine, except I need to add a check so it skips duplicates.

Code:
for name in `cat subdomains-top1mil-5000.txt`; do host $name.example.com | grep "has address"; done |cut -d" " -f4,1
The problem is that I can get hundreds of duplicates. Any thoughts on how I can get this to check for duplicates and skip them? Everything I think of seems very complex, and I feel there is probably an easier way.

Any thoughts?

Thanks!
Scrag
 
07-15-2019, 05:08 PM   #2
individual
Member
 
Registered: Jul 2018
Posts: 231

Rep: Reputation: 175
You want to avoid scanning previously scanned IP addresses and/or hostnames? Bash has associative arrays, which are perfect for the job. Here's an example.
Code:
# an associative array for scanned ip addresses and/or hostnames.
declare -A scanned

if [[ ! ${scanned[$ip]} ]]; then
    # do something with ip
    scanned[$ip]=1
fi
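To show the check running inside a loop, here is a minimal sketch that passes a hard-coded list of addresses through the same `scanned` test. The function name `print_new_ips` and the sample addresses (taken from the 192.0.2.0/24 documentation range) are made up for illustration, and associative arrays need Bash 4 or later.

```shell
#!/usr/bin/env bash
# Sketch only: print_new_ips and the sample addresses are illustrative.
print_new_ips() {
    local -A scanned=()          # keys are the addresses already seen
    local ip
    for ip in "$@"; do
        if [[ ! ${scanned[$ip]} ]]; then
            echo "$ip"           # first time this address appears
            scanned[$ip]=1       # remember it so later repeats are skipped
        fi
    done
}

print_new_ips 192.0.2.1 192.0.2.2 192.0.2.1 192.0.2.3 192.0.2.2
```

Each address is printed once, in the order it first appears.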
 
07-15-2019, 05:43 PM   #3
Scrag
Member
 
Registered: Mar 2004
Location: Wisconsin
Distribution: Kali Linux
Posts: 131

Original Poster
Rep: Reputation: 15
Thanks for the response. I'm new to bash scripts and your post is above my head. For starters, I'm not sure what I should do to get the domain/IP into an array to be checked. After that, I'm guessing the code you replied with would check the "scanned" portion of the array against the new portion. I'm assuming that's what we are doing here?

I know it's pathetic, but I attempted to write a script with what you gave me.

Code:
for name in `cat subdomains-top1mil-5000.txt`; 
	scanned = do host $name.example.com | grep "has address"; done |cut -d" " -f4,1

	if [[ ! ${scanned[$ip]} ]]; then
    		# do something with ip
    		scanned[$ip]=1
	fi
Thanks,
Scrag
 
07-15-2019, 05:54 PM   #4
scasey
Senior Member
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.6
Posts: 3,458

Rep: Reputation: 1160
Duplicates because of duplicates in your input file?
If the file is already sorted (so that repeated lines are adjacent), use uniq instead of cat:
Code:
for name in `uniq subdomains-top1mil-5000.txt`; do...
Otherwise use sort -u:
Code:
for name in `sort -u subdomains-top1mil-5000.txt`; do...
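The distinction between the two matters because uniq only collapses repeats that sit on adjacent lines. A small sketch with made-up sample lines (the function name `sample`, the hostnames, and the 192.0.2.x addresses are placeholders, not real lookups):

```shell
# Sample "name address" lines; www appears twice, but not on adjacent lines.
sample() {
    printf '%s\n' \
        'www.example.com 192.0.2.10' \
        'mail.example.com 192.0.2.20' \
        'www.example.com 192.0.2.10'
}

sample | uniq      # prints all three lines: the repeats are not adjacent
sample | sort -u   # prints two lines: sorting first lets -u drop the repeat
```

The same sort -u filter can also be appended to the end of a pipeline to remove repeated output lines, at the cost of losing the original order.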
EDIT: When I run host subdomain.example.com I only get back one line...which makes the grep unnecessary...the cut works just fine without it.

One could also use sed instead of cut:
Code:
host $name.example.com | sed "s/has address //"

Last edited by scasey; 07-15-2019 at 06:02 PM.
 
07-15-2019, 05:59 PM   #5
individual
Member
 
Registered: Jul 2018
Posts: 231

Rep: Reputation: 175
I'm sorry, I should have explained a bit more. Simply put, associative arrays store keys and values. In this case the key will be the IP you're about to scan, and the value is 1 (it's easiest to just use 1 instead of trying to increment the value). Instead of just providing code, I'm going to give you some hints and I want you to try to solve the problem (you might want to refer to my solution from your other thread, too).

Hints:
scanned isn't set to anything because Bash won't interpret "scanned = do host" as an assignment: there must be no spaces around the =, and a command's output is captured with $(...).
You'll need to loop through the output of the host command if you want to check all of the IP addresses.
Think about where you would need to place the if statement that checks for already seen IP addresses, if you want to skip the current IP address.
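One way the hints above could fit together is sketched below. It is an assumption-laden illustration, not the solution from the other thread: `filter_new_ips` is a hypothetical helper that expects lines shaped like "NAME has address IP" on stdin, and the sample lines are made up so the sketch runs without network access.

```shell
#!/usr/bin/env bash
# Sketch only: filter_new_ips is a hypothetical helper, not from the thread.
# Reads "NAME has address IP" lines on stdin and prints "NAME IP" the first
# time each IP appears.
filter_new_ips() {
    local -A scanned=()
    local name ip
    while read -r name _ _ ip; do
        [[ -n $ip ]] || continue         # skip lines without a fourth field
        if [[ ! ${scanned[$ip]} ]]; then
            echo "$name $ip"
            scanned[$ip]=1
        fi
    done
}

# Real use (needs network access), assuming the word list from the first post:
#   while read -r name; do host "$name.example.com"; done \
#       < subdomains-top1mil-5000.txt | grep "has address" | filter_new_ips

# Offline demonstration with made-up host output:
printf '%s\n' \
    'www.example.com has address 192.0.2.10' \
    'web.example.com has address 192.0.2.10' \
    'mail.example.com has address 192.0.2.20' | filter_new_ips
```

Because the array is created and used inside the one function that does the printing, it does not matter that the last stage of a pipeline runs in a subshell.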
 
07-15-2019, 06:00 PM   #6
individual
Member
 
Registered: Jul 2018
Posts: 231

Rep: Reputation: 175
No, he means duplicates from the host command. It's related to this thread.
 
07-15-2019, 06:12 PM   #7
scasey
Senior Member
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.6
Posts: 3,458

Rep: Reputation: 1160
Ah. Then, yes, you've got him on the right track.

I'd probably just sort the file then do something like (pseudocode):
Code:
lasthostname=" "
read each row
get the hostname
if the hostname != lasthostname
   print 
fi
lasthostname = hostname
## so if the hostname IS the same as lasthostname, it'll just loop without printing.
but I'm kinda lazy
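A runnable take on that pseudocode might look like the sketch below. The function name `print_first_per_host` is hypothetical, and it assumes the input is "HOSTNAME IP" lines on stdin; the sample data is made up.

```shell
#!/usr/bin/env bash
# Sketch only: print_first_per_host is a hypothetical name.
# Sorts "HOSTNAME IP" lines from stdin, then prints a line only when its
# hostname differs from the hostname on the previous line.
print_first_per_host() {
    local lasthostname="" hostname rest
    sort | while read -r hostname rest; do
        if [[ $hostname != "$lasthostname" ]]; then
            echo "$hostname $rest"
        fi
        lasthostname=$hostname       # remember for the next iteration
    done
}

printf '%s\n' \
    'www.example.com 192.0.2.10' \
    'mail.example.com 192.0.2.20' \
    'www.example.com 192.0.2.10' | print_first_per_host
```

Unlike the associative-array approach, this needs no Bash 4 features, but the output comes out in sorted order rather than in the order of first appearance.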

Last edited by scasey; 07-15-2019 at 06:21 PM.
 
07-24-2019, 01:36 PM   #8
Scrag
Member
 
Registered: Mar 2004
Location: Wisconsin
Distribution: Kali Linux
Posts: 131

Original Poster
Rep: Reputation: 15
Thanks everybody.

Still learning. Problem is, the more I learn, the more I change my mind and want to do things another way.

So I'm actually working on another script now, but I'll probably be back with more questions.

Thanks!
Scrag
 
07-25-2019, 03:56 AM   #9
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 1,153

Rep: Reputation: 525
Code:
# Key-only (no value) use of an associative array.
declare -A seen
# ${seen[$key]+x} expands to "x" if the key exists and to nothing otherwise,
# so -z is the "not seen yet" test.
while read -r key
do
  if [[ -z ${seen[$key]+x} ]]; then
    echo "key $key"
    seen[$key]=
  fi
done <inputfile
cat inputfile
Code:
1
2
1
4
2
3
1
./testscript
Code:
key 1
key 2
key 4
key 3
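For comparison, the same first-occurrence filter is often written with awk instead of a Bash array. This swaps in a different tool rather than reproducing the script above, and the wrapper name `dedupe_lines` is made up for the sketch.

```shell
# Same first-occurrence filter using awk instead of a Bash array.
# seen[$0]++ evaluates to 0 the first time a line appears, so !seen[$0]++ is
# true only for first occurrences, and awk's default action prints the line.
dedupe_lines() {
    awk '!seen[$0]++'
}

printf '%s\n' 1 2 1 4 2 3 1 | dedupe_lines
```

Like the array version, it keeps lines in the order they first appear, and it works on whole lines, so it can be appended to the end of a pipeline.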
 
  

