06-03-2008, 12:21 AM   #1
Registered: Nov 2006
Posts: 91

Rep: Reputation: 27
filtering a list of domain names for subdomains

Hi all,
I don't know of a better place to ask a question like this than here. I have a text file of domain names. What I want to do with it is weed out any subdomain duplicates. For example:

What I'd want to come out is:

I'd guess there's some way to do this with regular expressions and the like, but I don't know exactly how. I could easily use cut to strip down to the domain plus tld, but there are two problems: 1. some domains have two tokens of "tld" while others have only one ( versus .net or .com), and 2. there might be domains below the base domain plus tld that all match. I want the file to be as specific as possible without duplicating things unnecessarily. Is this possible?
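
To make point 1 concrete, here is a minimal Python sketch of the "strip down to the domain plus tld" step that copes with two-token tlds. It is only an illustration, not a complete answer: TWO_LABEL_SUFFIXES is a hypothetical, hand-maintained set (a real list of such suffixes would be far longer), and the function names are made up for this example. It does not address point 2.

```python
# Hypothetical, hand-maintained set of two-label suffixes (an assumption;
# a real list would be much longer).
TWO_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au"}


def base_domain(name):
    """Return the base domain plus tld for a host name."""
    labels = name.strip().lower().rstrip(".").split(".")
    # Keep three labels when the last two form a known two-label suffix,
    # otherwise keep two.
    keep = 3 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def unique_base_domains(names):
    """Reduce every name to its base domain and drop duplicates (sorted)."""
    return sorted({base_domain(n) for n in names if n.strip()})
```

With this, www.example.co.uk reduces to example.co.uk while mail.example.com reduces to example.com. Any suffix missing from the set is treated as a one-token tld, which is the main weakness of the approach.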

Thanks for the help
06-04-2008, 06:51 AM   #2
LQ Addict
Registered: Jul 2002
Location: East Central Illinois, USA
Distribution: Debian Squeeze
Posts: 5,844

Rep: Reputation: 342
"I could easily use cut to strip down to the domain plus tld,"
then pipe the list through sort and then uniq (uniq only removes adjacent duplicate lines, so the list has to be sorted first).
06-04-2008, 11:58 AM   #3
Senior Member
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 232
There is a solution. It's not simple, it does involve regexes, and uniq is not at the core of it.

This sounds like a problem I worked on about 3 years ago: extracting unique domain names from published hosts-file (black)lists. I filter ads etc. for my whole LAN at a firewall using dnsmasq's config file, not a hosts file.

Unlike a hosts file, dnsmasq.conf can block an entire domain without listing each individual host or sub-domain. This usually shrinks the list by at least 95% in the "distillation" process.
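
As a rough illustration of that kind of distillation (this is not the code referred to in this post, which does not appear in the thread), the Python sketch below drops every name that is already covered by a parent domain elsewhere in the same list, then formats the survivors as dnsmasq address= lines. The function names and the 0.0.0.0 sink address are assumptions made for the example.

```python
def drop_covered_subdomains(names):
    """Keep only names that have no ancestor domain elsewhere in the list."""
    cleaned = {n.strip().lower().rstrip(".") for n in names if n.strip()}
    kept = []
    for name in sorted(cleaned):
        labels = name.split(".")
        # Proper suffixes with at least two labels, e.g. for
        # a.b.example.com: b.example.com and example.com.
        ancestors = (".".join(labels[i:]) for i in range(1, len(labels) - 1))
        if not any(a in cleaned for a in ancestors):
            kept.append(name)
    return kept


def dnsmasq_lines(domains, sink="0.0.0.0"):
    """Format domains as dnsmasq address= entries (sink is an assumed value)."""
    return [f"address=/{d}/{sink}" for d in domains]
```

Because a name is only dropped when one of its parents is itself in the list, the output stays as specific as the input allows: tracker.example.net survives unless example.net is also listed.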

Be patient, I'll try to dig out my code and post it for you.


All times are GMT -5.