There is a solution. It's not simple, it does involve regexes, and uniq is not at the core of it.
This sounds like a problem I worked on about 3 years ago: extracting unique domain names from published hosts-file (black)lists. I filter ads etc. for my whole LAN at the firewall using dnsmasq's config file, not a hosts file.
Unlike a hosts file, dnsmasq.conf can block an entire domain without listing each individual host or sub-domain. This usually results in at least 95% shrinkage in the "distillation" process.
Be patient, I'll try to dig out my code & post it for you.
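
In the meantime, here is a minimal Python sketch of the general idea. It is not the code mentioned above, just an illustration under some loud assumptions: the names HOSTS_LINE, distill and to_dnsmasq and the sample entries are made up for this example, and "domain" is naively taken to be the last two labels of each hostname. That is wrong for suffixes like co.uk, so a real version would need a public-suffix list or a manual review step. The output uses dnsmasq's address=/domain/ip form, which answers for the domain and everything under it.

```python
import re

# Hypothetical pattern: an IP-like field, whitespace, then a hostname.
HOSTS_LINE = re.compile(r"^\s*[0-9a-fA-F.:]+\s+([A-Za-z0-9.-]+)")


def distill(lines, labels=2):
    """Reduce hosts-file entries to a sorted list of unique parent domains.

    ASSUMPTION: the parent domain is the last `labels` labels of the
    hostname. This is a naive rule and mishandles suffixes such as co.uk.
    """
    domains = set()
    for line in lines:
        line = line.split("#", 1)[0]  # drop full-line and trailing comments
        match = HOSTS_LINE.match(line)
        if not match:
            continue
        host = match.group(1).lower().strip(".")
        parts = host.split(".")
        if len(parts) < 2:  # skip single-label names such as "localhost"
            continue
        domains.add(".".join(parts[-labels:]))
    return sorted(domains)


def to_dnsmasq(domains, sink="0.0.0.0"):
    """Format each domain as a dnsmasq address= line pointing at `sink`."""
    return [f"address=/{d}/{sink}" for d in domains]


# Made-up sample input, for illustration only.
sample = [
    "# example blocklist",
    "127.0.0.1 localhost",
    "0.0.0.0 ads.example.com",
    "0.0.0.0 track.ads.example.com",
    "0.0.0.0 cdn.example.com  # inline comment",
    "0.0.0.0 banner.example.net",
]

for entry in to_dnsmasq(distill(sample)):
    print(entry)
```

With the sample above, four blocked hosts collapse into two address= lines (example.com and example.net). The set does the de-duplication, which is why uniq isn't the interesting part; the regex and the choice of where to cut the hostname are.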