How do you want to define 'similar'?
For example, this awk snippet compares lines case-insensitively using only their letters, and ignores all other characters (digits, spaces, punctuation). Of the lines that compare equal, it outputs only the first one.
Code:
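awk '{ t = tolower($0) ;              # comparison key: lowercase copy of the line
       gsub(/[^a-z]+/, "", t) ;       # drop everything except letters from the key
       if (!(t in seen)) print $0 ;   # print only the first line with this key
       seen[t] = NR                   # remember that this key has been seen
     }' input-file > output-file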
For your example input, it will output
Code:
3 Doors Down When I'm Gone
4 Non Blondes What's Up
a-ha Take On Me
Aerosmith Cryin
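If you want to check which lines the snippet treats as the same, one option is to print the comparison key next to each line. This is only a sketch: the two sample lines are taken from the outputs shown in this post, and the shell variable name keys is just for illustration.

```shell
# Two sample lines that differ only in punctuation.
keys=$(printf '%s\n' "4 Non Blondes What's Up" "4 Non Blondes What's Up?" |
  awk '{ t = tolower($0)
         gsub(/[^a-z]+/, "", t)
         print t "\t" $0          # key, a tab, then the original line
       }')
printf '%s\n' "$keys"
```

Both lines get the key nonblondeswhatsup, so only the one that comes first in the input survives.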
Note how important the input order is: of several matching lines, whichever comes first is the one that gets kept. If you reverse-sort the input, i.e. run
Code:
sort -rbd input-file | awk '
{ t = tolower($0) ;
gsub(/[^a-z]+/, "", t) ;
if (!(t in seen)) print $0 ;
seen[t] = NR
}' | sort -bd > output-file
the output will be
Code:
3 Doors Down When I'm Gone
4 Non Blondes What's Up?
Aerosmith Cryin'
a-ha Take On Me
Finally, you can change the gsub(/pattern/,replacement,t) call to edit the comparison version of each line however you want. If you wish, you can also add gsub(/pattern/,replacement) lines before the print to edit the output lines; without the third argument, gsub works on the line itself ($0).
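As a rough sketch of both kinds of change, assuming you want digits to count in the comparison and trailing whitespace stripped from the output (the sample lines and the variable name result are made up for illustration):

```shell
# Hypothetical sample input: two spellings each of "Track 1" and "Track 2".
result=$(printf '%s\n' 'Track 1  ' 'track 1' 'Track 2' 'TRACK-2!' |
  awk '{ t = tolower($0)
         gsub(/[^a-z0-9]+/, "", t)   # comparison key: keep letters AND digits
         if (!(t in seen)) {
             gsub(/[ \t]+$/, "")     # output edit: strip trailing whitespace from $0
             print
         }
         seen[t] = NR
       }')
printf '%s\n' "$result"
```

With the original /[^a-z]+/ pattern all four lines would share the key track and only the first would be printed; keeping digits in the key lets Track 1 and Track 2 stay separate.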