grep help
blah blah blah www.website1.com blah blah blah
asdf www.website2.com asdf
How do I get grep to print only the website name and ignore everything before www. and everything after .com? |
grep is not the tool for cutting lines into smaller pieces. It's the tool for filtering lines.
You want sed or awk. Something like this (not sure if it works):
Code:
sed 's/.*\(www\..*\.com\).*/\1/' |
If your patterns get more complex take a look at -E also. Then, if you have to, escalate to using -P for PCRE. |
Not true, berndbausch.
Code:
grep -o "www.*com"
You can also use more complex matching; see the grep manpage.
However, another tool may be better suited to the task. It really depends on what else you want to do. |
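For anyone following along, here is a minimal sketch of that -o command run against the two sample lines from the question. The variable names `sample` and `matches` are just for illustration.

```shell
# The two sample lines from the original question.
sample='blah blah blah www.website1.com blah blah blah
asdf www.website2.com asdf'

# -o prints only the part of each line that matched, not the whole line.
matches=$(printf '%s\n' "$sample" | grep -o "www.*com")

printf '%s\n' "$matches"
```

On this input it prints one address per line. Note that the dots in the pattern are unescaped, so they match any character, which is harmless here but looser than it looks.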
Yeah, it can get messy. The .* is greedy, so if you happen to have two web addresses on a single line, you end up with both and the junk in between.
But the same is true of the sed version. awk would be better, since you could loop through each field. perl is probably the natural tool for the job, but I don't know perl. |
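A rough sketch of both points: the greedy match swallowing two addresses, and an awk loop over fields as one way around it. This assumes the addresses are separated from the surrounding text by whitespace and carry no trailing punctuation; the sample line and variable names are made up for the example.

```shell
# A hypothetical line with two addresses on it.
line='see www.website1.com and www.website2.com for details'

# Greedy .* gives one match: both addresses plus the text between them.
greedy=$(printf '%s\n' "$line" | grep -o 'www.*com')

# awk: test each whitespace-separated field on its own.
per_field=$(printf '%s\n' "$line" |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^www\..*\.com$/) print $i }')

printf 'greedy: %s\n' "$greedy"
printf 'per field:\n%s\n' "$per_field"
```

The awk version only works as long as each address is its own field, so something like "(www.website1.com)" would need extra trimming first.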
Code:
grep -w -P -o 'www\..*?\.com' |
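To see what the lazy .*? buys you, here is a small sketch with two addresses on one line. It assumes a GNU grep built with PCRE support (-P is not available in every grep); the sample line and the `lazy` variable are made up for the example.

```shell
# A hypothetical line with two addresses on it.
line='see www.website1.com and www.website2.com for details'

# .*? stops at the first ".com" it can, so each address is its own match.
lazy=$(printf '%s\n' "$line" | grep -w -P -o 'www\..*?\.com')

printf '%s\n' "$lazy"
```

Each address comes out on its own line instead of one long match spanning both.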
It can be pretty hard to match domains.
Code:
grep -woP '(?:www\.)?\w+\.[a-z]{3,4}' ./FILE
regexr.com/4k0pk |
Not all domain extensions are matched by \.[a-z]{3,4}, most notably any country-specific ones.
Also, \w includes underscore (not valid in domains) but not hyphens (which are), so I'd probably go with:
Code:
grep -owEi '[a-z0-9-]+(\.[a-z0-9-]+)+' ./FILE
And, if the use case calls for it, filter the output through something that does a DNS lookup to confirm actual domains. |
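A quick sketch of how that last pattern behaves on a hyphenated name, a country-code extension, and a name containing an underscore. The sample text is invented for the example, and it is piped in rather than read from ./FILE.

```shell
# Hypothetical text: a hyphenated country-code domain, a plain .com,
# and a name with an underscore (not valid in a domain label).
text='visit my-site.example.co.uk or www.website1.com, not bad_name.com'

# -i makes the a-z ranges match upper case too; -w requires the match
# to start and end at a word boundary.
domains=$(printf '%s\n' "$text" | grep -owEi '[a-z0-9-]+(\.[a-z0-9-]+)+')

printf '%s\n' "$domains"
```

With GNU grep, the underscore name drops out because -w rejects a match that starts right after a word character. The pattern will still match anything dotted, such as version numbers or file names, which is where the DNS lookup step would come in, for example by feeding each line to a resolver such as `getent hosts` on systems that have it.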