Old 08-28-2019, 10:25 AM   #1
motherboard
LQ Newbie
 
Registered: Jun 2018
Posts: 9

Rep: Reputation: Disabled
grep help


blah blah blah www.website1.com blah blah blah
asdf www.website2.com asdf

How do I get grep to print only the website name and ignore everything before www. and everything after .com?
 
Old 08-28-2019, 10:33 AM   #2
berndbausch
Senior Member
 
Registered: Nov 2013
Location: Tokyo
Distribution: A few
Posts: 3,958

Rep: Reputation: 1109
grep is not the tool for cutting lines into smaller pieces. It's the tool for filtering lines.

You want sed or awk. Something like this (not sure if it works):
Code:
sed 's/.*\(www.*com\).*/\1/'
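A minimal sketch of the sed approach, assuming the text arrives on standard input. The function name `extract_site_sed` is made up for illustration. Compared with the one-liner above, the dots are escaped so they only match literal dots, and `-n` together with the `p` flag keeps lines that contain no site out of the output.

```shell
# Hypothetical helper: reads lines on stdin, prints the www...com part.
extract_site_sed() {
    # -n suppresses sed's default printing; the trailing p prints a line
    # only when the substitution actually happened on it.
    sed -n 's/.*\(www\..*\.com\).*/\1/p'
}

printf '%s\n' \
    'blah blah blah www.website1.com blah blah blah' \
    'asdf www.website2.com asdf' \
    'a line with no site' | extract_site_sed
# prints www.website1.com and www.website2.com, one per line
```

Because the leading `.*` is greedy, a line with several addresses yields only the last one.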
 
Old 08-28-2019, 10:38 AM   #3
Turbocapitalist
Senior Member
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 4,373
Blog Entries: 3

Rep: Reputation: 2184
Quote:
Originally Posted by motherboard
How do I get grep to print only the website name and ignore everything before www. and everything after .com?
Take a look at the -w and -o options together.

If your patterns get more complex, take a look at -E as well. Then, if you have to, escalate to -P for PCRE.
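To illustrate the two options, here is a small sketch, assuming a grep that supports `-o` and `-w` (GNU grep does). The helper names and the sample text are invented for the example. `-o` prints only the matched part of each line, and `-w` additionally requires the match to start and end at a word boundary.

```shell
# Hypothetical helpers; both read lines on stdin.
sites_loose()  { grep -o 'www\.[a-z0-9]*\.com'; }
sites_strict() { grep -o -w 'www\.[a-z0-9]*\.com'; }

sample='see www.website1.com now
junkwww.website2.community'

# Without -w, the address embedded in a longer word is reported too.
printf '%s\n' "$sample" | sites_loose

# With -w, only the address standing on its own is reported.
printf '%s\n' "$sample" | sites_strict
```

The first call prints both addresses; the second prints only www.website1.com, because the other candidate is glued to letters on both sides.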
 
Old 08-28-2019, 10:43 AM   #4
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 782
Not true, berndbausch.

Code:
grep -o "www.*com"

You can also use more complex matching.

From the grep man page:
Quote:
Matcher Selection
-E, --extended-regexp
Interpret PATTERNS as extended regular expressions (EREs, see below).

-F, --fixed-strings
Interpret PATTERNS as fixed strings, not regular expressions.

-G, --basic-regexp
Interpret PATTERNS as basic regular expressions (BREs, see below). This is the default.

-P, --perl-regexp
Interpret PATTERNS as Perl-compatible regular expressions (PCREs). This option is experimental when combined with the -z (--null-data) option, and grep -P may warn of unimplemented features.


However, another tool may be better suited to the task. It really depends on what else you want to do.
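As a rough illustration of the matcher options quoted from the man page, the sketch below runs the thread's sample lines through `-E` and `-F`. The helper names are hypothetical, and the alternation pattern is just one example of "more complex matching", not something the original poster asked for.

```shell
sample='blah blah blah www.website1.com blah blah blah
asdf www.website2.com asdf'

# -E: extended syntax, so ( ) and | work without backslashes.
match_ere()   { grep -o -E 'www\.(website1|website2)\.com'; }

# -F: the pattern is a fixed string, so the dots are literal as written.
match_fixed() { grep -o -F 'www.website1.com'; }

printf '%s\n' "$sample" | match_ere     # prints both addresses
printf '%s\n' "$sample" | match_fixed   # prints www.website1.com only
```

Note that in the default basic syntax an unescaped `.` matches any character, which is why the later patterns in this thread write `\.` for the dots.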
 
1 members found this post helpful.
Old 08-28-2019, 11:43 AM   #5
berndbausch
Senior Member
 
Registered: Nov 2013
Location: Tokyo
Distribution: A few
Posts: 3,958

Rep: Reputation: 1109
Quote:
Originally Posted by Firerat
Not true, berndbausch.

Code:
grep -o "www.*com"
True, this works wonderfully.
 
Old 08-28-2019, 11:57 AM   #6
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 782
Yeah, it can get messy: the .* is greedy, so if you happen to have two web addresses on a single line, you end up with both and the junk in between.

But the same is true of the sed version.

awk would be better, since you could loop through each field.

Perl is probably the natural tool for the job, but I don't know Perl.
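The greedy behaviour and the awk idea can be sketched as follows. This assumes the addresses are separated from the surrounding text by whitespace, so that each one is a field of its own; the helper name and the sample line are invented for the example.

```shell
line='a www.website1.com b www.website2.com c'

# Greedy .* : a single match that runs from the first www to the last com.
printf '%s\n' "$line" | grep -o 'www.*com'
# prints: www.website1.com b www.website2.com

# Hypothetical helper: check every whitespace-separated field on its own.
extract_sites_awk() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^www\..*\.com$/) print $i }'
}

printf '%s\n' "$line" | extract_sites_awk
# prints www.website1.com and www.website2.com, one per line
```

The field loop avoids the greediness problem because the pattern never sees more than one field at a time.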
 
Old 08-28-2019, 12:06 PM   #7
Turbocapitalist
Senior Member
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 4,373
Blog Entries: 3

Rep: Reputation: 2184
Quote:
Originally Posted by Firerat
Yeah, it can get messy: the .* is greedy, so if you happen to have two web addresses on a single line, you end up with both and the junk in between.
grep has substantial, but not complete, support for PCRE. So you could try it like this:

Code:
grep -w -P -o 'www\..*?\.com'
With the examples given, -w is not strictly necessary, but I figure it will come in handy just in case.
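Not every grep supports `-P`. Where it is missing, a negated character class can stand in for the non-greedy `.*?`. This is only a sketch under the assumption that addresses never contain whitespace; the helper name is made up.

```shell
# Hypothetical helper: [^[:space:]]* cannot cross a space, so a match can
# never run from one address into the next whitespace-separated one.
extract_sites_token() {
    grep -o 'www\.[^[:space:]]*\.com'
}

printf '%s\n' 'a www.website1.com b www.website2.com c' | extract_sites_token
# prints www.website1.com and www.website2.com, one per line
```

Unlike the non-greedy PCRE version, this still takes the longest match inside a single token, so two addresses written back to back without a space would come out as one.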
 
Old 08-28-2019, 12:12 PM   #8
Sefyir
Member
 
Registered: Mar 2015
Distribution: Linux Mint
Posts: 607

Rep: Reputation: 301
It can be pretty hard to match domains

Code:
./FILE
blah blah blah website1.com blah blah blah website2.com blah blah
asdf www.website3.org asdf

blah blah blah www.website4.com blah blah blah
asdf www.website5.com asdf
Code:
grep -woP '(?:www\.)?\w+\.[a-z]{3,4}' ./FILE
website1.com
website2.com
www.website3.org
www.website4.com
www.website5.com
This explains what each part does
regexr.com/4k0pk
 
Old 08-28-2019, 04:42 PM   #9
boughtonp
Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 126

Rep: Reputation: 62
Not all domain extensions are matched by \.[a-z]{3,4}, most notably the two-letter country-specific ones.

Also, \w includes underscores (not valid in domain names) but not hyphens (which are), so I'd probably go with:

Code:
grep -owEi '[a-z0-9-]+(\.[a-z0-9-]+)+' ./FILE
With the i for case-insensitivity.
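A quick sketch of how that pattern behaves, assuming GNU grep. The helper name and the sample names are invented for the example.

```shell
# Hypothetical helper wrapping the pattern from this post; reads stdin.
extract_hostnames() {
    grep -owEi '[a-z0-9-]+(\.[a-z0-9-]+)+'
}

printf '%s\n' 'see my-site.example.co.uk and bad_name.com and WWW.Website5.COM' |
    extract_hostnames
# prints my-site.example.co.uk and WWW.Website5.COM, one per line
```

The hyphenated, multi-label name is kept whole and the upper-case one is matched thanks to `-i`. The name containing an underscore is skipped entirely, because `-w` rejects a match that directly follows an underscore. Keep in mind that the pattern accepts any dotted token, so file names or version numbers in the text would be reported as well.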

And, if the use-case calls for it, filter the output through something that does a DNS lookup to confirm actual domains.
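One possible shape for such a filter is sketched below. The function name `filter_resolvable` is hypothetical, and the resolver command is passed in as arguments rather than hard-coded, since which lookup tool is available (for example `getent hosts` on glibc systems) is an assumption about your machine.

```shell
# Hypothetical helper. Usage: ... | filter_resolvable RESOLVER [ARGS...]
# Reads one candidate name per line on stdin and prints only those for
# which "RESOLVER ARGS... name" exits successfully.
filter_resolvable() {
    while IFS= read -r name; do
        # </dev/null keeps the resolver from eating the remaining input.
        if "$@" "$name" </dev/null >/dev/null 2>&1; then
            printf '%s\n' "$name"
        fi
    done
}

# Example, assuming getent is available:
#   grep -owEi '[a-z0-9-]+(\.[a-z0-9-]+)+' ./FILE | sort -u | filter_resolvable getent hosts
```

Running the candidates through `sort -u` first avoids looking up the same name more than once.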

Last edited by boughtonp; 08-28-2019 at 05:06 PM.
 
1 members found this post helpful.
  

