LinuxQuestions.org
Old 07-20-2019, 08:14 AM   #31
pan64
\w and [:alnum:] are almost the same, just different syntax.
A ^ as the first character inside [ and ] affects everything in the brackets: it means exclusion instead of inclusion.
It would also be nice to check www.regex101.com, where you can construct and test any regexp yourself.
 
Old 07-20-2019, 09:26 AM   #32
ehartman
Quote:
Originally Posted by pan64
\w and [:alnum:] are almost the same, just different syntax.
From the man page: The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]]. So \w is the class [:alnum:] extended with the character _ (which may stand anywhere inside the brackets), and \W is the negation of that same set (the ^ must come first, directly after the opening bracket).
As _ is not an alphanumeric character, \w thus matches one character more than [:alnum:].
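A quick way to see the difference, as a sketch only: the four sample lines below are made up for illustration, and the second grep spells out \w in the bracket form the man page gives for it. Only the version with _ added should keep the line that is a lone underscore.

```shell
# Hypothetical sample lines: a word, a lone underscore,
# a name containing an underscore, and a lone "!"
sample() { printf '%s\n' 'abc' '_' 'a_b' '!'; }

# Lines containing at least one alphanumeric character
sample | grep '[[:alnum:]]'

# Lines containing at least one alphanumeric character or an underscore
# (the bracket form the man page quotes as the meaning of \w)
sample | grep '[_[:alnum:]]'
```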

Last edited by ehartman; 07-21-2019 at 10:46 PM.
 
Old 07-20-2019, 09:32 AM   #33
TB0ne
Quote:
Originally Posted by crts
The most important thing you are missing is to provide a representative sample file and the output you expect.
You have been presented with a solution that works for the sample file you provided. Now you are telling us that the sample file is not representing the actual input data, thus the solution is inappropriate. It is pointless to provide you with a solution if you keep changing the requirement.
The OP asked for 'hints' (post #4). They've been given a LOT of hints, but don't seem to have actually worked on/thought about them.

OP, you have been given a LOT of advice that you could act on and research, to solve your own problem.
 
Old 07-21-2019, 09:09 PM   #34
blason
Original Poster
Quote:
Originally Posted by crts
The most important thing you are missing is to provide a representative sample file and the output you expect.
You have been presented with a solution that works for the sample file you provided. Now you are telling us that the sample file is not representing the actual input data, thus the solution is inappropriate. It is pointless to provide you with a solution if you keep changing the requirement.
Here is my input file:
Code:
example.com
test.com
test123.com
123test.ocm
calid-domain.com
test-test.net
!def
@fsf
dafsrf#
fffgg$.net
%rrt.com
^testcom
asddf&.net
as*
(
)
_
+
=
\
;
:
'
"
<
,
>
?
/
[test.net
]gsef.ex
`ftrfgr
!
@
#
$
%
^
&
*
(
)
_
=
+
{
}
[
]
;
:
"
'
<
,
.
>
/
?
-
And here is the output I am getting. As you can see, it also matches a single dot and a single hyphen. I want to include a dot or hyphen only if it is surrounded by [:alnum:] characters.
Code:
example.com
test.com
test123.com
123test.ocm
calid-domain.com
test-test.net
.
-
 
Old 07-22-2019, 02:15 AM   #35
ehartman
Quote:
Originally Posted by blason
As you can see, it also matches a single dot and a single hyphen. I want to include a dot or hyphen only if it is surrounded by [:alnum:] characters.
Note: if this is about "filename like" strings:
1) Quite often filenames start with a dot (so-called hidden files).
Use "ls -A" in your home dir to see a lot of them.
2) Filenames with multiple dots and/or dashes are common too, e.g. php-5.6.40-x86_64-1.txz
3) The _ is often used to substitute spaces, like George_Harrison-What_Is_Life.mp3
 
Old 07-22-2019, 08:04 AM   #36
blason
Original Poster
Quote:
Originally Posted by ehartman
Note: if this is about "filename like" strings:
1) Quite often filenames start with a dot (so-called hidden files).
Use "ls -A" in your home dir to see a lot of them.
2) Filenames with multiple dots and/or dashes are common too, e.g. php-5.6.40-x86_64-1.txz
3) The _ is often used to substitute spaces, like George_Harrison-What_Is_Life.mp3
Nah, I am not trying to match file names; I am trying to filter the data in that file, and those are domain names. The only valid non-alphanumeric characters in domain names are . and -, and each should have alphanumerics around it.
 
Old 07-22-2019, 08:17 AM   #37
pan64
So I would create something like this:
1. one alphanumeric
2. any number of alnum or - characters
3. one alnum
4. dot
5. any alnum (but at least one).

You can construct a regexp for this (or anything similar)
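As a sketch of how the five steps above could be written out literally (one possible translation, not necessarily what was intended): the variable name domain_re and the ^ ... $ anchors are my assumptions, and the input lines are a few taken from the sample file plus one multi-dot hostname. Note that this literal form needs at least two characters before the dot and allows exactly one dot.

```shell
# Literal translation of the five steps; the anchors make the whole line match.
# Step 1: [[:alnum:]]     Step 2: [[:alnum:]-]*   Step 3: [[:alnum:]]
# Step 4: \.              Step 5: [[:alnum:]]+
domain_re='^[[:alnum:]][[:alnum:]-]*[[:alnum:]]\.[[:alnum:]]+$'

# A few lines from the sample file, plus one hostname with two dots
printf '%s\n' 'example.com' 'test-test.net' '.' '-' '%rrt.com' 'www.debian.org' |
    grep -E "$domain_re"
```

With this input only example.com and test-test.net should get through; the lone dot and hyphen are rejected, and so is www.debian.org, because the pattern allows a single dot only.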
 
Old 07-22-2019, 12:59 PM   #38
ehartman
Quote:
Originally Posted by pan64
4. dot
5. any alnum (but at least one).

You can construct a regexp for this (or anything similar)
And how do you then handle hostnames like www.debian.org or en.wikipedia.org (that is, with multiple dots)?
Those steps 4 and 5 have to be repeated until the end of the string is reached.
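Repeating the "dot plus label" part could look roughly like this. It is a sketch under assumptions: the names label and multi_re are mine, and I let a label be a single character, which goes slightly beyond the steps as listed.

```shell
# One label: starts and ends with an alphanumeric, hyphens only in between.
label='[[:alnum:]]([[:alnum:]-]*[[:alnum:]])?'

# A label followed by one or more ".label" groups, anchored to the whole line.
multi_re="^${label}(\.${label})+\$"

printf '%s\n' 'www.debian.org' 'en.wikipedia.org' 'example.com' 'a-.com' '.' '-' 'com' |
    grep -E "$multi_re"
```

Here the three real hostnames should pass, while a-.com (hyphen next to the dot), the lone dot and hyphen, and com (no dot at all) are rejected.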
 
Old 07-22-2019, 01:11 PM   #39
TB0ne
Quote:
Originally Posted by blason
Nah, I am not trying to match file names; I am trying to filter the data in that file, and those are domain names. The only valid non-alphanumeric characters in domain names are . and -, and each should have alphanumerics around it.
So what have you done with any/all of the 'hints' you asked for thus far, to try to accomplish this?? Haven't seen your efforts thus far.

And if all you're looking to do (since the goal has apparently changed again) is to get domain names, why can't you just grep for things like .net, .com, .edu, etc. into another file??
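For illustration, a grep on the ending alone might look like the sketch below. The variable name tld_re and the short list of endings are assumptions; the input lines come from the sample file.

```shell
# Keep lines that end in one of a few TLDs (the list is only an example)
tld_re='\.(com|net|edu)$'

printf '%s\n' 'example.com' 'test-test.net' '123test.ocm' '%rrt.com' 'fffgg$.net' |
    grep -E "$tld_re"
```

This checks only the ending: %rrt.com and fffgg$.net still get through, and 123test.ocm is dropped, so it would probably need to be combined with a check on the allowed characters.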
 
Old 07-22-2019, 10:43 PM   #40
blason
Original Poster
Quote:
cat test | grep -v '[^[:alnum:].-]' | grep '\w'
This should eventually be sufficient for my need? Please correct me if I am wrong.

Quote:
cat test | grep -v '[^[:alnum:].-]' | grep '\w'
example.com
test.com
test123.com
123test.ocm
calid-domain.com
test-test.net
 
Old 07-23-2019, 02:09 AM   #41
pan64
Quote:
Originally Posted by ehartman
And how do you then handle hostnames like www.debian.org or en.wikipedia.org (that is, with multiple dots)?
Those steps 4 and 5 have to be repeated until the end of the string is reached.
This was only an example, and anyone can improve on it; it is just a way to build a correct regexp.

Quote:
Originally Posted by blason
This should eventually be sufficient for my need? Please correct me if I am wrong.
Did you check it? It looks [almost] identical/similar to:
Code:
grep '[[:alnum:].-]' test
Is this what you need?
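It may be worth running both commands side by side before deciding. A sketch, assuming a handful of lines from the sample file (the helper name sample is mine, \w is written in the bracket form the man page gives for it, and the third command is my own suggestion, not something posted in this thread):

```shell
sample() { printf '%s\n' 'example.com' 'test-test.net' '!def' '.' '-' '_'; }

# The pipeline from post #40: drop lines with any disallowed character,
# then keep lines that contain at least one word character.
sample | grep -v '[^[:alnum:].-]' | grep '[_[:alnum:]]'

# The unanchored bracket expression: keeps every line that contains
# at least one alphanumeric, dot, or hyphen anywhere on the line.
sample | grep '[[:alnum:].-]'

# A single anchored expression (an assumption, not from the thread):
# only allowed characters on the line, and at least one alphanumeric.
sample | grep -E '^[[:alnum:].-]*[[:alnum:]][[:alnum:].-]*$'
```

On these lines the first and third commands should print only the two domain names, while the unanchored one also lets !def, the lone dot, and the lone hyphen through, so the two forms are not interchangeable.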

Last edited by pan64; 07-23-2019 at 02:13 AM.
 
Old 07-23-2019, 02:26 AM   #42
blason
Original Poster
Quote:
Originally Posted by pan64
This was only an example, and anyone can improve on it; it is just a way to build a correct regexp.


Did you check it? It looks [almost] identical/similar to:
Code:
grep '[[:alnum:].-]' test
Is this what you need?
Yes it is now showing me the desired results.
 
Old 07-23-2019, 05:21 AM   #43
pan64
In that case you can probably mark the thread solved.
And again, you may check www.regex101.com to improve your skills and to check your regexps.
 
Old 08-15-2019, 01:46 AM   #44
MadeInGermany
Quote:
Originally Posted by blason
Hello,

That worked perfectly fine; however, I am not sure whether what I am trying to match here can be achieved in the same line.
The above pattern also catches a single literal dot or hyphen. In a domain name those will be surrounded by alnum characters, hence I am trying hard to validate that . and - match only if surrounded by \w followed by those two literals.

Maybe I am missing something?
The grep runs in a shell; many special characters have a special meaning in the shell, too.
Use quotes, so the shell does not try special substitutions!
Code:
< test grep -v '[^[:alnum:]\w.-]'
In plain grep, perl-style extensions like \w are not interpreted inside a POSIX bracket expression [ ] (the backslash is taken literally there), so I would use only POSIX classes [:xxxx:]
Here it is sufficient to add the _ character to the [:alnum:] class
Code:
< test grep -v '[^[:alnum:]_.-]'

Last edited by MadeInGermany; 08-15-2019 at 02:04 AM.
 
  


All times are GMT -5.