LinuxQuestions.org
Old 07-18-2019, 07:44 AM   #1
blason
Member
 
Registered: Feb 2016
Posts: 122

Rep: Reputation: Disabled
How to exclude all special characters using regex?


Hi Folks,

I need to exclude special characters from a file and keep only
[a-zA-Z0-9] . -

In fact, I just want to keep domain names and exclude all special characters.

I am not able to achieve this.

~`!@#$%^&*()_+={}[]\|;:'"<,>/?

Can someone please help?
 
Old 07-18-2019, 07:45 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,840

Rep: Reputation: 7308
Which language is it? Do you have any code written already?
 
1 members found this post helpful.
Old 07-18-2019, 07:52 AM   #3
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Rep: Reputation: 2242
a quick search
 
Old 07-18-2019, 07:57 AM   #4
blason
Member
 
Registered: Feb 2016
Posts: 122

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
Which language is it? Do you have any code written already?
I need it in bash. I am not a regex pro; I am trying my best and failing.
Any hints?
 
Old 07-18-2019, 07:59 AM   #5
blason
Member
 
Registered: Feb 2016
Posts: 122

Original Poster
Rep: Reputation: Disabled
My sample text would be

example.com
test.com
test123.com
123test.ocm
calid-domain.com
test-test.net
!def
@fsf
dafsrf#
fffgg$.net
%rrt.com
^testcom
asddf&.net
as*
(
)
_
+
=
\
;
:
'
"
<
,
>
?
/
 
Old 07-18-2019, 08:23 AM   #6
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,634

Rep: Reputation: 7965
Quote:
Originally Posted by blason View Post
Need that in bash, dang I am not regex pro but giving my best and failing
Any hint?
You've been posting things like this for a good while now:
https://www.linuxquestions.org/quest...es-4175657403/
https://www.linuxquestions.org/quest...nd-4175656948/
https://www.linuxquestions.org/quest...rs-4175655180/
https://www.linuxquestions.org/quest...ng-4175648204/
https://www.linuxquestions.org/quest...es-4175641557/
https://www.linuxquestions.org/quest...pt-4175635666/
https://www.linuxquestions.org/quest...pt-4175616729/

Show your own efforts when posting, and do basic research. After three years, you should have SOME scripting/research skills.

Putting "bash regex strip out anything but letters and numbers" into Google pulls up a LOT of 'hints'. You've been told many times to post things in CODE tags, but don't seem to follow that advice either. The [:alnum:] character class matches alphanumeric characters.
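
For readers following along, here is a minimal sketch of how [:alnum:] might be used in a bash test. The helper name is_clean and the sample values are illustrative assumptions, and the dot and hyphen are added to the class because the original question wants to keep them. Note that [:alnum:] depends on the locale, so in some locales it may accept more than a-zA-Z0-9.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the argument consists solely of
# alphanumeric characters, dots and hyphens.
is_clean() {
    # Keeping the pattern in a variable avoids quoting problems inside [[ ]].
    local re='^[[:alnum:].-]+$'
    [[ $1 =~ $re ]]
}

for name in example.com test-test.net '%rrt.com' 'fffgg$.net'; do
    if is_clean "$name"; then
        echo "keep: $name"
    else
        echo "drop: $name"
    fi
done
```

The hyphen is placed last inside the brackets so that it is taken literally instead of forming a range.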
 
Old 07-18-2019, 08:30 AM   #7
blason
Member
 
Registered: Feb 2016
Posts: 122

Original Poster
Rep: Reputation: Disabled
I understand, and I am definitely trying to find the answer. Of course everyone tries Google first, which I also did, and only if that doesn't resolve it do I come here.

I will definitely make sure to use the code tags.
 
Old 07-18-2019, 08:46 AM   #8
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
You need to escape certain characters inside the RegEx:
Code:
while read -r line;do
        if [[ ! "$line" =~ [][()\'\"~!\`@/?\>\<\\] ]];then
                echo "$line"
        fi
done < "/path/to/file"
The above code takes care of the most problematic ones. Note that if you want to match a literal ']' inside the brackets, it must be the first character after the opening '['.
I will leave matching the remaining characters as an exercise.
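
One possible way to finish that exercise, as a sketch only: keep the whole character list in a variable so that the shell quoting is dealt with once, and put ']' first inside the brackets. The variable re, the helper has_special and the sample lines are assumptions made for illustration; the character list is the one from post #1.

```shell
#!/bin/bash
# Blacklist kept in a variable. Inside double quotes only the backtick, '$',
# the backslash and '"' need a shell escape. ']' comes first inside the
# brackets so that the regex engine takes it literally.
re="[][~\`!@#\$%^&*()_+={}\\|;:'\"<,>/?]"

# Hypothetical helper: succeed if the argument contains a listed character.
has_special() {
    [[ $1 =~ $re ]]
}

# Print only the lines that contain none of the listed characters.
while IFS= read -r line; do
    has_special "$line" || echo "$line"
done <<'EOF'
example.com
calid-domain.com
%rrt.com
as*
'
]
EOF
```

With this input only example.com and calid-domain.com are printed. An empty line would also pass, because it contains no listed character.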

PS:
You can also achieve this with [:alnum:], as TB0ne suggested, but it also has a pitfall. I think, however, that doing it the "hard" way is more educational in the long run, since you learn how to handle certain characters in a RegEx.

Last edited by crts; 07-18-2019 at 08:54 AM. Reason: Added PS
 
1 members found this post helpful.
Old 07-18-2019, 08:53 AM   #9
blason
Member
 
Registered: Feb 2016
Posts: 122

Original Poster
Rep: Reputation: Disabled
Code:
'[!@#%%$^*()_+=\;:,"<>?/]'
I guess I am not able to exclude single quote
 
Old 07-18-2019, 08:56 AM   #10
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Read post #8 again.
 
Old 07-18-2019, 09:01 AM   #11
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,634

Rep: Reputation: 7965
Quote:
Originally Posted by blason
I understand, and I am definitely trying to find the answer. Of course everyone tries Google first, which I also did, and only if that doesn't resolve it do I come here. I will definitely make sure to use the code tags.
Sorry, just don't believe that. Putting the search term I used into Google yielded 559,000 hits....hard to believe that out of all that there wasn't one 'hint' you could have used. And you've been asked about CODE tags for a LONG time, but don't use them.
Quote:
Originally Posted by blason View Post
Code:
'[!@#%%$^*()_+=\;:,"<>?/]'
I guess I am not able to exclude single quote
And why is that, given the fact that I not only gave you a search-term that has your 'hints', but the **EXACT** thing you need to use for a regex to strip out anything but letters and numbers???

Last edited by TB0ne; 07-18-2019 at 09:03 AM.
 
Old 07-18-2019, 09:10 AM   #12
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,791

Rep: Reputation: 1201
It is better to name the allowed characters and take their complement, either with tr and its -c option, or with a negating ^ in a character set in an RE:
Code:
tr -dc '.a-zA-Z0-9\n-' < samplefile
sed -n 's/[^.a-zA-Z0-9-]//gp' < samplefile
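
A short sketch of what these commands do to individual lines may help. The helper names strip_chars and strip_chars_sed and the sample input are assumptions for illustration. Both helpers delete the unwanted characters and keep the rest of the line, they do not drop the line. The sed variant here leaves out -n and the p flag, because with those only lines in which a substitution actually happened are printed.

```shell
#!/bin/sh
# Hypothetical helper wrapping the tr variant: delete every character that is
# not a letter, digit, dot, hyphen or newline.
strip_chars() {
    tr -dc '.a-zA-Z0-9\n-'
}

# Hypothetical helper wrapping a sed variant that prints every line,
# whether or not something was removed from it.
strip_chars_sed() {
    sed 's/[^.a-zA-Z0-9-]//g'
}

printf '%s\n' 'example.com' 'fffgg$.net' '%rrt.com' | strip_chars
printf '%s\n' 'example.com' 'fffgg$.net' '%rrt.com' | strip_chars_sed
```

Both pipelines turn fffgg$.net into fffgg.net and %rrt.com into rrt.com, so a damaged name becomes a different, valid-looking name. If such lines should disappear completely, filtering whole lines is the safer choice.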

Last edited by MadeInGermany; 07-18-2019 at 11:44 AM. Reason: sed does not need \n here, and the . was missing
 
2 members found this post helpful.
Old 07-18-2019, 09:13 AM   #13
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Rep: Reputation: 2242
just a quick test of that one loop.
Code:
#!/bin/bash

while read -r line;do
        if [[ ! "$line" =~ [][()\'\"~!\`@/?\>\<\\] ]];then
                echo "$line"
        fi
done < "$1"
testfile
Code:
[][()\'\"~!\`@/?\>\<\\]

[ in here ] 
'what'
< if >
@googles

~where
!ho
Hello
results
Code:
[userx@arcomeo testdir]$ ./stripme testfile


Hello
tells a story...
 
2 members found this post helpful.
Old 07-18-2019, 09:19 AM   #14
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757
Quote:
Originally Posted by BW-userx View Post

tells a story...
And what story would that be?
 
Old 07-18-2019, 09:21 AM   #15
blason
Member
 
Registered: Feb 2016
Posts: 122

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by MadeInGermany View Post
Better name the printable characters, and use the complement of it, either with tr and -c option, or with a negating ^ in a charset in a RE:
Code:
tr -dc '.a-zA-Z0-9\n-' < samplefile
sed -n 's/[^a-zA-Z0-9\n-]//gp' < samplefile
Thanks, nice option; however, I am looking to do this with grep if possible.
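
A grep-based sketch of the same whitelist idea, assuming a grep that supports -E. The helper name only_clean_lines and the sample input are illustrative assumptions. Instead of deleting characters, it prints only the lines that consist entirely of letters, digits, dots and hyphens.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines made up entirely of allowed characters.
# The hyphen is last inside the brackets so that it is taken literally.
only_clean_lines() {
    grep -E '^[a-zA-Z0-9.-]+$'
}

printf '%s\n' 'example.com' 'test-test.net' '!def' 'dafsrf#' '(' | only_clean_lines
```

With this input only example.com and test-test.net are printed. The inverted form grep -v '[^a-zA-Z0-9.-]' should behave almost the same way, except that it also lets empty lines through.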
 
  

