LinuxQuestions.org


blason 07-18-2019 07:44 AM

How to exclude all special characters using regex?
 
Hi Folks,

I need to exclude special characters from a file and keep only
[a-zA-Z0-9] . -

In fact, I am only including domain names and excluding all special characters.

I am not able to achieve this.

~`!@#$%^&*()_+={}[]\|;:'"<,>/?

Can someone please help?

pan64 07-18-2019 07:45 AM

which language is it? do you have any written code already?

BW-userx 07-18-2019 07:52 AM

a quick search

blason 07-18-2019 07:57 AM

Quote:

Originally Posted by pan64 (Post 6016202)
which language is it? do you have any written code already?

I need that in bash. Dang, I am no regex pro, but I am giving it my best and failing :(
Any hints?

blason 07-18-2019 07:59 AM

My sample text would be

example.com
test.com
test123.com
123test.ocm
calid-domain.com
test-test.net
!def
@fsf
dafsrf#
fffgg$.net
%rrt.com
^testcom
asddf&.net
as*
(
)
_
+
=
\
;
:
'
"
<
,
>
?
/

TB0ne 07-18-2019 08:23 AM

Quote:

Originally Posted by blason (Post 6016211)
I need that in bash. Dang, I am no regex pro, but I am giving it my best and failing :(
Any hints?

You've been posting things like this for a good while now:
https://www.linuxquestions.org/quest...es-4175657403/
https://www.linuxquestions.org/quest...nd-4175656948/
https://www.linuxquestions.org/quest...rs-4175655180/
https://www.linuxquestions.org/quest...ng-4175648204/
https://www.linuxquestions.org/quest...es-4175641557/
https://www.linuxquestions.org/quest...pt-4175635666/
https://www.linuxquestions.org/quest...pt-4175616729/

Show your own efforts when posting, and do basic research. After three years, you should have SOME scripting/research skills.

Putting "bash regex strip out anything but letters and numbers" into Google pulls up a LOT of 'hints'. You've been told many times to post things in CODE tags, but you don't seem to follow that advice either. The [:alnum:] character class matches alphanumeric characters.

blason 07-18-2019 08:30 AM

Quote:

Originally Posted by TB0ne (Post 6016222)
You've been posting things like this for a good while now:
https://www.linuxquestions.org/quest...es-4175657403/
https://www.linuxquestions.org/quest...nd-4175656948/
https://www.linuxquestions.org/quest...rs-4175655180/
https://www.linuxquestions.org/quest...ng-4175648204/
https://www.linuxquestions.org/quest...es-4175641557/
https://www.linuxquestions.org/quest...pt-4175635666/
https://www.linuxquestions.org/quest...pt-4175616729/

Show your own efforts when posting, and do basic research. After three years, you should have SOME scripting/research skills.

Putting "bash regex strip out anything but letters and numbers" into Google pulls up a LOT of 'hints'. You've been told many times to post things in CODE tags, but you don't seem to follow that advice either. The [:alnum:] character class matches alphanumeric characters.

I understand, and I am definitely trying to find the answer. Of course everyone tries Google first, which I did too; I only came here when that didn't resolve it.

I will definitely make sure to use the code tags.

crts 07-18-2019 08:46 AM

You need to escape certain characters inside the RegEx:
Code:

while read -r line;do
        if [[ ! "$line" =~ [][()\'\"~!\`@/?\>\<\\] ]];then
                echo "$line"
        fi
done < "/path/to/file"

The above code takes care of the most problematic ones. Note that if you want to match a literal ']' inside the brackets, it must be the first character after the opening '[' (or after a leading '^' in a negated set).
I will leave matching the remaining characters as an exercise.
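
To illustrate the ']' placement rule, here is a minimal sketch. The function name has_bracket is hypothetical and not part of the code above. Keeping the pattern in a variable is one common way to sidestep the shell-quoting issues inside [[ ... =~ ... ]].

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the argument contains '[' or ']'.
has_bracket() {
    # ']' comes first inside the brackets, so it is taken literally;
    # the '[' after it is just another member of the set.
    local re='[][]'
    [[ $1 =~ $re ]]
}
```

For example, has_bracket '[ in here ]' succeeds, while has_bracket 'Hello' fails.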

PS:
You can also achieve this with [:alnum:], as TB0ne suggested, but that approach has a pitfall of its own. I think, however, that doing it the "hard" way is more educational in the long run, since you learn how to handle certain characters in a RegEx.
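
For comparison, a sketch of the [:alnum:] route might look like the following. It assumes the goal is to keep only lines made up entirely of letters, digits, dots, and hyphens. The names is_clean and filter_clean are made up for illustration. Note that [:alnum:] follows the current locale, so it may accept more than a-zA-Z0-9.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the whole argument consists of
# letters, digits, dots, and hyphens (and is not empty).
is_clean() {
    # '.' is literal inside a bracket expression; '-' is literal when last.
    local re='^[[:alnum:].-]+$'
    [[ $1 =~ $re ]]
}

# Hypothetical helper: print only the clean lines of the file given as $1.
filter_clean() {
    local line
    while IFS= read -r line; do
        if is_clean "$line"; then
            printf '%s\n' "$line"
        fi
    done < "$1"
}
```

Calling filter_clean /path/to/file would then print lines such as example.com and skip lines such as fffgg$.net.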

blason 07-18-2019 08:53 AM

Code:

'[!@#%%$^*()_+=\;:,"<>?/]'
I guess I am not able to exclude the single quote.

crts 07-18-2019 08:56 AM

Read post #8 again.

TB0ne 07-18-2019 09:01 AM

Quote:

Originally Posted by blason
I understand, and I am definitely trying to find the answer. Of course everyone tries Google first, which I did too; I only came here when that didn't resolve it. I will definitely make sure to use the code tags.

Sorry, just don't believe that. Putting the search term I used into Google yielded 559,000 hits....hard to believe that out of all that there wasn't one 'hint' you could have used. And you've been asked about CODE tags for a LONG time, but don't use them.
Quote:

Originally Posted by blason (Post 6016236)
Code:

'[!@#%%$^*()_+=\;:,"<>?/]'
I guess I am not able to exclude the single quote.

And why is that, given the fact that I not only gave you a search-term that has your 'hints', but the **EXACT** thing you need to use for a regex to strip out anything but letters and numbers???

MadeInGermany 07-18-2019 09:10 AM

Better to name the allowed characters and use the complement of that set, either with tr and its -c option, or with a negating ^ in a character set in a RE:
Code:

tr -dc '.a-zA-Z0-9\n-' < samplefile
sed 's/[^.a-zA-Z0-9-]//g' < samplefile
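
To make the effect of the complement approach concrete, here is a small sketch wrapping the tr command in a function. The name strip_specials is hypothetical. Note that this deletes the unwanted characters from each line rather than dropping the line, so a line consisting only of special characters comes out empty.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the tr approach: reads stdin, writes stdout.
strip_specials() {
    # -d deletes characters; -c complements the set, so everything
    # except dots, letters, digits, newlines, and hyphens is removed.
    tr -dc '.a-zA-Z0-9\n-'
}
```

With this, a line such as fffgg$.net becomes fffgg.net, which may or may not be what you want for a list of domain names.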


BW-userx 07-18-2019 09:13 AM

just a quick test of that one loop.
Code:

#!/bin/bash

while read -r line;do
        if [[ ! "$line" =~ [][()\'\"~!\`@/?\>\<\\] ]];then
                echo "$line"
        fi
done < "$1"

testfile
Code:

[][()\'\"~!\`@/?\>\<\\]

[ in here ]
'what'
< if >
@googles

~where
!ho
Hello

results
Code:

[userx@arcomeo testdir]$ ./stripme testfile


Hello

tells a story...

crts 07-18-2019 09:19 AM

Quote:

Originally Posted by BW-userx (Post 6016250)

tells a story...

And what story would that be?

blason 07-18-2019 09:21 AM

Quote:

Originally Posted by MadeInGermany (Post 6016246)
Better to name the allowed characters and use the complement of that set, either with tr and its -c option, or with a negating ^ in a character set in a RE:
Code:

tr -dc '.a-zA-Z0-9\n-' < samplefile
sed 's/[^.a-zA-Z0-9-]//g' < samplefile


Thanks, that is a nice option; however, I am looking for a grep solution if possible.
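
A possible grep variant, sketched under the assumption that the goal is to keep only lines made up entirely of the allowed characters. The function name keep_valid is hypothetical. It checks the character set only, not whether a line is a well-formed domain name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only lines that consist entirely of
# letters, digits, dots, and hyphens. Reads the named files, or stdin
# when no file is given.
keep_valid() {
    # -E enables extended regex syntax; -x requires the pattern to
    # match the whole line. The trailing '-' in the set is literal.
    grep -E -x '[a-zA-Z0-9.-]+' "$@"
}
```

Running keep_valid samplefile on the sample text should keep entries like example.com and test-test.net and drop entries like !def and fffgg$.net. In some locales the a-z range can behave unexpectedly, so prefixing the call with LC_ALL=C may be worth considering.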


All times are GMT -5.