LinuxQuestions.org

How to exclude all special characters using regex? (https://www.linuxquestions.org/questions/programming-9/how-to-exclude-all-speacial-characters-using-regex-4175657614/)

crts 07-18-2019 09:27 AM

Quote:

Originally Posted by blason (Post 6016252)
Thanks, nice option; however, I am looking for a grep solution if possible.

Then why did you ask for a bash solution in a previous post?
Quote:

Originally Posted by blason (Post 6016211)
Need that in bash ...


blason 07-18-2019 09:28 AM

Quote:

Originally Posted by BW-userx (Post 6016250)
just a quick test of that one loop.
Code:

#!/bin/bash

while read -r line;do
        if [[ ! "$line" =~ [][()\'\"~!\`@/?\>\<\\] ]];then
                echo "$line"
        fi
done < "$1"

testfile
Code:

[][()\'\"~!\`@/?\>\<\\]

[ in here ]
'what'
< if >
@googles

~where
!ho
Hello

results
Code:

[userx@arcomeo testdir]$ ./stripme testfile


Hello

tells a story...

Just a few modifications:

Code:

#!/bin/bash

while read -r line;do
        if [[ ! "$line" =~ [][()\'\"~!\`@/?\>\<\/\*\_\+\=\;\:\,\#\$\%\^\&] ]];then
                echo "$line"
        fi
done < "$1"
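
For comparison, the same loop can be written as an allow-list rather than a list of forbidden characters, which avoids most of the escaping. This is only a sketch: the function name keep_clean_lines and the allowed set (ASCII letters, digits, dot and hyphen) are assumptions, since the thread never pins down the exact requirement.

```shell
# Print only the lines made up entirely of allowed characters.
# Allowed set (an assumption): ASCII letters, digits, dot and hyphen.
keep_clean_lines() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *[!A-Za-z0-9.-]*) ;;            # contains a forbidden character: skip it
            *) printf '%s\n' "$line" ;;     # only allowed characters: print it
        esac
    done
}

# Made-up sample input; only example.com and my-host.org pass.
printf '%s\n' 'example.com' '[ in here ]' '@googles' 'my-host.org' | keep_clean_lines
```

To run it on a file, redirect the file into the function, for example keep_clean_lines < testfile.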


BW-userx 07-18-2019 09:28 AM

Quote:

Originally Posted by crts (Post 6016251)
And what story would that be?

Looks like someone needs to read the footnotes. It works.

blason 07-18-2019 09:29 AM

Quote:

Originally Posted by crts (Post 6016254)
Then why did you ask for a bash solution in a previous post?

My bad; by "bash" I meant that I wanted it in grep :(

crts 07-18-2019 09:36 AM

Quote:

Originally Posted by blason (Post 6016258)
My bad; by "bash" I meant that I wanted it in grep :(

Is the sample file you provided at least representative?

crts 07-18-2019 09:50 AM

Quote:

Originally Posted by MadeInGermany (Post 6016246)
Better to name the permitted characters and use their complement, either with tr and the -c option, or with a negating ^ in a character set in an RE:
Code:

tr -dc '.a-zA-Z0-9\n-' < samplefile
sed 's/[^.a-zA-Z0-9-]//g' < samplefile


Those solutions will also keep invalid domain names such as '%rtt.com': the name is "transformed" into 'rtt.com', and I am not sure whether this is what the OP wants.
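
The difference can be seen on a small made-up input. This is only a sketch: the helper name sample and its three lines are hypothetical, and the grep line assumes the OP would rather drop a bad line than repair it.

```shell
# Made-up sample input for illustration.
sample() { printf '%s\n' 'example.com' '%rtt.com' 'my-host.org'; }

# Deleting forbidden characters rewrites the bad line into something
# that looks valid: '%rtt.com' comes out as 'rtt.com'.
sample | tr -dc '.a-zA-Z0-9\n-'

# Dropping every line that contains a forbidden character
# leaves only the two names that were clean to begin with.
sample | grep -v '[^.a-zA-Z0-9-]'
```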

@OP:
Please provide a sample output file of what you expect it to look like before we keep guessing.

MadeInGermany 07-18-2019 09:57 AM

If you want to skip lines that contain a forbidden character, use grep:
Code:

grep -v '[^a-zA-Z0-9-]' testfile
Again, it is easier to name the allowed characters.
For the [a-zA-Z0-9] set there is [[:alnum:]], which can be augmented with extra characters and of course with the ^ negation:
Code:

grep -v '[^[:alnum:]-]' testfile

TB0ne 07-18-2019 10:04 AM

Quote:

Originally Posted by MadeInGermany (Post 6016277)
If you want to skip lines that contain a forbidden character, use grep:
Code:

grep -v '[^a-zA-Z0-9-]' testfile
Again, it is easier to name the allowed characters.
For the [a-zA-Z0-9] set there is [[:alnum:]], which can be augmented with extra characters and of course with the ^ negation:
Code:

grep -v '[^[:alnum:]-]' testfile

Yep, the alnum was given earlier, and promptly ignored.

crts 07-18-2019 10:33 AM

Quote:

Originally Posted by MadeInGermany (Post 6016277)
Again, it is easier to name the allowed characters.
For the [a-zA-Z0-9] set there is [[:alnum:]], which can be augmented with extra characters and of course with the ^ negation:
Code:

grep -v '[^[:alnum:]-]' testfile

On a side note, let me just point out for OP that if you want to include '-' inside '[]' then it must be the last character. This is the pitfall I was referring to earlier. Otherwise it will be interpreted as a range operator. This solution will almost get you there. You still need to figure out the last missing part.

MadeInGermany 07-18-2019 12:00 PM

Quote:

Originally Posted by crts (Post 6016295)
On a side note, let me just point out for OP that if you want to include '-' inside '[]' then it must be the last character. This is the pitfall I was referring to earlier. Otherwise it will be interpreted as a range operator. This solution will almost get you there. You still need to figure out the last missing part.

Yes, so adding more extra characters must be done like this
Code:

grep -v '[^[:alnum:].-]' testfile
or this
Code:

grep -v '[^.[:alnum:]-]' testfile
Actually it is possible to have the - character first (after the ^ of course), but there are other characters like a ] that must be first, so it is better to remember "- must be last".
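
The placement rules can be checked on a few made-up strings. This is a minimal sketch; the test strings are arbitrary, and the last command assumes you also want to allow a literal ].

```shell
# Each bracket expression below contains a literal hyphen; only its position differs.
# Both commands keep 'a-b' and 'a.b' and drop 'a+b'.
printf '%s\n' 'a-b' 'a.b' 'a+b' | grep -v '[^[:alnum:].-]'    # hyphen last
printf '%s\n' 'a-b' 'a.b' 'a+b' | grep -v '[^-[:alnum:].]'    # hyphen first, right after ^

# A literal ] has to come first (right after ^), so the hyphen stays last.
# This keeps 'a]b' and 'a-b' and drops 'a+b'.
printf '%s\n' 'a]b' 'a-b' 'a+b' | grep -v '[^][:alnum:]-]'
```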
Quote:

Originally Posted by TB0ne (Post 6016281)
Yep, the alnum was given earlier, and promptly ignored.

Please promptly ignore any sarcasm ;)

crts 07-18-2019 01:34 PM

Quote:

Originally Posted by MadeInGermany (Post 6016331)
Actually it is possible to have the - character first (after the ^ of course)

I did not know that it can also be first; I always put it last when needed and never questioned it. It makes sense, though, since in position one (ignoring ^) it cannot be mistaken for a range operator.

blason 07-18-2019 09:48 PM

Quote:

Originally Posted by MadeInGermany (Post 6016331)
Yes, so adding more extra characters must be done like this
Code:

grep -v '[^[:alnum:].-]' testfile
or this
Code:

grep -v '[^.[:alnum:]-]' testfile
Actually it is possible to have the - character first (after the ^ of course), but there are other characters like a ] that must be first, so it is better to remember "- must be last".

Please promptly ignore any sarcasm ;)

It wasn't ignored; it just wasn't working because I was mistakenly using only [[:alnum:]], hence the different results.

blason 07-19-2019 11:23 PM

Hello,

That worked perfectly fine; however, I am now trying to match something stricter and am not sure whether it can be achieved in the same line.
The above pattern also accepts a single literal dot or hyphen on its own. In a domain name those characters are always surrounded by alphanumerics, so I am trying to validate that . and - match only when they are surrounded by \w characters.

Maybe I am missing something?

Quote:

cat test | grep -v [^[:alnum:]\w.-]
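
One possible reading of this requirement is sketched below. It is an assumption, since no expected output was posted. It swaps the negated bracket expression for a positive, anchored match with grep -E, because a bracket expression can only test one character at a time and cannot look at its neighbours. As far as I know, \w is not interpreted inside [] at all, so the sketch uses [[:alnum:]] instead. The function name valid_names and the sample lines are hypothetical.

```shell
# Keep only lines in which every . or - sits between runs of alphanumerics.
# This is a guess at the requirement, not a full domain-name validator.
valid_names() {
    grep -E '^[[:alnum:]]+([.-][[:alnum:]]+)*$'
}

# Made-up sample input; only example.com and my-host.org pass.
printf '%s\n' 'example.com' 'my-host.org' '.' '-' 'a..b' '-lead.com' 'trail.' | valid_names
```

Quoting the pattern also matters here: unquoted, the shell may try to expand the brackets as a filename pattern before grep ever sees them.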

crts 07-20-2019 06:17 AM

Quote:

Originally Posted by blason (Post 6016863)
Maybe I am missing something?

The most important thing you are missing is to provide a representative sample file and the output you expect.
You have been presented with a solution that works for the sample file you provided. Now you are telling us that the sample file is not representing the actual input data, thus the solution is inappropriate. It is pointless to provide you with a solution if you keep changing the requirement.

BW-userx 07-20-2019 07:05 AM

wrong post.. oops


All times are GMT -5.