[SOLVED] How to exclude all special characters using regex?
If you want to skip lines that contain a forbidden character, use grep:
Code:
grep -v '[^a-zA-Z0-9-]' testfile
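To see what `grep -v` keeps, the sketch below builds a throwaway sample file and runs the command above against it. The sample lines are made up for illustration (the OP's real testfile is not shown here), and `LC_ALL=C` is an assumption added so that the `a-z` and `A-Z` ranges mean plain ASCII regardless of the user's locale.

```shell
#!/bin/sh
# Hypothetical sample input, not the OP's actual data.
testfile=$(mktemp)
trap 'rm -f "$testfile"' EXIT
printf '%s\n' 'example-host' 'bad_name' 'has space' 'mail.example' 'ok123' > "$testfile"

# [^a-zA-Z0-9-] matches any single character outside the allowed set;
# -v prints only the lines in which no such character occurs.
kept=$(LC_ALL=C grep -v '[^a-zA-Z0-9-]' "$testfile")
printf '%s\n' "$kept"
```

Only `example-host` and `ok123` survive: the other lines contain `_`, a space or `.`. Note that an empty line contains no forbidden character either, so `grep -v` would keep it too.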
Again, it is easier to name the allowed characters.
For the [a-zA-Z0-9] set there is [[:alnum:]], which can be augmented with extra characters and, of course, negated with ^:
Code:
grep -v '[^[:alnum:]-]' testfile
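A quick way to check that the `[[:alnum:]]` spelling filters the same way as the explicit ranges is to run both on the same made-up input and compare. This sketch assumes the C locale; in other locales `[[:alnum:]]` may also accept non-ASCII letters, so the two forms are not guaranteed to agree on every input.

```shell
#!/bin/sh
# Made-up sample lines for comparing the two spellings of the allowed set.
testfile=$(mktemp)
trap 'rm -f "$testfile"' EXIT
printf '%s\n' 'example-host' 'bad_name' 'mail.example' 'ok123' 'x!' > "$testfile"

# Explicit ranges vs. the POSIX character class, both negated with ^.
explicit=$(LC_ALL=C grep -v '[^a-zA-Z0-9-]' "$testfile")
posix=$(LC_ALL=C grep -v '[^[:alnum:]-]' "$testfile")

if [ "$explicit" = "$posix" ]; then
    echo "same result:"
    printf '%s\n' "$posix"
else
    echo "results differ"
fi
```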
Yep, the alnum was given earlier, and promptly ignored.
On a side note, let me just point out to the OP that if you want to include '-' inside '[]', it must be the last character; otherwise it will be interpreted as a range operator. This is the pitfall I was referring to earlier. This solution will almost get you there, but you still need to figure out the last missing part.
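The range pitfall is easy to demonstrate. In the sketch below (C locale assumed, so ranges follow byte values), `[._-]` is the intended three-character set, while `[.-_]` is read as the range from `.` (0x2E) to `_` (0x5F), which also covers digits, uppercase letters and several punctuation marks. The one-character input lines are made up for the demo.

```shell
#!/bin/sh
# Intended set: dot, underscore, hyphen (hyphen last, so it is literal).
intended=$(printf '%s\n' A 7 - . _ | LC_ALL=C grep -c '[._-]')
# Hyphen in the middle: this is the range '.' (0x2E) through '_' (0x5F).
accidental=$(printf '%s\n' A 7 - . _ | LC_ALL=C grep -c '[.-_]')

echo "[._-] matched $intended of 5 lines"
echo "[.-_] matched $accidental of 5 lines"
```

The accidental range matches `A` and `7` but not the hyphen itself, since `-` (0x2D) sorts just before `.`.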
Yes, so adding more characters must look like this:
Code:
grep -v '[^[:alnum:].-]' testfile
or this:
Code:
grep -v '[^.[:alnum:]-]' testfile
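Both orderings describe the same set (alphanumerics, dot and hyphen), because the position of `.` inside the brackets does not matter as long as `-` stays last. A small check with hypothetical input, assuming the C locale:

```shell
#!/bin/sh
# Hypothetical sample; only 'bad_name' and 'two  spaces' contain a character
# outside [[:alnum:].-].
testfile=$(mktemp)
trap 'rm -f "$testfile"' EXIT
printf '%s\n' 'www.example-site.com' 'bad_name' 'two  spaces' 'a.b-c' > "$testfile"

dot_first=$(LC_ALL=C grep -v '[^.[:alnum:]-]' "$testfile")
dot_middle=$(LC_ALL=C grep -v '[^[:alnum:].-]' "$testfile")

printf '%s\n' "$dot_middle"
[ "$dot_first" = "$dot_middle" ] && echo "both orderings agree"
```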
Actually, it is possible to have the - character first (after the ^, of course), but other characters, such as ], must come first, so it is better to remember "- must be last".
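The placement rules can be checked directly: in a bracket expression, `]` is literal only when it comes first (after any `^`), and `-` is literal when it comes first or last. The sketch feeds made-up one-character lines to `grep -c` and counts the matches.

```shell
#!/bin/sh
# Made-up one-character lines to probe bracket-expression placement rules.
sample='a
]
-
x'

# ']' is literal because it comes first; '-' is literal because it comes last.
# Matches: a, ] and - (3 lines).
close_first=$(printf '%s\n' "$sample" | grep -c '[]a-]')

# '-' is also literal right after '^'. [^-a] matches any character that is
# neither '-' nor 'a', so only ']' and 'x' match (2 lines).
hyphen_first=$(printf '%s\n' "$sample" | grep -c '[^-a]')

echo "[]a-] matched $close_first lines"
echo "[^-a] matched $hyphen_first lines"
```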
Quote:
Originally Posted by TB0ne
Yep, the alnum was given earlier, and promptly ignored.
Actually it is possible to have the - character first (after the ^ of course)
I did not know that it can also be first; I always had it last when needed and never questioned it. Makes sense, though, since in the first position (ignoring ^) it cannot be mistaken for a range.
Please promptly ignore any sarcasm
It wasn't ignored; it just wasn't working because I was mistakenly using only [[:alnum:]], hence the different results.
That worked perfectly fine; however, what I am trying to match here is and I am not sure whether this can be achieved in the same line.
The pattern above accepts a dot (as a literal) and a hyphen anywhere in the line. Since these are domain names, those characters will always be surrounded by alphanumerics, so I am trying to make the validation match . and - only when they sit between \w characters.
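One way to express that requirement is to describe the whole line instead of listing forbidden characters: alphanumeric runs separated by single dots or hyphens. The sketch below assumes that is the intended rule (the OP's actual target pattern is not shown in the thread, and real domain-name rules such as label length limits are not covered). It uses `[[:alnum:]]` rather than `\w`, because `\w` in GNU grep also accepts `_`. The candidate lines are hypothetical.

```shell
#!/bin/sh
testfile=$(mktemp)
trap 'rm -f "$testfile"' EXIT
# Hypothetical candidates; only the first two have every '.' and '-'
# between alphanumerics.
printf '%s\n' 'www.example-site.com' 'a1.b2' '.leading' 'trailing-' 'double..dot' 'mixed.-sep' > "$testfile"

# ^...$ anchors the whole line: one alnum run, then zero or more
# groups of a single '.' or '-' followed by another alnum run.
valid=$(LC_ALL=C grep -E '^[[:alnum:]]+([.-][[:alnum:]]+)*$' "$testfile")
printf '%s\n' "$valid"
```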
The most important thing you are missing is a representative sample file and the output you expect.
You have been presented with a solution that works for the sample file you provided. Now you are telling us that the sample file does not represent the actual input data, so the solution is inappropriate. It is pointless to provide you with a solution if you keep changing the requirements.