LinuxQuestions.org

sancho1980 06-08-2009 04:31 AM

regular expression problem
 
Hi,

I am using regcomp and regexec to find out whether a string is a valid host name.
A valid host name (according to Wikipedia) is anything that
- starts with a-z or A-Z,
- is followed by 0 or more of a-z, A-Z, 0-9 or '-',
- and ends with a-z, A-Z or 0-9.

The regular expression I use is this one:

#define HOSTNAMEREGEX "^([a-zA-Z])|([a-z0-9A-Z-])*|([a-z0-9A-Z])$"

But strangely enough, this also matches strings like "a-" and even "a?".

What's wrong with this?

Thanks,

martin

david1941 06-08-2009 04:48 AM

Well, a- and a ARE valid hostnames. It appears little is wrong with it.

Dave

sancho1980 06-08-2009 05:05 AM

Even if "a-" WAS a valid host name (which I doubt), then your answer still misses my point: The core of my question was "why does the above regex match something like 'a-' and 'a?'"

david1941 06-08-2009 05:11 AM

Code:

"^([a-zA-Z])|([a-z0-9A-Z-])*|([a-z0-9A-Z])$"
It matches the second alternative, ([a-z0-9A-Z-])*

Dave
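
To see what the first pattern actually matches, it helps to look at the offsets regexec reports rather than only its return value. The following is a minimal sketch, assuming the pattern is compiled with REG_EXTENDED (the original post does not show the regcomp flags); matched_length is a hypothetical helper, not something from the thread.

```c
#include <assert.h>
#include <regex.h>

/* The pattern from the first post, unchanged. */
#define HOSTNAMEREGEX "^([a-zA-Z])|([a-z0-9A-Z-])*|([a-z0-9A-Z])$"

/*
 * Hypothetical helper: length of the overall match that regexec() finds
 * in s, or -1 if nothing matches.
 * Assumption: the pattern is a POSIX extended regular expression.
 */
static int matched_length(const char *pattern, const char *s)
{
    regex_t re;
    regmatch_t m[1];
    int len = -1;

    int rc = regcomp(&re, pattern, REG_EXTENDED);
    assert(rc == 0);   /* the pattern is expected to compile */
    (void)rc;

    /* m[0] receives the start and end offsets of the whole match. */
    if (regexec(&re, s, 1, m, 0) == 0)
        len = (int)(m[0].rm_eo - m[0].rm_so);

    regfree(&re);
    return len;
}
```

Calling matched_length(HOSTNAMEREGEX, "a?") should report a match that is shorter than the input, and an input such as "?" should still "match" with length 0, because the starred alternative is not anchored and is satisfied by the empty string.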

sancho1980 06-08-2009 05:20 AM

My regex was indeed wrong, but I think so was your answer.
I really meant the following regex:

#define HOSTNAMEREGEX \
"^([a-zA-Z])|(([a-zA-Z])([a-z0-9A-Z-])*([a-z0-9A-Z]))$"

This clearly has 2 alternatives:

1) ([a-zA-Z]): any ONE letter out of a-z or A-Z
2) (([a-zA-Z])([a-z0-9A-Z-])*([a-z0-9A-Z])): any ONE letter out of a-z or A-Z, followed by 0 or more out of [a-z0-9A-Z-], and ENDING WITH ANY ONE out of [a-z0-9A-Z]

My understanding is that this CANNOT possibly match anything ending with "-", let alone "?" or any other special character... BUT IT DOES! WHY?

get my point?
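
One thing worth checking here: in POSIX extended regular expressions, | has the lowest precedence, so ^ belongs only to the first alternative and $ only to the second. Read that way, the pattern accepts anything that starts with a letter, or anything that ends with a valid name. The sketch below compares the posted pattern with a variant that wraps both alternatives in one extra group; the macro names and the matches helper are made up for illustration, and REG_EXTENDED is assumed.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Pattern as posted: ^ sits inside the first branch, $ inside the second. */
#define UNGROUPED "^([a-zA-Z])|(([a-zA-Z])([a-z0-9A-Z-])*([a-z0-9A-Z]))$"
/* Same two branches, wrapped in one group so both anchors apply to both. */
#define GROUPED "^(([a-zA-Z])|(([a-zA-Z])([a-z0-9A-Z-])*([a-z0-9A-Z])))$"

/* Hypothetical helper: 1 if s matches pattern (POSIX ERE), otherwise 0. */
static int matches(const char *pattern, const char *s)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);   /* the pattern is expected to compile */
    (void)rc;

    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With these definitions, matches(UNGROUPED, "a?") succeeds through the first branch, while matches(GROUPED, "a?") and matches(GROUPED, "a-") do not. The ungrouped form also accepts "1abc", since its second branch has no start anchor.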

david1941 06-08-2009 05:40 AM

In ([a-z0-9A-Z-])*, the final - should come first in the character class, or else it is treated as a metacharacter. Try it like this: ([-a-z0-9A-Z])*

Dave

sancho1980 06-08-2009 05:45 AM

Umm, no, that didn't do the trick :-(

pixellany 06-08-2009 07:15 AM

These do exactly the same thing:
Code:

([a-z-])
([-a-z])

To wit: match any character in the range a-z, OR a literal "-"

druuna 06-08-2009 07:21 AM

@pixellany:
Quote:

These do exactly the same thing:
([a-z-])
([-a-z])
Theoretically you are correct, but the first one, ([a-z-]), can fail depending on the program, its version, and/or the *nix flavor used (the error being: A closing character is expected after z-).

If you want to make sure no error will occur, use the second one, ([-a-z]).
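
For what it's worth, POSIX treats a hyphen as a literal when it is the first or the last character of a bracket expression, so with a conforming regcomp both spellings should compile and accept the same characters. A minimal sketch for checking this on your own system follows; the helper names are hypothetical, the default C locale is assumed, and only printable ASCII is compared.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if the one-character string "c" matches pattern. */
static int char_matches(const char *pattern, char c)
{
    regex_t re;
    char s[2] = { c, '\0' };

    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);   /* fails here if the platform rejects the pattern */
    (void)rc;

    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* 1 if both patterns accept exactly the same printable ASCII characters. */
static int same_ascii_behaviour(const char *p1, const char *p2)
{
    for (int c = 32; c < 127; c++) {
        if (char_matches(p1, (char)c) != char_matches(p2, (char)c))
            return 0;
    }
    return 1;
}
```

Calling same_ascii_behaviour("^([a-z-])$", "^([-a-z])$") compares the two forms from the quote. If a particular tool does reject the trailing hyphen, the assert on regcomp is where that would show up.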

PMP 06-08-2009 07:22 AM

Try using this:

"^([a-zA-Z])([a-z0-9A-Z-])*([a-z0-9A-Z])$"

PMP 06-08-2009 07:26 AM

I guess this one should work:

"^([a-zA-Z])([a-z0-9A-Z])*(-[a-z0-9A-Z])?$"

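A quick way to compare the two suggestions above against the rules from the first post is to run them over a handful of sample names. The sketch below writes both patterns with [a-zA-Z] as the first class and adds a third, CANDIDATE, which is only one possible reading of the three rules (a single letter is allowed, as in the original poster's second attempt) and is not taken from the thread. The macro names and the matches helper are hypothetical; REG_EXTENDED is assumed.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The two suggestions above, with [a-zA-Z] as the first class. */
#define PMP_FIRST  "^([a-zA-Z])([a-z0-9A-Z-])*([a-z0-9A-Z])$"
#define PMP_SECOND "^([a-zA-Z])([a-z0-9A-Z])*(-[a-z0-9A-Z])?$"
/* Assumption: one way to express the three rules, single letter allowed. */
#define CANDIDATE  "^[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?$"

/* Hypothetical helper: 1 if s matches pattern (POSIX ERE), otherwise 0. */
static int matches(const char *pattern, const char *s)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);   /* the pattern is expected to compile */
    (void)rc;

    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

All three reject "a-" and "a?". They differ elsewhere: PMP_FIRST needs at least two characters, so it rejects a lone "a", and PMP_SECOND only allows a hyphen directly before the last character, so it rejects "my-host". CANDIDATE accepts both of those names.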

All times are GMT -5.