Reg EX - Trouble writing regular expression
Hi
I am trying to write a regular expressionas part of a university assignment. It should match "the set of binary strings NOT containing 101 as a substring". (We are using C#) Can you point me in the right direction (or even just post the complete regular expression)? Thanks |
I don't believe anyone here will do your homework for you... but you can find almost all you ever wanted to know about regular expression syntax here:
http://www.regular-expressions.info/ |
Here's a hint to get you started. When you have to NOT match something, it is often easier to think in terms of how you WOULD match something. Negating is the easy part.
|
I think I have it...
In C# I am using: ^[01]((1?)+(00+)?)+(0?)+$ I've tested it with lots of different strings and it seems to be working. Please let me know what you think about this solution. |
This seems rather complicated. If you wanted to find the word "foo" anywhere in a string, what would you search for?
|
It seems rather complicated to me too. It had me cursing at the computer for a while :-)
I've just started learning regex. With regex in C# (don't know about any other languages) a simple "foo" would match any string containing "foo" (because we are not specifying the start and end of the string, so there could be any number of characters either side). I don't know any way to negate a continuous string of characters. [^f] would negate a single f, but I don't think I can use [^foo] (it would negate an f OR an o OR an o again, which means I couldn't have any f or o in my string). I'm happy to send the assignment in as is, it seems to work. Are you sure there is a more simple expression? (You don't have to tell me exactly what it is.) Anyone? |
I want this thread framed---to illustrate the RIGHT way to ask homework questions!!!
Code:
[root@localhost mherring]# cat bin #1: Print every line which contains "101" at least once. #2: Print every line which does NOT contain "101". |
Of course those are char strings, not binary "strings" - but let's not quibble ... :p
|
Well, gosh.....
In OP's solution, he appears to be operating on the characters that represent the bits. I don't know how to apply Regexes to the actual bits. I quibble for a living......;);) |
If the assignment prohibits using C# code to negate the condition, the answer won't look pretty, regexp's don't negate well. That said you could shorten it a little: replace (X+)? with X*. I will also note that your expression doesn't match the empty string...
|
Yeah, we can't use C# code to match the strings, just regex.
You're right about the empty string. To match the empty string I could just take the "[01]" away from the start. The assignment asks to match "the set of binary strings....". Do you think that includes the empty string? Maybe it depends on who is marking the assignment. I shortened the regex to: ^[01](1*(00+)*)+0*$ I think its working. For some reason I thought C# couldn't handle the "*" ... |
Since C# uses PCRE, you can use lookaround assertions:
Code:
$ cat bin |
Quote:
@Mr. C. nice trick with the lookahead. I wasn't aware of PCRE's features, feels almost like cheating to me :p, but there's a case it fails on: Code:
~/tmp$ echo 101 | grep -v 101 |
Oops! Just checking to see who was watching! :-)
How about negative lookbehind instead: Code:
$ cat bin |
Cool. I'm glad this thread is generating a bit of activity, getting people's brains going.
All this PCRE, lookahead, negative lookbehind etc is a bit over my head ;-) I'm going to use: ^(1*(00+)*)+0*$ and add a note mentioning the empty string. Thanks everyone for your input. In the end I'm glad no one just gave me the solution. |
All times are GMT -5. The time now is 01:17 AM. |