LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   Reg EX - Trouble writing regular expression (https://www.linuxquestions.org/questions/programming-9/reg-ex-trouble-writing-regular-expression-662875/)

slackaddict 08-14-2008 06:57 PM

Reg EX - Trouble writing regular expression
 
Hi

I am trying to write a regular expressionas part of a university assignment. It should match "the set of binary strings NOT containing 101 as a substring". (We are using C#)

Can you point me in the right direction (or even just post the complete regular expression)?

Thanks

raconteur 08-14-2008 07:04 PM

I don't believe anyone here will do your homework for you... but you can find almost all you ever wanted to know about regular expression syntax here:

http://www.regular-expressions.info/

Mr. C. 08-14-2008 08:09 PM

Here's a hint to get you started. When you have to NOT match something, it is often easier to think in terms of how you WOULD match something. Negating is the easy part.

slackaddict 08-14-2008 11:42 PM

I think I have it...

In C# I am using: ^[01]((1?)+(00+)?)+(0?)+$

I've tested it with lots of different strings and it seems to be working.

Please let me know what you think about this solution.

Mr. C. 08-15-2008 01:55 AM

This seems rather complicated. If you wanted to find the word "foo" anywhere in a string, what would you search for?

slackaddict 08-15-2008 04:54 AM

It seems rather complicated to me too. It had me cursing at the computer for a while :-)

I've just started learning regex. With regex in C# (don't know about any other languages) a simple "foo" would match any string containing "foo" (because we are not specifying the start and end of the string, so there could be any number of characters either side).

I don't know any way to negate a continuous string of characters. [^f] would negate a single f, but I don't think I can use [^foo] (it would negate an f OR an o OR an o again, which means I couldn't have any f or o in my string).

I'm happy to send the assignment in as is, it seems to work.

Are you sure there is a more simple expression? (You don't have to tell me exactly what it is.) Anyone?

pixellany 08-15-2008 05:37 AM

I want this thread framed---to illustrate the RIGHT way to ask homework questions!!!

Code:

[root@localhost mherring]# cat bin
10010100101101
10000
000010
101010
01110

[root@localhost mherring]# sed -n '/101/p' bin
10010100101101
101010
[root@localhost mherring]# sed -n '/101/!p' bin
10000
000010
01110

I generated the file "bin" and then ran the two sed commands.

#1: Print every line which contains "101" at least once.

#2: Print every line which does NOT contain "101".

syg00 08-15-2008 05:59 AM

Of course those are char strings, not binary "strings" - but let's not quibble ... :p

pixellany 08-15-2008 06:16 AM

Well, gosh.....
In OP's solution, he appears to be operating on the characters that represent the bits. I don't know how to apply Regexes to the actual bits.

I quibble for a living......;);)

ntubski 08-15-2008 12:01 PM

If the assignment prohibits using C# code to negate the condition, the answer won't look pretty, regexp's don't negate well. That said you could shorten it a little: replace (X+)? with X*. I will also note that your expression doesn't match the empty string...

slackaddict 08-15-2008 04:19 PM

Yeah, we can't use C# code to match the strings, just regex.

You're right about the empty string. To match the empty string I could just take the "[01]" away from the start. The assignment asks to match "the set of binary strings....". Do you think that includes the empty string? Maybe it depends on who is marking the assignment.

I shortened the regex to: ^[01](1*(00+)*)+0*$

I think its working. For some reason I thought C# couldn't handle the "*" ...

Mr. C. 08-15-2008 05:40 PM

Since C# uses PCRE, you can use lookaround assertions:

Code:

$ cat bin
10010100101101
10000
000010
101010

01110

# this shows our correct results
$ grep -v 101 bin
10000
000010

01110

# correct results using negative lookahead
$ perl -ne 'print if /^([01](?!101))*$/' bin
10000
000010

01110


ntubski 08-15-2008 06:35 PM

Quote:

"the set of binary strings....". Do you think that includes the empty string?
I'd be surprised if it didn't.

@Mr. C. nice trick with the lookahead. I wasn't aware of PCRE's features, feels almost like cheating to me :p, but there's a case it fails on:
Code:

~/tmp$ echo 101 | grep -v 101
~/tmp$ echo 101 | perl -ne 'print if /^([01](?!101))*$/'
101


Mr. C. 08-15-2008 07:05 PM

Oops! Just checking to see who was watching! :-)

How about negative lookbehind instead:

Code:

$ cat bin
101
10010100101101
10000
000010
101010

01110
0
00
000
0000
0001

$ grep -v 101 bin
10000
000010

01110
0
00
000
0000
0001

$  perl -ne 'print if /^((?<!10)[1]|0)*$/'  bin         
10000
000010

01110
0
00
000
0000
0001


slackaddict 08-15-2008 08:07 PM

Cool. I'm glad this thread is generating a bit of activity, getting people's brains going.

All this PCRE, lookahead, negative lookbehind etc is a bit over my head ;-)

I'm going to use: ^(1*(00+)*)+0*$ and add a note mentioning the empty string.

Thanks everyone for your input. In the end I'm glad no one just gave me the solution.


All times are GMT -5. The time now is 01:17 AM.