LinuxQuestions.org
Visit Jeremy's Blog.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 08-14-2008, 06:57 PM   #1
slackaddict
Member
 
Registered: Nov 2004
Location: Aotearoa
Distribution: Slack, Ubuntu
Posts: 92

Rep: Reputation: 15
Reg EX - Trouble writing regular expression


Hi

I am trying to write a regular expressionas part of a university assignment. It should match "the set of binary strings NOT containing 101 as a substring". (We are using C#)

Can you point me in the right direction (or even just post the complete regular expression)?

Thanks
 
Old 08-14-2008, 07:04 PM   #2
raconteur
Member
 
Registered: Dec 2007
Location: Slightly left of center
Distribution: slackware
Posts: 276
Blog Entries: 2

Rep: Reputation: 44
I don't believe anyone here will do your homework for you... but you can find almost all you ever wanted to know about regular expression syntax here:

http://www.regular-expressions.info/
 
Old 08-14-2008, 08:09 PM   #3
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 63
Here's a hint to get you started. When you have to NOT match something, it is often easier to think in terms of how you WOULD match something. Negating is the easy part.
 
Old 08-14-2008, 11:42 PM   #4
slackaddict
Member
 
Registered: Nov 2004
Location: Aotearoa
Distribution: Slack, Ubuntu
Posts: 92

Original Poster
Rep: Reputation: 15
I think I have it...

In C# I am using: ^[01]((1?)+(00+)?)+(0?)+$

I've tested it with lots of different strings and it seems to be working.

Please let me know what you think about this solution.
 
Old 08-15-2008, 01:55 AM   #5
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 63
This seems rather complicated. If you wanted to find the word "foo" anywhere in a string, what would you search for?
 
Old 08-15-2008, 04:54 AM   #6
slackaddict
Member
 
Registered: Nov 2004
Location: Aotearoa
Distribution: Slack, Ubuntu
Posts: 92

Original Poster
Rep: Reputation: 15
Smile

It seems rather complicated to me too. It had me cursing at the computer for a while :-)

I've just started learning regex. With regex in C# (don't know about any other languages) a simple "foo" would match any string containing "foo" (because we are not specifying the start and end of the string, so there could be any number of characters either side).

I don't know any way to negate a continuous string of characters. [^f] would negate a single f, but I don't think I can use [^foo] (it would negate an f OR an o OR an o again, which means I couldn't have any f or o in my string).

I'm happy to send the assignment in as is, it seems to work.

Are you sure there is a more simple expression? (You don't have to tell me exactly what it is.) Anyone?
 
Old 08-15-2008, 05:37 AM   #7
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743
I want this thread framed---to illustrate the RIGHT way to ask homework questions!!!

Code:
[root@localhost mherring]# cat bin
10010100101101
10000
000010
101010
01110

[root@localhost mherring]# sed -n '/101/p' bin
10010100101101
101010
[root@localhost mherring]# sed -n '/101/!p' bin
10000
000010
01110
I generated the file "bin" and then ran the two sed commands.

#1: Print every line which contains "101" at least once.

#2: Print every line which does NOT contain "101".
 
Old 08-15-2008, 05:59 AM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121Reputation: 4121
Of course those are char strings, not binary "strings" - but let's not quibble ...
 
Old 08-15-2008, 06:16 AM   #9
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743
Well, gosh.....
In OP's solution, he appears to be operating on the characters that represent the bits. I don't know how to apply Regexes to the actual bits.

I quibble for a living......
 
Old 08-15-2008, 12:01 PM   #10
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082
If the assignment prohibits using C# code to negate the condition, the answer won't look pretty, regexp's don't negate well. That said you could shorten it a little: replace (X+)? with X*. I will also note that your expression doesn't match the empty string...
 
Old 08-15-2008, 04:19 PM   #11
slackaddict
Member
 
Registered: Nov 2004
Location: Aotearoa
Distribution: Slack, Ubuntu
Posts: 92

Original Poster
Rep: Reputation: 15
Yeah, we can't use C# code to match the strings, just regex.

You're right about the empty string. To match the empty string I could just take the "[01]" away from the start. The assignment asks to match "the set of binary strings....". Do you think that includes the empty string? Maybe it depends on who is marking the assignment.

I shortened the regex to: ^[01](1*(00+)*)+0*$

I think its working. For some reason I thought C# couldn't handle the "*" ...
 
Old 08-15-2008, 05:40 PM   #12
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 63
Since C# uses PCRE, you can use lookaround assertions:

Code:
$ cat bin
10010100101101
10000
000010
101010

01110

# this shows our correct results
$ grep -v 101 bin
10000
000010

01110

# correct results using negative lookahead
$ perl -ne 'print if /^([01](?!101))*$/' bin
10000
000010

01110
 
Old 08-15-2008, 06:35 PM   #13
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082Reputation: 2082
Quote:
"the set of binary strings....". Do you think that includes the empty string?
I'd be surprised if it didn't.

@Mr. C. nice trick with the lookahead. I wasn't aware of PCRE's features, feels almost like cheating to me , but there's a case it fails on:
Code:
~/tmp$ echo 101 | grep -v 101
~/tmp$ echo 101 | perl -ne 'print if /^([01](?!101))*$/'
101

Last edited by ntubski; 08-15-2008 at 06:40 PM. Reason: simplified failing case
 
Old 08-15-2008, 07:05 PM   #14
Mr. C.
Senior Member
 
Registered: Jun 2008
Posts: 2,529

Rep: Reputation: 63
Oops! Just checking to see who was watching! :-)

How about negative lookbehind instead:

Code:
$ cat bin
101
10010100101101
10000
000010
101010

01110
0
00
000
0000
0001

$ grep -v 101 bin 
10000
000010

01110
0
00
000
0000
0001

$  perl -ne 'print if /^((?<!10)[1]|0)*$/'  bin          
10000
000010

01110
0
00
000
0000
0001
 
Old 08-15-2008, 08:07 PM   #15
slackaddict
Member
 
Registered: Nov 2004
Location: Aotearoa
Distribution: Slack, Ubuntu
Posts: 92

Original Poster
Rep: Reputation: 15
Thumbs up

Cool. I'm glad this thread is generating a bit of activity, getting people's brains going.

All this PCRE, lookahead, negative lookbehind etc is a bit over my head ;-)

I'm going to use: ^(1*(00+)*)+0*$ and add a note mentioning the empty string.

Thanks everyone for your input. In the end I'm glad no one just gave me the solution.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
having problem in writing regular expression in tcl mohtasham1983 Programming 1 10-29-2006 01:29 PM
finding an offset directory with reg expression airswit Programming 1 02-15-2006 12:52 AM
Trouble using a regular expression with CentOS spiffytech Linux - Software 2 12-22-2005 11:17 AM
Reg Expression Question windisch Programming 6 10-04-2005 11:08 AM
Having trouble with a perl regular expression... jayemef Programming 3 08-25-2005 11:00 PM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 08:33 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration