LinuxQuestions.org
Old 05-03-2007, 10:09 AM   #1
dedec0
Member
 
Registered: May 2007
Posts: 42

Rep: Reputation: 3
Problems using regex 'es: regcomp/regexec


Hello,

All information I know about reg* functions is from manual pages regex(3) and regex(7).

I want to recognize a line of an assembly program. Generally:

[ [label] : ] [ instruction [argument] ] [ ; [comment] ]

Spaces can appear anywhere in between; the expression I repeat throughout for this is "[[:space:]]*".

I want the regex to separate all four parts for me: the label, the instruction, the argument and the comment. The first three are single words ONLY (alphanumerics and "_"), and the comment is everything after the ";" (easy enough).

My candidate regex (forgetting spaces), in C string, for this is
Code:
   "^"
   "[[:space:]]*"
   "([[:alpha:]_][[:alnum:]_]*):"       // Label
   "[[:space:]]*"
   "([a-z]+)"                           // Instruction
   "[[:space:]]*"
   "([[:alnum:]_+-]+)"                  // Argument
   "[[:space:]]*"
   "(;.+)$"
But the parts are optional. I first tried changing the label part to:
Code:
   "(([[:alpha:]_][[:alnum:]_]*):)?"       // Label
And then it stops working. If I remove the label from the processed string, the instruction, operator and label aren't yielded in the results.

The string I am using as bases for the tests is:
"\t \t id0_ \t : \t load \t a32_ \t ; first comment \n"

If I wasn't clear on some point, tell me.

Any help? I thought about non-capturing ()'s, but the man pages didn't mention them. Do these functions support them?
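A minimal sketch of what happens to the group numbers once the label is wrapped in an extra optional group. As far as regex(7) describes them, POSIX extended regular expressions have no non-capturing group syntax, so every "(" opens a numbered group and the indices after it shift by one. The helper name get_group, the constant NESTED_PATTERN and the fixed array size are assumptions made for this illustration; the pattern also allows spaces before the ":" so that it fits the test string above.

```c
#include <regex.h>
#include <string.h>

/* Group numbers follow the order of the opening parentheses:
 *   1 = "label :" (the optional wrapper), 2 = label, 3 = instruction. */
static const char *NESTED_PATTERN =
    "^[[:space:]]*"
    "(([[:alpha:]_][[:alnum:]_]*)[[:space:]]*:)?"
    "[[:space:]]*"
    "([a-z]+)";

/* Copies submatch number `group` of `line` into `out` (at most n-1 chars).
 * Returns 1 if the group took part in the match, 0 if the line matched
 * but the group did not (rm_so == -1), and -1 on no match or on error. */
int get_group(const char *line, size_t group, char *out, size_t n)
{
    regex_t re;
    regmatch_t m[4];          /* whole match + 3 groups */
    int result = -1;

    if (group >= 4 || n == 0)
        return -1;
    /* REG_EXTENDED is needed for "?", "+" and unescaped "()" to be operators. */
    if (regcomp(&re, NESTED_PATTERN, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 4, m, 0) == 0) {
        if (m[group].rm_so == -1) {
            out[0] = '\0';    /* optional group was skipped */
            result = 0;
        } else {
            size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
            if (len >= n)
                len = n - 1;
            memcpy(out, line + m[group].rm_so, len);
            out[len] = '\0';
            result = 1;
        }
    }
    regfree(&re);
    return result;
}
```

With this pattern the label is read from index 2 and the instruction from index 3, not 1 and 2 as in the version without the wrapper. Reading the old indices after adding the wrapper would look like the fields had gone missing.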

Thank you.

Dedec0
 
Old 05-03-2007, 04:20 PM   #2
wjevans_7d1@yahoo.co
Member
 
Registered: Jun 2006
Location: Mariposa
Distribution: Slackware 9.1
Posts: 938

Rep: Reputation: 30
I don't know what else is wrong, but you're using [] in your regular expressions to indicate optional stuff. [] is used in regular expressions to indicate alternative characters. For example, g[aou]y will match gay, goy, or guy. According to man 7 regex, you can specify something to be optional by placing either ? or {0,1} after it. For example, aeio?u will match aeiu or aeiou, and a(eio)?u will match either au or aeiou.

Hope this helps.
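The examples above can be checked with a small sketch like the following. The helper name matches is hypothetical; it compiles the pattern as a POSIX extended regex and reports whether it matches anywhere in the text, so the patterns in the usage note carry their own ^ and $ anchors.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `pattern` (POSIX extended syntax) matches somewhere in
 * `text`, 0 if it does not, and -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is wanted, no submatch offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For instance, matches("^aeio?u$", "aeiu") and matches("^a(eio)?u$", "au") should both return 1, while matches("^a(eio)?u$", "aeu") should return 0, because the ? after the group makes all of "eio" optional as a unit.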
 
Old 05-04-2007, 10:54 AM   #3
dedec0
Member
 
Registered: May 2007
Posts: 42

Original Poster
Rep: Reputation: 3
No. By optional I meant that the person may choose not to have a certain part of the line described.
The complete line is:
Code:
[ [label] : ] [ instruction [argument] ] [ ; [comment] ]
Example:
Code:
MyLabel: burn reg10      ; This will hurt
But one may not want to have the label and the comment, and a valid line the regex should match is:
Code:
    burn reg10
Further, a line may have just the comment:
Code:
; This is a long and painful comment and I do not write more!
See the "optional" I meant? In all these cases I want the regex to separate the label, the instruction, its argument (the "reg10" above) and the comment. In detail, regexec would return for me:
1st: "MyLabel", "burn", "reg10" and "; This will hurt", in order
2nd: "", "burn", "reg10" and ""
(or other indication that it failed to match the label and comment parts)
3rd: "", "", "" and "; This is a long and painful comment and I do not write more!"
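One way the three cases above might be handled, as a sketch only: each optional part gets its own group followed by ?, and the fields are read from the inner groups. The names struct asm_line, parse_asm_line, copy_group and FIELD_MAX are invented for this illustration, and the pattern makes two assumptions not stated in the post: at least one space separates instruction and argument, and a comment may be empty after the ";".

```c
#include <regex.h>
#include <string.h>

enum { FIELD_MAX = 128 };

struct asm_line {
    char label[FIELD_MAX];
    char instruction[FIELD_MAX];
    char argument[FIELD_MAX];
    char comment[FIELD_MAX];
};

/* Copies one submatch into out; a group that did not take part in the
 * match (rm_so == -1) yields an empty string. Long fields are truncated. */
static void copy_group(const char *line, const regmatch_t *m, char *out)
{
    size_t len = 0;

    if (m->rm_so != -1) {
        len = (size_t)(m->rm_eo - m->rm_so);
        if (len >= FIELD_MAX)
            len = FIELD_MAX - 1;
        memcpy(out, line + m->rm_so, len);
    }
    out[len] = '\0';
}

/* Returns 0 and fills *out if the line fits the grammar, -1 otherwise. */
int parse_asm_line(const char *line, struct asm_line *out)
{
    static const char *pattern =
        "^[[:space:]]*"
        "(([[:alpha:]_][[:alnum:]_]*)[[:space:]]*:)?"  /* 1 wrapper, 2 label */
        "[[:space:]]*"
        "(([a-z]+)"                                    /* 3 wrapper, 4 instruction */
        "([[:space:]]+([[:alnum:]_+-]+))?)?"           /* 5 wrapper, 6 argument */
        "[[:space:]]*"
        "(;.*)?"                                       /* 7 comment */
        "$";
    regex_t re;
    regmatch_t m[8];          /* whole match + 7 groups */
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, line, 8, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    copy_group(line, &m[2], out->label);
    copy_group(line, &m[4], out->instruction);
    copy_group(line, &m[6], out->argument);
    copy_group(line, &m[7], out->comment);
    return 0;
}
```

Because REG_NEWLINE is not set, a trailing newline is either absorbed by the final "[[:space:]]*" or, when a comment is present, ends up inside the comment field. Compiling the pattern on every call keeps the sketch short; a real program would compile it once.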

So far I can't make any part optional in that expression. For the label part:

Code:
"([[:alpha:]_][[:alnum:]_]*):"
Simply putting ? after it will make just the character ":" optional. So I need to atomize it. I think the way should be this:
Code:
"(([[:alpha:]_][[:alnum:]_]*):)"
But it won't work. I think the problem is in the nested parentheses, but I haven't found anything about this yet.

I hope it is clearer now. Next week I'll attach code here (I don't have it now), but it is really simple: just calling regcomp and then regexec with the described stuff.

Any idea (or even an "It works for me!", hopefully with a concrete example) is very welcome.

Thanks.

Dedec0
 
Old 05-05-2007, 02:48 PM   #4
wjevans_7d1@yahoo.co
Member
 
Registered: Jun 2006
Location: Mariposa
Distribution: Slackware 9.1
Posts: 938

Rep: Reputation: 30
You still want the ? for optional parts.

Code:
:alpha:?
makes only the second colon optional, true. But

Code:
(:alpha:)?
makes the whole shootin' match optional.

And no way do square brackets make anything optional. Don't try to use them for that. That's not how standard regular expressions work.
 
Old 06-15-2007, 07:30 AM   #5
dedeco
LQ Newbie
 
Registered: Aug 2005
Posts: 6

Rep: Reputation: 0
Quote:
Originally Posted by wjevans_7d1@yahoo.co
You still want the ? for optional parts.

Code:
:alpha:?
makes only the second colon optional, true. But

Code:
(:alpha:)?
makes the whole shootin' match optional.
Mmmm... you are confusing the meaning of the two regexes:

:alpha: and [[:alpha:]]

In my problem, there is a literal ":", which separates the label from the rest of the line. The others are inside brackets, naming the character classes I want (as stated in the manual page regex(7)).
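The difference between the two spellings can be seen by measuring what each one matches. This is a sketch with a hypothetical helper, match_length, which returns the length of the first match of a POSIX extended regex.

```c
#include <regex.h>

/* Returns the length in bytes of the leftmost match of `pattern` in
 * `text`, or -1 if there is no match or the pattern does not compile. */
int match_length(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, m, 0) == 0)
        len = (int)(m[0].rm_eo - m[0].rm_so);
    regfree(&re);
    return len;
}
```

Outside brackets, ":alpha:?" is just the literal text ":alpha" followed by an optional ":", so match_length(":alpha:?", "x:alphay") should give 6 and match_length(":alpha:", "id0_") should give -1. Inside a bracket expression the class name is recognized: match_length("[[:alpha:]]", "id0_") should give 1, and match_length("[[:alpha:]_][[:alnum:]_]*:", "id0_: load") should give 5, the label plus its literal colon.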

I still want to write a program to illustrate this thing. I just don't have the time for it now. But it is all here, for the patient readers.

I haven't solved the problem yet (I had to make a "mess" around it).

Regards
 
Old 06-15-2007, 10:49 AM   #6
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 33
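[:alpha:] is a POSIX character class, and it has to be written inside a bracket expression, as in [[:alpha:]]. It matches any alphabetic character defined for your locale. In the C/POSIX locale, [A-Za-z] matches the same characters.

It means there MUST be a character there; it is not optional.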

Are you using some strange regex engine? Because wjevans is correct for standard versions of regexes.

Consider using lex/yacc or flex/bison.

Or simply tokenize the line, then apply some regex tests to identify which flavors of line objects (lexemes) you found - label, operand, argument_1... argument_N and comment.
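A rough sketch of the tokenizing step, without any regex: whitespace separates tokens, ":" is a token of its own, and ";" starts a comment token that runs to the end of the line. The function name tokenize and the limits MAX_TOKENS and TOKEN_MAX are assumptions for this example; classifying the tokens as label, instruction or argument would be a second pass.

```c
#include <ctype.h>
#include <string.h>

enum { MAX_TOKENS = 8, TOKEN_MAX = 64 };

/* Splits `line` into at most max_tokens tokens and returns how many were
 * stored. Tokens longer than TOKEN_MAX-1 bytes are truncated. */
int tokenize(const char *line, char tokens[][TOKEN_MAX], int max_tokens)
{
    const char *p = line;
    int count = 0;

    while (*p != '\0' && count < max_tokens) {
        size_t len, keep;

        if (isspace((unsigned char)*p)) {   /* skip separators */
            p++;
            continue;
        }
        if (*p == ';')
            len = strlen(p);                /* comment: rest of the line */
        else if (*p == ':')
            len = 1;                        /* label separator */
        else
            len = strcspn(p, " \t\r\n\v\f:;");  /* a word */

        keep = len < TOKEN_MAX ? len : TOKEN_MAX - 1;
        memcpy(tokens[count], p, keep);
        tokens[count][keep] = '\0';
        count++;
        p += len;
    }
    return count;
}
```

On the test string from the first post this should yield five tokens: "id0_", ":", "load", "a32_" and the comment. A ":" in second position then identifies the first word as a label.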

If you are using an open source assembler, borrow its lexer.
Almost all compilers have some lex & yacc files in their source.
 
  

