LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie

09-24-2013, 04:30 PM   #1
schmitta
regular expressions


Is there a program where you enter a regular expression and it tells you in English what the expression does?
 
09-24-2013, 04:58 PM   #2
sycamorex
Perhaps something like this would help:

http://regex101.com/
or

http://www.myezapp.com/apps/dev/regexp/show.ws
 
09-25-2013, 01:11 AM   #3
schmitta (Original Poster)
Thanks sycamorex! The myezapp site did the trick and gave me what I wanted. It does not offer a flex flavor, but the Java one seems to behave identically. Thanks again! Alvin....
 
09-25-2013, 02:10 AM   #4
schmitta (Original Poster)
I am trying to match a label defined as a letter followed by zero to seven letters or digits and then a colon. I would also like it to be the first item in the statement. I tried:

lines ::= labeldef statement
labeldef ::= [A-Za-z][A-Za-z0-9]{0,7}:

but it also accepts labels longer than 8 characters, matching only the last 8 characters before the colon plus the colon. How do I write the regular expression so that it allows at most eight characters in total and gives an error for nine or more characters (or for a label that does not start with a letter)? Maybe I need to precede it with a null, or make it start at the first character on the line. Thank you. Alvin.... (Note: I am not doing this for a college course but for my business.)
 
09-25-2013, 03:34 AM   #5
Firerat
It appears to work fine here, at least in bash:

Code:
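Check=""
for i in a {1..10}; do
  Check=${Check%:}${i}:   # drop the trailing colon, append the next value plus a new colon
  printf '%s ' "$Check"   # print the test string without treating it as a format string
  [[ $Check =~ [A-Za-z][A-Za-z0-9]{0,7}: ]] \
    && echo True \
    || echo False
done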
gives the following output:

Code:
a: True
a1: True
a12: True
a123: True
a1234: True
a12345: True
a123456: True
a1234567: True
a12345678: False
a123456789: False
a12345678910: False
 
09-25-2013, 03:52 AM   #6
pan64
"Starting" means ^ (anchor the match at the beginning of the line), so you would need to write: ^[A-Za-z][A-Za-z0-9]{0,7}:
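A small C sketch can show what the anchor changes. It uses the POSIX `regcomp`/`regexec` functions from `<regex.h>` (an assumption: the thread itself tested with bash and online tools, not this API), and `match_offset` is a hypothetical helper that reports where the first match starts and how long it is.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset at which `pattern` (POSIX
 * extended syntax) first matches `text`, or -1 if there is no match or
 * the pattern does not compile. If len_out is not NULL, it receives the
 * length of the matched text. */
int match_offset(const char *pattern, const char *text, int *len_out)
{
    regex_t re;
    regmatch_t m;
    int offset = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0) {
        offset = (int)m.rm_so;                      /* start of leftmost match */
        if (len_out != NULL)
            *len_out = (int)(m.rm_eo - m.rm_so);    /* length of that match */
    }
    regfree(&re);
    return offset;
}
```

With the unanchored pattern, `testabc12:` cannot match at offset 0 because nine characters precede the colon, so `regexec` returns the leftmost match that does fit: `estabc12:` at offset 1. With `^` the same string is rejected, as is ` testabc0:` with its leading blank. Leaving off `$` still accepts a label followed by other tokens such as `loop1: PRINT X`.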
 
09-25-2013, 04:11 AM   #7
Firerat
Good point, pan64.

Still, it should work without ^:
Code:
Check=""
for i in 0 {1..10}; do
  Check=${Check%:}${i}:
  printf '%s ' "$Check"
  [[ $Check =~ [A-Za-z][A-Za-z0-9]{0,7}: ]] && echo True || echo False
done
results in all False, because none of these strings contains a letter.
But I agree 100%: ^ should be used, along with $ at the end:
Code:
Check=""
for i in z {1..10}; do
  Check=${Check%:}${i}:
  printf '%s ' "$Check"
  [[ $Check =~ ^[A-Za-z][A-Za-z0-9]{0,7}:$ ]] && echo True || echo False
done

As I mentioned earlier, your regular expression is working in bash.

Do you have sample code where it is not working?

Last edited by Firerat; 09-25-2013 at 04:12 AM.
 
09-25-2013, 03:59 PM   #8
schmitta (Original Poster)
I was using the software at:

http://regex101.com/
or

http://www.myezapp.com/apps/dev/regexp/show.ws

to test with. The myezapp program shows the match with a colored bar. For "testabc0:" the whole string was colored; for "testabc12:", "stabc12:" was colored as a match. The label needs to be at the beginning of the line, so I will put the ^ first. Other tokens can follow, so I will leave off the $. I just tried it with http://regex101.com and now it seems to work correctly. I have ^[A-Za-z][A-Za-z0-9]{0,7}:, which now seems to work, rejecting "testabc01:" and " testabc0:" (not first on the line). Thank you for your help. Do you know whether flex will reject it and just pass the non-matching text through, or will it give me some way to flag it as an error? Thanks. Alvin...
 
09-25-2013, 04:34 PM   #9
jpollard
Quote:
Originally Posted by Firerat
good point pan64

Still, it should work without ^
No, without ^ it behaves exactly as shown: "aaab0123456:" would match "b0123456:", even though it is preceded by the string "aaa". Only the ^ forces the match to start at the beginning of the string.

One other note: this is part of a flex scanner/tokenizer. Specifying the ^ makes the pattern accept only valid labels, but a label that is too long would usually be counted as a SEMANTIC error rather than a syntax error. Leaving the ^ off lets the action part do a more detailed analysis and distinguish a valid label from a label that is too long, and thus provide better error diagnostics so the user can make corrections faster and more accurately.
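The idea of keeping the scanner pattern loose and letting the action explain the error might look like this in C. This is a sketch under assumptions: `classify_label`, the `label_status` codes and `MAX_LABEL_LEN` are hypothetical names, the eight-character limit comes from the thread, and the function expects the label text with the colon already stripped (for example a copy of the matched token).

```c
#include <ctype.h>
#include <string.h>

#define MAX_LABEL_LEN 8   /* limit taken from the thread */

/* Hypothetical result codes, one per diagnostic the action might print. */
enum label_status {
    LABEL_OK,
    LABEL_EMPTY,
    LABEL_BAD_START,   /* first character is not a letter */
    LABEL_BAD_CHAR,    /* contains something other than letters/digits */
    LABEL_TOO_LONG     /* more than MAX_LABEL_LEN characters */
};

/* Classifies the label text in front of the colon. Because the scanner
 * pattern is deliberately loose, this check can say *why* a label is bad. */
enum label_status classify_label(const char *s)
{
    size_t len = strlen(s);

    if (len == 0)
        return LABEL_EMPTY;
    if (!isalpha((unsigned char)s[0]))
        return LABEL_BAD_START;
    for (size_t i = 1; i < len; i++)
        if (!isalnum((unsigned char)s[i]))
            return LABEL_BAD_CHAR;
    if (len > MAX_LABEL_LEN)
        return LABEL_TOO_LONG;
    return LABEL_OK;
}
```

A caller would map each status to its own message, for example "label longer than 8 characters" for `LABEL_TOO_LONG`, instead of a single "no match".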
 
09-25-2013, 07:00 PM   #10
Habitual
Also: http://www.gskinner.com/RegExr/
 
09-26-2013, 09:22 PM   #11
schmitta (Original Poster)
Should I leave the ^ in or out? I changed mine to ^[A-Za-z][A-Za-z0-9]{0,7}[ \t\n] to catch a blank or tab between the labeldef and the next token on the line, or a labeldef alone on the line. But how will I catch a label that is too long, since it will probably just pass through if flex works as I think?
 
09-27-2013, 04:45 AM   #12
jpollard
It depends on the grammar. Don't forget that a scanner is only supposed to recognize tokens. Whether the grammar uses white space as a token or just as a separator makes a difference.

If white space isn't significant, the scanner can easily consume it with a very simple rule. For instance, if the grammar specifies a label as:

Code:
label : symbol ':'   { /* whatever to do with a label */ }
      ;
Then the scanner only has to identify what a symbol is. The length of a symbol is not really relevant to the grammar. The action part of the grammar can look at the length, decide whether it is too long, and report an error (label length too long) that is specific to the label.

If all symbols are length-limited, the scanner can identify the error but still not abort scanning: translators work best by identifying as many errors as possible rather than terminating on the first one.

The scanner could identify the label with:
Code:
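%{
#include <string.h>
#include "y.tab.h"   /* ID, COLON and yylval from bison -d; file name is an assumption */
int linecount = 1;
%}
%option noyywrap
id      [A-Za-z][A-Za-z0-9]*
ws      [ \t]
nl      \n
coln    ":"
%%
{id}    { yylval.str = strdup(yytext); return ID; }  /* assumes %union { char *str; } */
{coln}  { return COLON; }
        /* ..... other tokens */
{ws}    ;                  /* discard */
{nl}    { linecount++; }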
Now the whitespace is discarded, but newlines are still matched so that a line count is maintained for error messages. This allows the grammar to identify whether something is a label or not with:

Code:
label : ID COLON   { /* assumes %union { char *str; } and %token <str> ID */
                     if (strlen($1) > 8) {
                         /* print error message with linecount */
                         errorcount++;
                     }
                     /* do other stuff with label - update symbol table... */
                   }
      ;
It really depends on how you decide to handle things and how complex the grammar is. Scanners generated by flex are fine, though they are sometimes not terribly clear, and sometimes it is easier to write one by hand (especially a simple one).
Flex is good to use when speed of implementation matters most, but it assumes you are already familiar with how scanners are used and, if you are interfacing it with bison, how to generate the appropriate include files.
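As an illustration of writing a simple scanner by hand, here is a minimal C sketch. The token names and `next_token` are hypothetical; keywords such as PRINT come back as ordinary identifiers, and strings and most operators are not handled. Because the identifier rule consumes the whole run of letters and digits, any length limit is left to a later check, as described above.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds for a tiny hand-written scanner (hypothetical names). */
enum token_kind { TOK_ID, TOK_NUMBER, TOK_COLON, TOK_NEWLINE, TOK_OTHER, TOK_EOF };

struct token {
    enum token_kind kind;
    const char *start;   /* points into the source text */
    size_t len;          /* number of characters in the token */
};

/* Returns the next token from *pp and advances *pp past it.
 * Spaces and tabs are discarded; newlines are returned so the caller
 * can keep a line count for error messages. */
struct token next_token(const char **pp)
{
    const char *p = *pp;
    struct token t;

    while (*p == ' ' || *p == '\t')
        p++;
    t.start = p;

    if (*p == '\0') {
        t.kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))   /* letter, then letters/digits */
            p++;
        t.kind = TOK_ID;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        t.kind = TOK_NUMBER;
    } else if (*p == ':') {
        p++;
        t.kind = TOK_COLON;
    } else if (*p == '\n') {
        p++;
        t.kind = TOK_NEWLINE;
    } else {
        p++;                                 /* any other single character */
        t.kind = TOK_OTHER;
    }
    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}
```

For the input `loop1: PRINT 10` followed by a newline, it returns ID (5 characters), COLON, ID, NUMBER, NEWLINE and then EOF.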

They are supposed to make it easier for the parser to handle things, to separate semantics from tokenizing, and to help make error messages more meaningful and parse error recovery possible. The simplest error handling is to abort on the first error, but that makes USING the application/translator/... much harder, as you have to keep re-running it just to find the next error.
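Here is a minimal C sketch of reporting every error instead of stopping at the first one, under the thread's label rules (letter first, at most eight characters, terminated by a colon). `check_labels` is a hypothetical function that works directly on the source text; in a real translator the same logic would sit in the scanner or parser actions. Like the scanner above, it keeps a line count so each message can name the offending line.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_LABEL_LEN 8   /* limit taken from the thread */

/* Checks the label (if any) at the start of every line of `src`, prints
 * one diagnostic per bad label, and keeps going instead of stopping at
 * the first error. Returns the number of errors found. */
int check_labels(const char *src)
{
    int errorcount = 0;
    int linecount = 1;
    const char *p = src;

    while (*p != '\0') {
        /* Measure the run of letters/digits at the start of the line. */
        size_t n = 0;
        while (isalnum((unsigned char)p[n]))
            n++;

        if (n > 0 && p[n] == ':') {            /* the line starts with a label */
            if (!isalpha((unsigned char)p[0])) {
                fprintf(stderr, "line %d: label must start with a letter\n", linecount);
                errorcount++;
            } else if (n > MAX_LABEL_LEN) {
                fprintf(stderr, "line %d: label '%.*s' is %zu characters (max %d)\n",
                        linecount, (int)n, p, n, MAX_LABEL_LEN);
                errorcount++;
            }
        }

        /* Move to the start of the next line, counting it. */
        const char *nl = strchr(p, '\n');
        if (nl == NULL)
            break;
        p = nl + 1;
        linecount++;
    }
    return errorcount;
}
```

For a program with one over-long label and one label starting with a digit, it prints two diagnostics and returns 2 rather than stopping after the first.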

Last edited by jpollard; 09-27-2013 at 05:15 AM. Reason: incomplete. got submitted by accident.
 
09-28-2013, 02:49 PM   #13
schmitta (Original Poster)
Thanks jpollard. I really appreciate the extra effort you took to make those points. Alvin...
 
09-30-2013, 05:14 PM   #14
schmitta (Original Poster)
I am writing a BASIC interpreter. I was going to write the BNF and run it through flex and bison, but I am not sure they are appropriate for writing an interpreter. The original idea was to generate the interpreter with flex and bison and compile the C code to run on an MCU. The MCU has 170k words of flash and 56 KB of RAM in a Harvard architecture. But I am considering using flex and bison to write a C program that would run on a PC and generate an intermediate pseudo-assembler language to be interpreted on the MCU. I am including DCL (declare) statements in the BASIC, which I would like to find on an initial pass through the code. Pass zero would also find forward branches and the WEND of each WHILE/WEND statement. Can multiple passes be done with flex and bison, or is there a better way? The PC program would be written in Java so that it runs under Windows, Mac and Linux. Any ideas you have would be greatly appreciated.
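The "pass zero" step of finding the WEND that closes each WHILE can be done with a stack in a single scan of the statements. This C sketch assumes the program has already been split into statements and reduced to one keyword per statement; `match_while_wend` and `MAX_NESTING` are hypothetical names. A similar pre-pass could also record DCL declarations and label positions for forward branches.

```c
#include <string.h>

#define MAX_NESTING 32   /* assumed maximum WHILE nesting depth */

/* Pass zero (sketch): for each WHILE, record the index of its matching
 * WEND, using a stack so nested loops pair up correctly. `match` receives
 * the partner index for WHILE and WEND entries and -1 for everything else.
 * Returns 0 on success, or -1 for unbalanced or too-deep nesting. */
int match_while_wend(const char *stmts[], int n, int match[])
{
    int stack[MAX_NESTING];
    int depth = 0;

    for (int i = 0; i < n; i++) {
        match[i] = -1;
        if (strcmp(stmts[i], "WHILE") == 0) {
            if (depth == MAX_NESTING)
                return -1;             /* nesting too deep */
            stack[depth++] = i;
        } else if (strcmp(stmts[i], "WEND") == 0) {
            if (depth == 0)
                return -1;             /* WEND without WHILE */
            int open = stack[--depth];
            match[open] = i;           /* WHILE jumps past here when false */
            match[i] = open;           /* WEND jumps back to its WHILE */
        }
    }
    return depth == 0 ? 0 : -1;        /* leftover WHILE without WEND */
}
```

For the statement list LET, WHILE, PRINT, WHILE, LET, WEND, WEND, END, the outer WHILE (index 1) pairs with index 6 and the inner one (index 3) with index 5.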
 
09-30-2013, 05:54 PM   #15
jpollard
The first pass of a compiler translates the source into something more amenable to analysis. This is the intermediate language, which can take one of many forms: a parse tree plus a symbol table, or multiple parse trees and symbol tables (the one I'm most familiar with, but there are others). Consider the inclusion of subroutines/functions, for instance: the ability to include "precompiled" parse trees (or whatever the intermediate language is) allows global optimization of the code. After the optimization pass, a third pass can generate the optimized assembler.

You might want to look into the LLVM/Clang compilers (http://llvm.org/) - they are designed for this.

Another back end (which I have only read about, not used) is the one used for Android: the Dalvik bytecode interpreter. It is supposed to provide an efficient interpreter with a minimum of actual code. A higher-level language is converted into a smaller size (good), and although it is slower than native code (bad), it is MUCH easier to generate good code for, and the code runs on anything the interpreter runs on (making it easier to develop for).

This is also the goal of the Forth language (http://www.forth.org/): a very small interpreter, with an easily parsed language, to run on very small processors.
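To make the interpreter idea concrete, here is a toy C interpreter of the kind a PC-side compiler could target and an MCU could run. It is stack-based, closer to Forth than to Dalvik (which uses a register-based design). The opcodes and `run` are hypothetical, and a real version would need stack-bounds and code-bounds checks.

```c
#include <stdint.h>

/* Opcodes for a toy stack machine (hypothetical instruction set). */
enum opcode { OP_PUSH, OP_ADD, OP_SUB, OP_MUL, OP_JZ, OP_JMP, OP_HALT };

#define STACK_SIZE 64

/* Runs `code` until OP_HALT and returns the value on top of the stack.
 * OP_PUSH, OP_JZ and OP_JMP take one inline operand; jump targets are
 * indices into `code`. No bounds checking: a sketch, not production code. */
int32_t run(const int32_t *code)
{
    int32_t stack[STACK_SIZE];
    int sp = 0;   /* number of values on the stack */
    int pc = 0;   /* index of the next instruction */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: stack[sp++] = code[pc++]; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_SUB:  sp--; stack[sp - 1] -= stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_JZ:   /* pop a value; jump to the operand if it was zero */
            if (stack[--sp] == 0) pc = code[pc]; else pc++;
            break;
        case OP_JMP:  pc = code[pc]; break;
        case OP_HALT: return sp > 0 ? stack[sp - 1] : 0;
        default:      return 0;   /* unknown opcode: stop (sketch only) */
        }
    }
}
```

For example, `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }` evaluates (2 + 3) * 4 and returns 20. A WHILE loop could compile to an `OP_JZ` to the position just past its WEND plus an `OP_JMP` back to the condition.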
 
  


All times are GMT -5.