regular expressions

schmitta · 10-30-2013, 06:08 AM

How is your system coming along?

jpollard · 10-30-2013, 07:25 AM

Not bad.

I have a working assembler (some bugs do remain though).

I have an initial stack machine with MOST of the instructions implemented, though not all are tested yet. It has basic arithmetic (but without overflow/carry), branches (but not on overflow/carry), a system call for doing basic output (stdout/stderr with basic formatted output; stdin is planned but not implemented yet), and a stack/register dump. Hopefully I'll get some documentation done next (actually just filling out a table of contents). Indexing instructions are still to be added, though these should be relatively simple.

I'm still torn on subroutine call/return... and am thinking of doing two: a heavyweight call that saves registers and sets a stack frame pointer, with a corresponding return, and a lightweight call/return that only puts the PC on the stack. The heavyweight call/return is aimed at high level subroutines, the lightweight one more at things like intrinsics - maybe a standardized index computation type of thing.
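A minimal sketch of the two call styles described above, written in Python for brevity. The class, method names and the exact save order are hypothetical and not taken from the actual stack machine source; only the idea (PC only versus PC + registers + frame pointer) comes from the post.

```python
class Machine:
    """Toy model: a list as the stack, two index registers, PC and FP."""

    def __init__(self):
        self.stack = []      # the real machine uses 32 bit entries
        self.pc = 0          # program counter
        self.fp = 0          # frame pointer (an index into the stack)
        self.regs = [0, 0]   # the two data indexing registers

    def lcall(self, target):
        # Lightweight call: only the return address goes on the stack.
        self.stack.append(self.pc)
        self.pc = target

    def lret(self):
        # Lightweight return: pop the return address back into the PC.
        self.pc = self.stack.pop()

    def hcall(self, target):
        # Heavyweight call: save PC, index registers and the old frame
        # pointer, then start a new frame at the current stack top.
        self.stack.append(self.pc)
        self.stack.extend(self.regs)
        self.stack.append(self.fp)
        self.fp = len(self.stack)
        self.pc = target

    def hret(self):
        # Heavyweight return: drop the callee's locals, then restore
        # everything in the reverse order of hcall.
        del self.stack[self.fp:]
        self.fp = self.stack.pop()
        r1 = self.stack.pop()
        r0 = self.stack.pop()
        self.regs = [r0, r1]
        self.pc = self.stack.pop()
```

The lightweight pair costs one stack entry per call, which is why it suits small intrinsics; the heavyweight pair costs four entries in this sketch but lets the callee use the registers and stack freely.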

There are a number of unused opcodes still (about 50). I may renumber some of the instructions (there are 8 register save/restore instructions I might move to put with the indexing instructions as it just seems to make sense).

Basic architecture:
3 segments: byte addressable code, a stack addressable in 4 byte (32 bit) entries, and byte addressable data space.

2 data indexing registers, 2 stack indexing registers (a frame pointer and the stack pointer). Only the PC for instruction addressing (makes jump tables a pain).

There are two ways to do jump tables:
- a lightweight subroutine call, then add to the return address, and return (not that great, but works fairly well),
- push an address of a jump table in data space, use indexing to get the actual jump address on the stack, exchange top two elements, drop the top one, and then do a lightweight subroutine return...

It might even end up being reasonable to add an instruction just for jump tables, as that would allow them to exist in the code segment (much better reliability that way). The method would be to put the index on the stack; the instruction would then do a jump indirect, adding the top entry of the stack (shifted by 1 bit for a 16 bit offset) to the address given to the jump. The contents of that code segment address would be copied to the program counter, and the index removed from the stack. This is simpler to do and, since it would be implemented by the machine, faster than any other method. But it is something that can be left for later.
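A rough sketch of what such a jump-table instruction could do, in Python. The function name is hypothetical, and the little endian layout of the 16 bit table entries is an assumption made for the example.

```python
def jump_indirect(code, stack, table_addr):
    """Return the new PC for a hypothetical jump-table instruction.

    code       -- the code segment as a bytes-like object
    stack      -- the machine stack (a list); the index is on top
    table_addr -- address of the table, given as the instruction operand
    """
    index = stack.pop()                  # the index is removed from the stack
    entry = table_addr + (index << 1)    # shift by 1: entries are 2 bytes wide
    # Assumption: 16 bit entries are stored low byte first.
    return code[entry] | (code[entry + 1] << 8)
```

For example, with a three-entry table at address 4 holding 0x0010, 0x0020 and 0x0123, an index of 2 on the stack selects 0x0123 as the new program counter.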

The assembler's limitation is that it doesn't always identify undefined instructions properly... (I have to fix the Perl pattern match - a better one would combine assembler directives with instructions and provide better error detection). Because the assembler was pre-existing, it already includes macro definitions with conditional assembly, data definitions (.byte/.short/.int/.float to set values, and .block for buffer allocations), and segment selection (.code/.data), with the ability to switch between the two segments.
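One way a single pattern could cover both directives and instructions is sketched below. This is in Python rather than Perl, and it is only an illustration: the `;` comment syntax, the `label:` form and the instruction mnemonics are assumptions, while the directive names are the ones listed above.

```python
import re

# One pattern for every source line: optional label, optional
# directive-or-mnemonic with operands, optional comment.
LINE = re.compile(r"""
    ^\s*
    (?:(?P<label>[A-Za-z_]\w*):)?      # optional label
    \s*
    (?:(?P<op>\.?[A-Za-z_]\w*)         # directive (.byte) or mnemonic (push)
       (?:\s+(?P<args>[^;]*?))?)?      # optional operands
    \s*(?:;.*)?$                       # optional comment (assumed syntax)
""", re.VERBOSE)

DIRECTIVES = {".byte", ".short", ".int", ".float", ".block", ".code", ".data"}
INSTRUCTIONS = {"push", "pop", "add", "jmp"}   # hypothetical mnemonics

def classify(line):
    """Classify one source line; unknown operations are reported, not ignored."""
    m = LINE.match(line)
    if m is None:
        return ("error", line)          # line does not fit the syntax at all
    op = m.group("op")
    if op is None:
        return ("empty", None)          # blank, label-only or comment-only
    op = op.lower()
    if op in DIRECTIVES:
        return ("directive", op)
    if op in INSTRUCTIONS:
        return ("instruction", op)
    return ("undefined", op)            # well formed, but not a known operation
```

Because every line goes through the same match, anything that looks like an operation but is in neither table falls out as "undefined" instead of slipping past a separate instruction-only pattern.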

Once the stack machine is "relatively" finished I'll fix the assembler and get back to a code generator (initially generating just assembler output so I can eyeball it - and pass it through the assembler and on to the machine for testing). That will get the basic translator working, but without error diagnostics.

How has your side been going?

schmitta · 10-30-2013, 10:42 AM

I was blessed to find a good hand written TINY BASIC in C (BAS-INT.c). I have added a state machine parser for floating point, signed int, hex and binary constants. Wrote a delimiter routine as a state machine for parsing out and identifying +=, -=, <<=, etc. Wrote a set of symbol table routines. Am using num for floating point, num# for 32 bit signed ints, num$ for strings and a DIM statement for arrays with up to 3 indexes. Will enter the code written today and probably start on the expression analyzer tomorrow. Found an algorithm for reading algebraic expressions and then using a stack and a queue to convert them to reverse Polish notation, which can then be executed. It has been surprisingly easy to do this with the help of the C skeleton. I am not the swiftest person in the world but even so things seem to be working out. Started out with a minor change to the tokenizer and things just flowed from that. Would like to see your system when you are at a stopping point unless you are going to keep it private. May some day come up with a stack machine and the associated BISON/FLEX compiler. The BNF for my language follows:

Quote:

<pgm> ::= <beg> <lines> <end> \0
;

<beg> ::= PSTART
;

<end> ::= PEND
;

<lines> ::= <line>
| <lines> <line>
;

<line> ::= <labeldef> : <statements> <eol>
| <statements> <eol>
| <labeldef> : <eol>
;

<eol> ::= \r
| \n
| \r\n
;

<statements>::= <statement>
| <statements> <statement>
;

<statement> ::= <var> <eqop> <exp>
| PRINT <pg>
| INPUT <varlist>
| GOTO <label>
| GOSUB <label>
| EXIT
| IF <exp> EXIT
| IF <exp> GOTO <label>
| IF <exp> THEN <statements> ENDIF
| IF <exp> THEN <statements> ELSE <statements> ENDIF
| RETURN
| END
| WHILE <exp> <statements> WEND
| REPEAT <statements> UNTIL <exp>
| FOR <var> EQ <exp> TO <exp>
| FOR <var> EQ <exp> TO <exp> STEP <exp>
| NEXT
| NEXT <var>
;

<eqop> ::= =
| +=
| -=
| *=
| /=
| %=
| ^=
| &=
| |=
| **=
| <<=
| >>=
;

<varlist>::= <var>
| <varlist> , <var>
;

<var> ::= <id>
| <id> [ <exp> ]
| <id> [ <exp> , <exp> ]
| <id> [ <exp> , <exp> , <exp> ]
;

<pg> ::= <var>
| <s>
| <pg> , <var>
| <pg> , <s>
;

<s> ::= " <chars> "

<exp> ::= <logical_and>
| <exp> || <logical_and>
;

<logical_and>::= <inc_or>
| <logical_and> && <inc_or>
;

<inc_or>::= <xor>
| <inc_or> | <xor>
;

<xor> ::= <and_exp>
| <xor> ^ <and_exp>
;

<and_exp>::= <equal_exp>
| <and_exp> & <equal_exp>
;

<equal_exp>::= <relation_exp>
| <equal_exp> == <relation_exp>
| <equal_exp> != <relation_exp>
;

<relation_exp>::= <shift_exp>
| <relation_exp> < <shift_exp>
| <relation_exp> > <shift_exp>
| <relation_exp> <= <shift_exp>
| <relation_exp> >= <shift_exp>
;

<shift_exp>::= <add_exp>
| <shift_exp> << <add_exp>
| <shift_exp> >> <add_exp>
;

<add_exp>::= <mult_exp>
| <add_exp> + <mult_exp>
| <add_exp> - <mult_exp>
;

<mult_exp>::= <unary_exp>
| <mult_exp> * <unary_exp>
| <mult_exp> / <unary_exp>
| <mult_exp> % <unary_exp>
;

<unary_exp>::= <postfix_exp>
| ++ <unary_exp>
| -- <unary_exp>
;

<postfix_exp>::= <prim_exp>
| <postfix_exp> ++
| <postfix_exp> --
;

<prim_exp>::= <var>
| <constant>
| ( <exp> )
;

<constant>::= BINARY
| HEX
| SIGNED_DEC_32_BIT
| <s>
;

<chars> ::= ALL CHARACTERS EXCEPT FOR \0. " IS REPRESENTED AS ""

<id> ::= starts with letter or _ followed by 0 up to 7 letter or number or _

<label> ::= <id>

<labeldef>::= <id> that starts in column 1
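The stack-and-queue conversion to reverse Polish notation mentioned before the grammar is commonly done with the shunting-yard algorithm. The following is a minimal Python sketch, not the code from BAS-INT.c: it assumes the input is already tokenized, handles only the left-associative binary operators and parentheses from the grammar, and takes its precedence levels from the nesting of the `<exp>` productions.

```python
from collections import deque

# Higher number binds tighter; the order mirrors the <exp> productions.
PRECEDENCE = {
    "||": 1, "&&": 2, "|": 3, "^": 4, "&": 5,
    "==": 6, "!=": 6,
    "<": 7, ">": 7, "<=": 7, ">=": 7,
    "<<": 8, ">>": 8,
    "+": 9, "-": 9,
    "*": 10, "/": 10, "%": 10,
}

def to_rpn(tokens):
    """Convert a list of infix tokens to a list in reverse Polish order."""
    output = deque()   # the queue of finished tokens
    ops = []           # the operator stack
    for tok in tokens:
        if tok in PRECEDENCE:
            # Left-associative: pop operators of equal or higher precedence.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops and ops[-1] != "(":
                output.append(ops.pop())
            if not ops:
                raise ValueError("unbalanced parenthesis")
            ops.pop()                     # discard the matching "("
        else:
            output.append(tok)            # variable or constant
    while ops:
        op = ops.pop()
        if op == "(":
            raise ValueError("unbalanced parenthesis")
        output.append(op)
    return list(output)
```

For instance, the tokens of `a + b * c` come out as `a b c * +`, which a stack machine can execute directly by pushing operands and applying each operator to the top two entries.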

jpollard · 11-03-2013, 05:52 AM

I have sent you some source code as my project now stands. It is very early and not validated.

Unfortunately, this forum doesn't allow tar or gzipped tar files as attachments.

jpollard · 11-05-2013, 07:08 PM

Some updates - a number of bugs have been identified and fixed, an overlooked instruction added, and I finally identified a reason to add condition code manipulation and overflow/carry instructions even if they are not fully supported...

schmitta · 11-05-2013, 11:07 PM

What mcu do you want to use it on?

jpollard · 11-06-2013, 02:55 AM

Quote:

Originally Posted by schmitta

What mcu do you want to use it on?

As it is right now, it is only for debugging - specifically because of the use of the SYSTEM instruction and the framing main program. Both are only for debugging the svm itself, and for providing a trivial interface for loading applications to be interpreted (allocating the code/stack/data segments for instance - that would be replaced by arrays created for the specific target). The only architectural limitations are the 16 bit limits on the code/data segments. The instructions are all limited by that definition (only one and two byte offsets available). The stack is not so limited, as nearly everything there is indexed by 32 bit integers, and the stack is defined to be an array of unsigned int.

This is at the same level as the JVM - the most common JVM is NOT the one used on embedded devices - the cores of those are very likely written in assembler, or when they are written in C the processor is a 32 bit CPU with a good bit of memory, not a 16 bit machine. For something that small, there is microJava (http://hackaday.com/2012/10/15/%CE%B...rocontrollers/ and the project itself at http://dmitry.gr/index.php?r=05.Proj...%20micro%20JVM) but you have to pick what part of Java to leave out if it is too large - the complete microJava is about 60k on PIC. It would make it unnecessary to develop a translator, as there is the Java compiler itself for cross development.

Porting the stack machine I'm putting together shouldn't be that difficult, though for certain things (specifically the arithmetic handling) it may need some targeted assembly for the overflow/underflow catching, as how that gets done depends on the target. That could also make it smaller and faster: what I finally use to handle overflow and carry detection will not be targeted to any specific processor, and thus will be a bit clumsy and slow. As it is now, the svm itself (and ONLY the svm) is only 18k in size - no code/stack/data segments allocated. I'm sure it should be able to be reduced to about 14k, and because it is very simplified (none of the built in structure things, like class invocation, that the JVM has) it ought to get down to about 4K-8K in assembler.
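As an illustration of processor-independent carry and overflow detection for a 32 bit add, here is a small Python sketch. The function name and the returned tuple are made up for the example; this is a standard bit-level technique, not necessarily what the svm ends up using.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def add32(a, b):
    """Add two 32 bit values; return (result, carry, overflow)."""
    a &= MASK32
    b &= MASK32
    full = a + b                 # wider than 32 bits, so nothing is lost
    result = full & MASK32
    carry = full > MASK32        # unsigned carry out of bit 31
    # Signed overflow: the operands have the same sign, and the sign of
    # the result differs from it.
    overflow = bool(~(a ^ b) & (a ^ result) & SIGN32)
    return result, carry, overflow
```

On a real target the same flags usually come for free from the status register after the add, which is where a few lines of targeted assembly would be smaller and faster than this portable form.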

My focus has been on keeping the application code as small as possible without knowing anything detailed about the applications to be run on it. I mention this because domain specific knowledge would allow for special instructions to be included - a navigation domain could have geodetic computations done as a single opcode for speed, and the programming language would then use it as a simple operator... and the application would be reduced in volume by eliminating any code to perform geodetic computations entirely.

The code used in svm.c uses nothing fancy (the tmp1/tmp2 union structure is the most, I think). There are likely some issues with varying endian handling - the conversion to/from byte structures assumes little endian. (Pushing a two byte ASCII value would swap the bytes in the assembler - but I'm not sure. An endian-neutral version would use << and | operators to build multi-byte values for the stack instead of using a tmp1.uchars array to arrange the bytes.)
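The shift-and-or approach can be sketched as follows, in Python with hypothetical helper names; the same expressions carry over to C almost unchanged. The stored format is fixed as little endian by the code itself, so the result does not depend on the byte order of the host.

```python
def read_u32_le(buf, offset):
    """Build a 32 bit value from 4 bytes stored low byte first."""
    return (buf[offset]
            | (buf[offset + 1] << 8)
            | (buf[offset + 2] << 16)
            | (buf[offset + 3] << 24))

def write_u32_le(buf, offset, value):
    """Store a 32 bit value as 4 bytes, low byte first."""
    for i in range(4):
        buf[offset + i] = (value >> (8 * i)) & 0xFF
```

Writing 0x12345678 with `write_u32_le` leaves the bytes 0x78, 0x56, 0x34, 0x12 in the buffer, and `read_u32_le` turns them back into the same value.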

There IS a fair amount of error recovery that can conceivably be removed where the target might make assumptions - a lot of it could just return as an "internal error" issue instead of making individual error reports. If the application being run on it is debugged on a development system with the full debug reports, the target interpreter could therefore get a good bit smaller (and possibly faster).

The framing main program itself is not intended to be part of that, nor is the system.c function - That is only an example of interfacing between the svm and the outside, and is only one way to do that.

Also there is the nature of traps/interrupts. There is none of that put in there, though it could be (It needs some form of semaphore the stack machine could use to recognize the request, but most of the interrupt code would be external to the svm, and only the svm interface itself would be part of the stack machine).

Hopefully, it shows a "how" it can be done.

schmitta · 11-06-2013, 09:31 AM

It turns out that the microchip PIC24 has a return stack that is settable to any size (not a fixed hardware stack like the smaller chips).

jpollard · 11-06-2013, 12:21 PM

Makes it much better for native applications. Easier to program too.

jpollard · 11-07-2013, 05:41 PM

I've gotten through the first 3 test phases, only one more to go (125 tests so far, I had forgotten how tedious those can get). One more (branch and index tests), then maybe fill out some of the missing instructions (mostly condition code handling, perhaps a memmove instruction) and add tests for those. Then some integration tests (these would be some sample programs, and a bit more fun: programs for the stack machine- square root functions, Fibonacci functions, factorials, some might be useful for performance tests). More bug fixes - mostly simple things now, which is why the validation tests are important.

I may write a perl scanner to look through the svm_opcode.h include file to extract current instruction/opcode definitions. It would simplify keeping the assembler in sync with the stack machine (right now, I have to manually update the assembler each time an error is identified).
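A scanner of that kind could look roughly like the following Python sketch (the same pattern would work in Perl). The layout of svm_opcode.h is not shown in this thread, so the `#define OP_NAME value` form and the `OP_` prefix are assumptions made purely for illustration.

```python
import re

# Assumed header style: "#define OP_PUSH 0x01" (hypothetical OP_ prefix).
DEFINE = re.compile(r"^\s*#\s*define\s+(OP_\w+)\s+(0[xX][0-9A-Fa-f]+|\d+)\b")

def parse_c_int(text):
    """Parse a C integer literal: hex (0x..), octal (leading 0) or decimal."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)

def extract_opcodes(header_text):
    """Return {mnemonic: opcode} for every matching #define in the header."""
    table = {}
    for line in header_text.splitlines():
        m = DEFINE.match(line)
        if m:
            mnemonic = m.group(1)[len("OP_"):].lower()
            table[mnemonic] = parse_c_int(m.group(2))
    return table
```

The resulting table could be written out as the assembler's opcode table at build time, so the header stays the single place where opcodes are defined.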

schmitta · 11-08-2013, 05:08 AM

I am stupidly just writing code and entering it without testing. Did this once before and it took me 6 months to get the program to work. But it was running on an MCU (PIC) with the only debugging tool being one LED to turn on or off depending on the outcome of a test. Where did you go to school and what did you study? As you can tell I am in Blacksburg and went to VA Tech. Class of 1973 in Electrical Engineering. Went for a masters in Computer Science and applications but never finished it.

jpollard · 11-08-2013, 06:11 AM

BS in computer science, minor in math, 1977. Lots of fairly abstract subjects, languages, computer design, digital electronics. Learned several assemblers - PDP-10, PDP-8, PDP-11 - and a number of programming languages: Fortran/Cobol as distinct classes; PL/1, APL, LISP, Algol, Snobol, Algol-W (translated a scanner from Algol-W into Algol68) in a survey class; Pascal, Concurrent Pascal (a threaded code interpreter on a PDP-11, and the first "managed code" system in spite of Microsoft's claims).

Compiler construction before there were parser generators available (wrote one for a senior project - luckily got an A for showing the errors in a text book on language processing - their tables were wrong). Working through school helped - first job was translating IBM 360 code to DEC-10, then operator, student help desk, which evolved into contract programming for a number of departments. Even designed a multi-threading library for the DEC-10 FORTRAN (ran out of time + they changed the clock trapping in the system just before graduating). It was University of Miss.

First job after graduating was teaching introduction to computers/assembly and introduction to data structures (I carefully stayed away from analysis of algorithms - my math wasn't sufficient, loved the idea of abstract algebra, unfortunately couldn't seem to be able to complete the proofs - but got the idea OF proofs). Second job taught a lot - working on navigation systems for air/sea/land seismic surveys. Worked with the first GPS receivers ever, LORAN-C, a proprietary high resolution microwave system, PDP-8, PDP-11 (talked them into my first UNIX system - a v6 on a PDP-11/23). LOTS of kernel design there - wrote basic network link code for a distributed navigation system (up to 5 nodes - one per ship to coordinate all 5 ships as a unit. It may even have been the first commercial wireless network - though I don't think it had more than 3 ships active)

After that it was software development/operations and system management - UNIX/VMS, then UNIX all the way. Been a bit lucky. Worked with the smallest UNIX systems and the largest (CRAY C90, SGI Origins in a supercomputer center). Always did like assembler languages... the most advanced I ever used was the VAX assembler, the least advanced was my own (for an 8080 while still an undergrad). My last job (before retiring some years ago) was in security and maintenance of Kerberos for the DoD high performance modernization office.

Stack machines were always a bit of fun to write, but hardware capabilities exceeded the use of stack machines. Their fatal weakness is also their strength - the stack. It makes a truly horrible bottleneck for hardware implementation - they just can't be made fast, and that is why hardware is always register based.

As a targeted application tool though, it becomes very useful - the goal there is to push the actual time consuming operations into the virtual instruction set - which is why the JVM works even as well as it does - the class activation code is likely the slowest ever... but it is only one instruction in the JVM, and a slow one. The only reason the JVM works is being able to throw high speed CPUs at it. The Java card stuff is REALLY slow (and incomplete), and doesn't even really count as java anymore except in name.

schmitta · 11-09-2013, 01:37 PM

I want to transfer programs securely from a web site to an internet connected device. I want to encode/decode the program data with a c program. Do you know of any programs to do this? Thanks.

jpollard · 11-09-2013, 02:27 PM

Quote:

Originally Posted by schmitta

I want to transfer programs securely from a web site to an internet connected device. I want to encode/decode the program data with a c program. Do you know of any programs to do this? Thanks.

That is what OpenSSL provides.

It isn't necessarily simple to use, but there are examples (it comes with a sample client and a server, though they do not recommend using the openssl utility itself for that). And the Apache web server supports ssl. Now, the device does have to have a fairly significant amount of memory and processing capability, but if it has TCP/IP then it likely does.

Note - SSL provides an implementation of TLS - it only protects the data in transit between the server and the receiver. It does not protect the data on either end. You can also look at PGP for encryption/decryption. It isn't just for email.

jpollard · 11-10-2013, 06:11 AM

Finally... got the 4 phases of the instruction tests done. Some could be better... but the initial set is working now. Fixed a table error in the assembler (and a masking oversight), and a few more instructions fixed. I undercounted the number of tests needed. Final count 233 (so far). Most instructions needed 2 to 4 tests for various boundary conditions (overflow/carry not tested even now).

Next bit is to add the few overlooked operations (branching on overflow/carry, unsigned branches) and being able to set/clear condition codes.