news, sorta

Posted 01-10-2013 at 04:48 PM by rainbowsally

The new parser that will be in mc2 3.1 (the current version is 3.0) is really very nice. It is based on the test files recently posted, with a few changes.

You'll never find a parser that's super-easy to use, but this one is written in C (no intermediate file), makes sense in C, and is debuggable in C, so debugging both the code and the algorithms is much, much easier than with something like flex, lemon, or the gazillion other alternatives.

As kind of a challenge, we started on the first few passes of a C/C++ decompiler, found a few snags in the parser, and fixed them. The parser does multi-pass very easily and whipped out a commented disassembly of autogen (a standard Linux utility) in a little over two seconds.

Altogether, it did seven passes and the input file (objdump -d) was well over a meg.

The reason for using objdump output is that relipmoC worked from disassembly and got a lot of the C stuff right.

But they only disassembled files they compiled.

And they did screwy stuff like reading function declarations and function sizes. If these weren't in the file where the tool 'thought' they should be, it would choke and spit out a syntax error message. Dumb.

The only way you'd ever have that in the disassembly is if you compiled the application you are trying to decompile in the first place.

Still, as a proof of concept it was quite convincing.

But their parsing 'rules' were far stricter than they should have been and they couldn't be corrected because they didn't include the right flex file in the download.

Flex. Burn it. Nuke it. You have been warned.

And so I tried to get a working flex file for the app and got RE-DISGUSTED with flex when it wrote all over my other source files, like "main.cpp"! GRR! And it's done that before.

So we're flat-out DONE with flex FOREVER. No more. Never again will it randomly destroy a once working source file.

But the concept of relipmoC was really FAN-TASTIC! The docs were enlightening (dealing with nodes and analytical methods), and the approach unique (to me, at least) and sensible although the disassembly should have been from objdump, not from gcc!

Nobody's done even a decent C decompiler since the early 90s: Cristina Cifuentes's dcc, a decompiler for DOS executables, I think.

It's quite a challenge.

And since it's borderline impossible, it seemed that a C/C++ decompiler project might make for a great test of a parsing tool, no?



General principles re parsing.

BTW, the reason for the several passes mentioned above is so that problems that show up can be isolated to a certain range of criteria/algorithms, which we call "rules" in the jargon.
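
To make the idea concrete, here is a minimal sketch of a multi-pass driver in C. It is not the mc2 parser: the pass names, the `struct pass` table, and `run_passes()` are all made up for illustration, and the two passes are deliberately trivial. The point is only that when each pass is a separate step, you can stop after any one of them and look at the intermediate text.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical pass signature: each pass rewrites the text in place. */
typedef void (*pass_fn)(char *text);

struct pass {
    const char *name;   /* used to report which pass ran last */
    pass_fn run;
};

/* Illustrative pass 1: turn tabs into single spaces. */
static void pass_tabs_to_spaces(char *text)
{
    for (; *text; text++)
        if (*text == '\t')
            *text = ' ';
}

/* Illustrative pass 2: fold everything to lower case. */
static void pass_lowercase(char *text)
{
    for (; *text; text++)
        *text = (char)tolower((unsigned char)*text);
}

/* The pass table; order matters. */
static const struct pass passes[] = {
    { "tabs_to_spaces", pass_tabs_to_spaces },
    { "lowercase",      pass_lowercase },
};

/*
 * Run at most `limit` passes over `text`, in table order.
 * Returns the name of the last pass that ran, or NULL if none did.
 * Lowering `limit` one step at a time narrows a bug down to one pass.
 */
static const char *run_passes(char *text, size_t limit)
{
    const char *last = NULL;
    size_t total = sizeof passes / sizeof passes[0];

    for (size_t i = 0; i < total && i < limit; i++) {
        passes[i].run(text);
        last = passes[i].name;
    }
    return last;
}
```

Calling `run_passes(buf, 1)` on a writable buffer runs only the first pass, so if the output is already wrong at that point, the later rules are off the hook.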

Multi-pass is not the fastest way to do it, but the overhead really isn't too bad. The memcmp() stuff is optimized now (it only calls memcmp() if the first letter matches), which sped this decompile test up about 30%.
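
The first-letter trick looks roughly like the sketch below. The function names `starts_with()` and `count_matches()` are hypothetical and this is not the actual mc2 code, just the same idea in a few lines: compare one byte inline, and only pay for the memcmp() call when that byte matches.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the `remaining` bytes at `p`
 * begin with the keyword `kw` of length `kwlen`, else 0.
 */
static int starts_with(const char *p, size_t remaining,
                       const char *kw, size_t kwlen)
{
    if (kwlen == 0 || remaining < kwlen)
        return 0;
    /* Cheap reject: most positions fail right here, with no call made. */
    if (*p != *kw)
        return 0;
    return memcmp(p, kw, kwlen) == 0;
}

/* Count how many times `kw` occurs in the first `len` bytes of `buf`. */
static size_t count_matches(const char *buf, size_t len, const char *kw)
{
    size_t kwlen = strlen(kw);
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        if (starts_with(buf + i, len - i, kw, kwlen))
            n++;
    return n;
}
```

How much this buys you depends on the compiler and the input; an optimizing compiler may already inline short memcmp() calls, so the 30% figure above is for our test and not a general promise.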

We might include what is done so far in the test files for the next-next generation of mc2. :-)

Version 3.1 coming soon!

The Computer Mad Science Team
