HamsterSpeak Python compiler

Make games! Discuss those games here.

Moderators: Bob the Hamster, marionline, SDHawk

Post Reply
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

HamsterSpeak Python compiler

Post by lennyhome »

Long story short, I've been playing Yume 2kki a lot lately, I must have taken the wrong turn at some point, and it eventually brought me back here.

Anyway.

I did some more refactoring and I got rid of all the dependency injections. The source code now conforms fully with to Python's import system, which I hope will improve readability for those who may be interested in compiler technology in general.

I also found a way to implement constant folding in an elegant way. It doesn't yet take into account local variables but hopefully I'll be able to do that next. The immediate consequence is that:

Code: Select all

define constant (10, base)
global variable (base + 1, tick)
global variable (base + 2, interacted)
...
is now acceptable.

Keep in mind that my version of the compiler is byte-code compatible with the engine but unlike the "physpeak" version which is in the GitHub, my version only supports a subset of the HamsterSpeak language which is fully functional but more similar to C.

Because of the renaming/refactoring, new download link is:
https://sourceforge.net/projects/snes9l ... z/download

Any future development by me will be merged there. Cheers to all.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Re: HamsterSpeak Python compiler

Post by TMC »

Nice to see you back!

Although (as you probably noticed) I haven't touched it in two years I still would like to eventually switch over to your new HS compiler. I've just got so many other features to work on. A major thing that still needs doing is porting over all the friendly error messages from HSpeak. I also wished that it could match HSpeak for speed. I can't remember whether PLY's heavy use of regexs was to blame, or just Python being slow.

I tried looking over your changes because I want to merge both your code cleanup and the constant folding, but because we've already diverged quite a bit it's going to be tricky. I'll need to compare to your previous version to figure out what's changed, but can definitely notice the import cleanup.

What do you mean by "take into account local variables"?

Coincidentally I was just about to add more math functions and operators in the coming days, such as *= and sine and max, and bit shifts too, remembering that you asked for them.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Re: HamsterSpeak Python compiler

Post by lennyhome »

I forgot about it for a while too but I had some free time recently and I wanted to put it back in a working condition the way I originally thought about it. Unfortunately the version that's on GitHub is too different from mine and all the issues about compatibility remain.

About constant folding, it goes into the AST depth-first and if it sees an operator with arguments that easily resolve to a constant, calls a similar Python operator and re-writes the node as a constant. It could also support operators that are not present in the target language. As long as the expression resolves, it acts as a pre-processor.

Variables can sometimes be determined to be constants too. It's tricky and it involves building an assignment history for every variable. Think of a variable that is assigned once unconditionally from a constant. If the same variable is assigned again inside a condition, then you may ask if the condition is a constant and so on.

Yesterday I've played "Bob vs the OHR" and... I think it's triple-A. Should be a best seller. Especially the Pepsi Man shop and moon with the yaranaika face. Also "Mystery Dungeon Friends 2" is very technically impressive. This is an amazing community.

EDIT: Python is slow. My original idea was to establish a formally correct set of parser rules for the language and then think about the implementation later. However, in reality the design phase for the parser rules took (is taking) longer than expected.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Re: HamsterSpeak Python compiler

Post by TMC »

Haven't gotten to going through your changes yet because I found some changes I'd made in a branch I wanted to finish off first.

Doing constant propagation (inferring fixed locals) would be pretty cool, a 'real' optimisation. hspeak does already do constant folding but that's too easy :)

I've probably said this before, but I had been hoping that Python+PLY implementation would beat HSpeak for speed because HSpeak's parsing and AST manipulation is very inefficient due to far too many passes and passing large trees around by value (Euphoria doesn't support pass by reference, and I assume it can't optimise away those copies but maybe it can). Its lexer is pretty efficient though.
But the speed is good enough:
Baconthulhu: hspeak: 0.4s physpeak: 1.4s
Entrepreneur: hspeak: 19s, physpeak: 44s
And it does beat hspeak when it's interpreted rather than compiled. Euphoria is still giving us grief, so Unix users sometimes have to settle for running it interpreted.

So do you were thinking about a hand-coded parser rather than using a parser generator?
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Re: HamsterSpeak Python compiler

Post by lennyhome »

Some people avoid languages that are not formally defined in principle. It's considered by some as a big point of demerit for a language. Not that I personally have anything to do with academia, but teachers will not touch anything that's not formally defined.

If you want to attempt something fancy for performance I just thought that pass 2 in the Python compiler be could ran multi-threaded at least in principle. The CPython GIL may make that useless but who knows.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Re: HamsterSpeak Python compiler

Post by TMC »

Writing out a formal grammar for a language is not a bad thing. FreeBASIC uses a hand coded lexer & parser and even though there are code comments that appear to be an EBNF grammar it is absolutely riddled with edge cases it handles poorly, as I discovered a decade ago when I started writing a FB parser. It's a very irregular language. hspeak was similar but on a much smaller scaler.

Having a lot of trouble with physpeak (in git) throwing errors on some "for(...) do(" lines (but not other, identical, lines) in most files I try to run through it. I have no idea what's wrong because the state passed to p_error() is None, which supposedly means a recoverable error (causes it to print "continue"), but it didn't parse the line. Do you know what these "recoverable errors" are? And why did you call AST_state.eof() there?

I hadn't thought of using threads. But I'm just being picky, complaining about speed, wanting the new compiler to be better than the old one in every way :)
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Re: HamsterSpeak Python compiler

Post by lennyhome »

The unary operator for cases such as "-f(a)" doesn't work in the version that's on GitHub. It's my fault because at the time I guessed KIND_MATH 19 was arithmetic negate, but it's a logical not.

A "recoverable error" happens when a statement is left incomplete. It's explained in the PLY manual. At the moment an error due to an unbalanced parenthesis is reported as 'syntax error at ","' or "continue". Which isn't helpful.

It happens because the parser can't differentiate between an explicit comma and an auto-inserted one.

About conditionals I wrote this in the readme of my version:
It is always safe to break a line after a literal '(' because empty
statements at the beginning of a list are ignored.

if/elseif/else blocks are single statements and must be written like
this:

if (condition) then (
...
) else if (condition) then (
...
) else (
...
)

or like this:

if (condition) then ( ... ) \
else ( ... )

or something in between.
Notice the backslash in the second form. If you don't add it, you get a cryptic error message.

Everything is fixable but I don't have the energy of the mental capacity to pile up special case after special case in the code. If I don't fix something is because I have no choice but to wait until I become a better Python programmer. Sometimes due to my limitations I have to stick to what works for me.

EDIT:
FreeBASIC uses a hand coded lexer & parser and even though there are code comments that appear to be an EBNF.
It's not uncommon for early languages to have to cheat somewhat. Even plain C has a case where it needs to consult a dynamic table in addition to the next token, breaking the "context free" rules.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Re: HamsterSpeak Python compiler

Post by TMC »

Thanks.
Turns out that there were subscripts above the offending lines (which still aren't supported), and the 'end' caused parsing of the problematic scripts to end, so the "for" line was parsed as if at global scope, in isolation. So I became quite confused. Surprised no messages are printed for junk at global scope.

I think the toplevel (tld) parsing likely needs to be rewritten. It's so fragile and has various incompatibilities, and it's far removed from having a clean grammar. But finding the end of a script without the kludge of using 'end' only to end scripts is a pain. (It had never occurred to me before that single-pass-parsers and languages are far more elegant.) Maybe what's needed is to parse the whole file at once with a single call to ply.yacc rather than trying to parse the top level separate and buffer lines into scripts and sections (except for special handling of "include" lines, and stripping of comments), and defer most of what tld.py does to post-processing on the AST instead. That's a big change though and I worry about it having unintended consequences -- poor error resumption, bad performance?

Currently working on merging your improvements.

Oh yeah, I can add a negate math function even if hspeak doesn't support it.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Re: HamsterSpeak Python compiler

Post by lennyhome »

Yes the toplevel (tld) parsing needs to be rewritten. The "no comments allowed" in the "end" happens because TLD doesn't support comments. The function declaration is read by PLY but "end" is read by TLD, so...

In POST, "operator.floordiv" of Python is not the same as the "/" in HamsterSpeak. "operator.floordiv" rounds towards minus infinity, "/" rounds toward 0.

The "negate" function has to be part of the "compiler runtime".
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Re: HamsterSpeak Python compiler

Post by TMC »

OK, I've finally finished the merge (and pushed to svn). I also merged various of your changes I'd missed before, in particular I just copied your file.py over mine (though renamed to hsfile.py). So it's easier to compare the two than before.
Instead of adding build.py I thought it was cleaner to add AST_build as a wrapper in parser.py which passes lexer and yacc to AST_state.build(). Particularly because my AST_state.build() contained a lot more than yours, and it belongs in that class.

Yes, operator.mod also doesn't match HS's modulus operator. I replaced operator.floordiv and operator.mod with correct functions, and also applied 32-bit integer overflow of all results to be strictly correct. There are a lot of other operators not folded over in post (hspeak does so).
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Re: HamsterSpeak Python compiler

Post by lennyhome »

Cool. I'll give it a read.

I moved AST_build to its own module because of order of inclusion. That's one of those aspects where I feel an expert Python programmer could suggest a much better overall structure.

Another issue I left behind is that a list of statements and a list of of function arguments shouldn't be considered the same thing. For better error reporting, statements that return nothing (such as "if" and "while") should not be allowed in a list of function arguments.

Maybe the auto-inserted separator should be changed to ";" so it would become possible to tell apart an error due to a misplaced "," from an wrongly terminated line. In case of a wrongly terminated line, some languages such as Javascript attempt to repair the line, but that is known to cause many unintended consequences.

After that I can't think of anything else
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Re: HamsterSpeak Python compiler

Post by TMC »

In my version, statements like 'if' are already disallowed in expression lists. I'd also forgotten that I'd extended PLY to produce better error messages:

Code: Select all

Line 1    x(if(0)then())
            ^
Syntax error at 'if': Expected to see expression list
However I do notice that it shows the wrong error message if you put a statement in a condition:

Code: Select all

Line 1    if(if(0)then())
                        ^
Syntax error at ')': Expected to see: end-of-input
I also already mostly solved the problems with commas and line breaking a couple years ago. I just tried line breaking in all sorts of strange ways and moving commas around, and in every case I tried it behaved properly. It is slightly incompatible with hspeak, for example you can't write:

Code: Select all

x(1
  2)
it has to be

Code: Select all

x(1,
  2)
or
x(1
 ,2)
and "if(1), then(2)" is also not allowed. I likely should allow a comma before 'then' and 'else', but other than that I find the stricter handling of commas than hspeak desirable.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Re: HamsterSpeak Python compiler

Post by lennyhome »

I've given a brief read to physpeak. Look good to me. Almost as if somebody who actually known what he's doing worked on it.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Re: HamsterSpeak Python compiler

Post by TMC »

To begin with that person was you, since I'd never bothered to learn LR parsers! I'm really pleased to have this.
I was thinking of just using 'physpeak' as a codename, and eventually rename it to HSpeak v4 (v1 -> v2 was also a rewrite in a different language).

I'll take a break from it now unless you want to throw over more code.

Actually, rather than working on feature parity and compatibility with hspeak I was thinking of jumping to adding future features like floating point, arrays, strings, objects etc. so that I have something to work with when adding the new script interpreter, to speed that up.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Re: HamsterSpeak Python compiler

Post by lennyhome »

Don't feel obligated towards me in any way. Really.

A couple of changes I made recently:

Code: Select all

def t_NUMBER(t):
    r'[0-9.][0-9. ]*'
    t.value = t.value.replace(' ', '')
    if "." in t.value:
        t.value = int(float(t.value) * 256)
    else:
        t.value = int(t.value)
    return t
Allows to type fixed point constants. The presence of a "." in a number indicates that it's a fixed point constant. So "1" equals 1 and "1." equals 256.

Code: Select all

def p_unop(p):
    "expression : '-' expression %prec UMINUS"
    p[0] = AST_node("function", [p[2], AST_node("number", None, -1)], "*")
With the basic constant folding in place it's no longer necessary to have "negate" as an external function. This way it's inlined, saves a function call and it's reduced later in POST like any other multiplication.
Post Reply