Page 1 of 1

HamsterSpeak Python compiler

Posted: Tue May 10, 2022 11:27 pm
by lennyhome
Long story short, I've been playing Yume 2kki a lot lately, I must have taken the wrong turn at some point, and it eventually brought me back here.

Anyway.

I did some more refactoring and I got rid of all the dependency injections. The source code now conforms fully with to Python's import system, which I hope will improve readability for those who may be interested in compiler technology in general.

I also found a way to implement constant folding in an elegant way. It doesn't yet take into account local variables but hopefully I'll be able to do that next. The immediate consequence is that:

Code: Select all

define constant (10, base)
global variable (base + 1, tick)
global variable (base + 2, interacted)
...
is now acceptable.

Keep in mind that my version of the compiler is byte-code compatible with the engine but unlike the "physpeak" version which is in the GitHub, my version only supports a subset of the HamsterSpeak language which is fully functional but more similar to C.

Because of the renaming/refactoring, new download link is:
https://sourceforge.net/projects/snes9l ... z/download

Any future development by me will be merged there. Cheers to all.

Re: HamsterSpeak Python compiler

Posted: Thu May 12, 2022 3:58 pm
by TMC
Nice to see you back!

Although (as you probably noticed) I haven't touched it in two years I still would like to eventually switch over to your new HS compiler. I've just got so many other features to work on. A major thing that still needs doing is porting over all the friendly error messages from HSpeak. I also wished that it could match HSpeak for speed. I can't remember whether PLY's heavy use of regexs was to blame, or just Python being slow.

I tried looking over your changes because I want to merge both your code cleanup and the constant folding, but because we've already diverged quite a bit it's going to be tricky. I'll need to compare to your previous version to figure out what's changed, but can definitely notice the import cleanup.

What do you mean by "take into account local variables"?

Coincidentally I was just about to add more math functions and operators in the coming days, such as *= and sine and max, and bit shifts too, remembering that you asked for them.

Re: HamsterSpeak Python compiler

Posted: Thu May 12, 2022 10:08 pm
by lennyhome
I forgot about it for a while too but I had some free time recently and I wanted to put it back in a working condition the way I originally thought about it. Unfortunately the version that's on GitHub is too different from mine and all the issues about compatibility remain.

About constant folding, it goes into the AST depth-first and if it sees an operator with arguments that easily resolve to a constant, calls a similar Python operator and re-writes the node as a constant. It could also support operators that are not present in the target language. As long as the expression resolves, it acts as a pre-processor.

Variables can sometimes be determined to be constants too. It's tricky and it involves building an assignment history for every variable. Think of a variable that is assigned once unconditionally from a constant. If the same variable is assigned again inside a condition, then you may ask if the condition is a constant and so on.

Yesterday I've played "Bob vs the OHR" and... I think it's triple-A. Should be a best seller. Especially the Pepsi Man shop and moon with the yaranaika face. Also "Mystery Dungeon Friends 2" is very technically impressive. This is an amazing community.

EDIT: Python is slow. My original idea was to establish a formally correct set of parser rules for the language and then think about the implementation later. However, in reality the design phase for the parser rules took (is taking) longer than expected.

Re: HamsterSpeak Python compiler

Posted: Fri May 13, 2022 4:17 pm
by TMC
Haven't gotten to going through your changes yet because I found some changes I'd made in a branch I wanted to finish off first.

Doing constant propagation (inferring fixed locals) would be pretty cool, a 'real' optimisation. hspeak does already do constant folding but that's too easy :)

I've probably said this before, but I had been hoping that Python+PLY implementation would beat HSpeak for speed because HSpeak's parsing and AST manipulation is very inefficient due to far too many passes and passing large trees around by value (Euphoria doesn't support pass by reference, and I assume it can't optimise away those copies but maybe it can). Its lexer is pretty efficient though.
But the speed is good enough:
Baconthulhu: hspeak: 0.4s physpeak: 1.4s
Entrepreneur: hspeak: 19s, physpeak: 44s
And it does beat hspeak when it's interpreted rather than compiled. Euphoria is still giving us grief, so Unix users sometimes have to settle for running it interpreted.

So do you were thinking about a hand-coded parser rather than using a parser generator?

Re: HamsterSpeak Python compiler

Posted: Fri May 13, 2022 11:21 pm
by lennyhome
Some people avoid languages that are not formally defined in principle. It's considered by some as a big point of demerit for a language. Not that I personally have anything to do with academia, but teachers will not touch anything that's not formally defined.

If you want to attempt something fancy for performance I just thought that pass 2 in the Python compiler be could ran multi-threaded at least in principle. The CPython GIL may make that useless but who knows.

Re: HamsterSpeak Python compiler

Posted: Sat May 14, 2022 3:00 am
by TMC
Writing out a formal grammar for a language is not a bad thing. FreeBASIC uses a hand coded lexer & parser and even though there are code comments that appear to be an EBNF grammar it is absolutely riddled with edge cases it handles poorly, as I discovered a decade ago when I started writing a FB parser. It's a very irregular language. hspeak was similar but on a much smaller scaler.

Having a lot of trouble with physpeak (in git) throwing errors on some "for(...) do(" lines (but not other, identical, lines) in most files I try to run through it. I have no idea what's wrong because the state passed to p_error() is None, which supposedly means a recoverable error (causes it to print "continue"), but it didn't parse the line. Do you know what these "recoverable errors" are? And why did you call AST_state.eof() there?

I hadn't thought of using threads. But I'm just being picky, complaining about speed, wanting the new compiler to be better than the old one in every way :)

Re: HamsterSpeak Python compiler

Posted: Sat May 14, 2022 8:40 am
by lennyhome
The unary operator for cases such as "-f(a)" doesn't work in the version that's on GitHub. It's my fault because at the time I guessed KIND_MATH 19 was arithmetic negate, but it's a logical not.

A "recoverable error" happens when a statement is left incomplete. It's explained in the PLY manual. At the moment an error due to an unbalanced parenthesis is reported as 'syntax error at ","' or "continue". Which isn't helpful.

It happens because the parser can't differentiate between an explicit comma and an auto-inserted one.

About conditionals I wrote this in the readme of my version:
It is always safe to break a line after a literal '(' because empty
statements at the beginning of a list are ignored.

if/elseif/else blocks are single statements and must be written like
this:

if (condition) then (
...
) else if (condition) then (
...
) else (
...
)

or like this:

if (condition) then ( ... ) \
else ( ... )

or something in between.
Notice the backslash in the second form. If you don't add it, you get a cryptic error message.

Everything is fixable but I don't have the energy of the mental capacity to pile up special case after special case in the code. If I don't fix something is because I have no choice but to wait until I become a better Python programmer. Sometimes due to my limitations I have to stick to what works for me.

EDIT:
FreeBASIC uses a hand coded lexer & parser and even though there are code comments that appear to be an EBNF.
It's not uncommon for early languages to have to cheat somewhat. Even plain C has a case where it needs to consult a dynamic table in addition to the next token, breaking the "context free" rules.

Re: HamsterSpeak Python compiler

Posted: Sun May 15, 2022 3:05 pm
by TMC
Thanks.
Turns out that there were subscripts above the offending lines (which still aren't supported), and the 'end' caused parsing of the problematic scripts to end, so the "for" line was parsed as if at global scope, in isolation. So I became quite confused. Surprised no messages are printed for junk at global scope.

I think the toplevel (tld) parsing likely needs to be rewritten. It's so fragile and has various incompatibilities, and it's far removed from having a clean grammar. But finding the end of a script without the kludge of using 'end' only to end scripts is a pain. (It had never occurred to me before that single-pass-parsers and languages are far more elegant.) Maybe what's needed is to parse the whole file at once with a single call to ply.yacc rather than trying to parse the top level separate and buffer lines into scripts and sections (except for special handling of "include" lines, and stripping of comments), and defer most of what tld.py does to post-processing on the AST instead. That's a big change though and I worry about it having unintended consequences -- poor error resumption, bad performance?

Currently working on merging your improvements.

Oh yeah, I can add a negate math function even if hspeak doesn't support it.

Re: HamsterSpeak Python compiler

Posted: Sun May 15, 2022 9:18 pm
by lennyhome
Yes the toplevel (tld) parsing needs to be rewritten. The "no comments allowed" in the "end" happens because TLD doesn't support comments. The function declaration is read by PLY but "end" is read by TLD, so...

In POST, "operator.floordiv" of Python is not the same as the "/" in HamsterSpeak. "operator.floordiv" rounds towards minus infinity, "/" rounds toward 0.

The "negate" function has to be part of the "compiler runtime".

Re: HamsterSpeak Python compiler

Posted: Tue May 17, 2022 2:56 pm
by TMC
OK, I've finally finished the merge (and pushed to svn). I also merged various of your changes I'd missed before, in particular I just copied your file.py over mine (though renamed to hsfile.py). So it's easier to compare the two than before.
Instead of adding build.py I thought it was cleaner to add AST_build as a wrapper in parser.py which passes lexer and yacc to AST_state.build(). Particularly because my AST_state.build() contained a lot more than yours, and it belongs in that class.

Yes, operator.mod also doesn't match HS's modulus operator. I replaced operator.floordiv and operator.mod with correct functions, and also applied 32-bit integer overflow of all results to be strictly correct. There are a lot of other operators not folded over in post (hspeak does so).

Re: HamsterSpeak Python compiler

Posted: Tue May 17, 2022 8:35 pm
by lennyhome
Cool. I'll give it a read.

I moved AST_build to its own module because of order of inclusion. That's one of those aspects where I feel an expert Python programmer could suggest a much better overall structure.

Another issue I left behind is that a list of statements and a list of of function arguments shouldn't be considered the same thing. For better error reporting, statements that return nothing (such as "if" and "while") should not be allowed in a list of function arguments.

Maybe the auto-inserted separator should be changed to ";" so it would become possible to tell apart an error due to a misplaced "," from an wrongly terminated line. In case of a wrongly terminated line, some languages such as Javascript attempt to repair the line, but that is known to cause many unintended consequences.

After that I can't think of anything else

Re: HamsterSpeak Python compiler

Posted: Wed May 18, 2022 1:35 am
by TMC
In my version, statements like 'if' are already disallowed in expression lists. I'd also forgotten that I'd extended PLY to produce better error messages:

Code: Select all

Line 1    x(if(0)then())
            ^
Syntax error at 'if': Expected to see expression list
However I do notice that it shows the wrong error message if you put a statement in a condition:

Code: Select all

Line 1    if(if(0)then())
                        ^
Syntax error at ')': Expected to see: end-of-input
I also already mostly solved the problems with commas and line breaking a couple years ago. I just tried line breaking in all sorts of strange ways and moving commas around, and in every case I tried it behaved properly. It is slightly incompatible with hspeak, for example you can't write:

Code: Select all

x(1
  2)
it has to be

Code: Select all

x(1,
  2)
or
x(1
 ,2)
and "if(1), then(2)" is also not allowed. I likely should allow a comma before 'then' and 'else', but other than that I find the stricter handling of commas than hspeak desirable.

Re: HamsterSpeak Python compiler

Posted: Wed May 18, 2022 5:00 am
by lennyhome
I've given a brief read to physpeak. Look good to me. Almost as if somebody who actually known what he's doing worked on it.

Re: HamsterSpeak Python compiler

Posted: Wed May 18, 2022 1:59 pm
by TMC
To begin with that person was you, since I'd never bothered to learn LR parsers! I'm really pleased to have this.
I was thinking of just using 'physpeak' as a codename, and eventually rename it to HSpeak v4 (v1 -> v2 was also a rewrite in a different language).

I'll take a break from it now unless you want to throw over more code.

Actually, rather than working on feature parity and compatibility with hspeak I was thinking of jumping to adding future features like floating point, arrays, strings, objects etc. so that I have something to work with when adding the new script interpreter, to speed that up.

Re: HamsterSpeak Python compiler

Posted: Wed May 18, 2022 11:16 pm
by lennyhome
Don't feel obligated towards me in any way. Really.

A couple of changes I made recently:

Code: Select all

def t_NUMBER(t):
    r'[0-9.][0-9. ]*'
    t.value = t.value.replace(' ', '')
    if "." in t.value:
        t.value = int(float(t.value) * 256)
    else:
        t.value = int(t.value)
    return t
Allows to type fixed point constants. The presence of a "." in a number indicates that it's a fixed point constant. So "1" equals 1 and "1." equals 256.

Code: Select all

def p_unop(p):
    "expression : '-' expression %prec UMINUS"
    p[0] = AST_node("function", [p[2], AST_node("number", None, -1)], "*")
With the basic constant folding in place it's no longer necessary to have "negate" as an external function. This way it's inlined, saves a function call and it's reduced later in POST like any other multiplication.