Indent tool for HSpeak

TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Post by TMC »

Neat. It's not far off being able to serialise the AST to .hsz files. I'm going to work on adding the ability for ohrrpgce-game to call out to an external REPL process.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Post by lennyhome »

I wish a reliable parser generator for Freebasic existed. I've found this in the forums, but it's a work in progress.

Maybe you could set up a regular flex/bison project in C and then use it as an external library. It would save you from having to call an external process and it wouldn't add any new dependency to the engine. You could probably recycle the parsing rules of the Python/PLY prototype.

Or you could just embed Lua for which Freebasic already has bindings and an example, but that would be too obvious. I mean, every other engine does that.
Last edited by lennyhome on Sat Feb 29, 2020 11:36 am, edited 2 times in total.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Post by TMC »

But I actually prefer to run the REPL in a separate process in a terminal emulator so that we get a separate window with text editing, scrollback, readable fonts, etc. all for free.

There's no technical problem using a parser generator that produces C and linking it in; C and FB code modules can be mixed relatively freely (FB strings are annoying to pass around, but C strings are no problem). FB users just try to do everything in FB because they don't know or dislike C. There's a very good C-to-FB header translator, fbfrog. Distributing Python programs is a pain, but MicroPython should be able to run Python code produced by PLY and is very small. James and I are both Python fans; we would certainly rewrite HSpeak in Python and wouldn't even consider using C, C++ or FB. I do worry about performance though; MicroPython is slower than CPython, which is much slower than Euphoria. You're right that we avoid adding new dependencies, but I don't care too much about dependencies for editing games.

Speaking of Lua, the plan is to embed a VM for a mature scripting language anyway, and translate HS to that language, so that it can be extended easily into a modern language. For a long time I was planning to embed Lua (I did start on HS-to-Lua translation), but there are so many things I dislike about Lua that I'm probably going to use Squirrel instead. I considered MicroPython/pycopy, but it's just not designed for embedding in other programs as a scripting VM. (Suggestions welcome.)
Last edited by TMC on Sat Feb 29, 2020 1:16 pm, edited 6 times in total.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Post by lennyhome »

"there are so many things I dislike about lua"
The 1-based array indices.

To put things in perspective, if I run Crypt of Baconthulhu (which is my favorite game) on my Raspberry Pi 3 with "-z 3 -s", it uses 30% of one CPU core. A simple pure interpreter written in flex/bison could run on a separate thread, synchronize with the engine via just one mutex/cond and you could make it close to HSpeak.
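As a rough illustration of the "one mutex/cond" idea, here is a minimal Python sketch rather than the C/flex/bison version being discussed. The names `ScriptBridge`, `submit` and `run_pending` are hypothetical and are not part of the engine; the sketch assumes the interpreter thread blocks on every engine command until the engine's main loop has serviced it.

```python
import threading


class ScriptBridge:
    """One lock/condition pair guarding a single pending-request slot."""

    def __init__(self):
        self.cond = threading.Condition()
        self.request = None   # written by the interpreter thread
        self.result = None    # written by the engine thread
        self.done = False

    def submit(self, command, *args):
        """Interpreter side: post a command and block until the engine replies."""
        with self.cond:
            self.request = (command, args)
            self.done = False
            self.cond.notify_all()
            self.cond.wait_for(lambda: self.done)
            return self.result

    def run_pending(self, commands, timeout=None):
        """Engine side: service at most one request, e.g. once per tick."""
        with self.cond:
            if not self.cond.wait_for(lambda: self.request is not None, timeout):
                return False              # nothing arrived within the timeout
            name, args = self.request
            self.request = None
            self.result = commands[name](*args)
            self.done = True
            self.cond.notify_all()
            return True


def demo():
    bridge = ScriptBridge()
    results = []

    def interpreter():
        results.append(bridge.submit("add", 35, 2))

    thread = threading.Thread(target=interpreter)
    thread.start()
    bridge.run_pending({"add": lambda a, b: a + b}, timeout=5)
    thread.join()
    return results
```

The engine never runs script code concurrently with its own logic here: the interpreter thread only advances between two calls to `run_pending`.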

I presented the Python/PLY compiler idea instead of the above because I'm very afraid of committing to anything that requires touching the engine, but you can certainly do it.

Anyway, it's nice to know that things are moving on the scripting front. I'll make some improvements to the AST generator as soon as I can get back into "that state of mind" and maybe it will be useful for something eventually.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Post by TMC »

It would be nice to have the option of a REPL as a drop-down console within Game's window. Usability would inevitably be worse though. It could still spawn a separate process behind the scenes.

Another consideration is that I plan to replace the script debugger, which is horrible in every way. I want to move the debugger to a separate process/program for the same reasons. It needs to be able to display a lot of text readably (eg source code) with a good UI. The IPC to query and modify interpreter state would be tricky. Of course it could include a REPL too. I'm actually seriously considering writing it in JS and running it in the user's web browser, embedding a trivial HTTP server in Game. That's one way to avoid dependencies! Alternatives are to extend HamsterWhisper or, most realistically, to use/adapt an existing debugger for the underlying VM (eg Zerobrane for Lua... which is one of the big advantages of using Lua. Squirrel only has Eclipse and VS-based debuggers, yeech).

1-based array indices aren't my favourite, but that detail doesn't matter at all when you're translating to Lua since it only affects table literal syntax; you can use 0-based arrays by typing 5 extra characters. But it would be an uphill battle to prevent all of Lua's many weird semantics creeping into HS. And it's impractical to disallow "1" + 2 = 3 without either patching the Lua VM or using only the upcoming Lua 5.4, which removes that misfeature.

Lots of decisions to make. Good to rapidly prototype this stuff before committing to anything.
Last edited by TMC on Sat Feb 29, 2020 11:49 pm, edited 5 times in total.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Post by lennyhome »

I've implemented some form of automatic line continuation. So now you can do this:

Code: Select all

HSpeak> a(
 .... > 10, 40
 .... > , 35 + 2 
 .... > )
return
    function: a
        number: 10
        number: 40
        binop: +
            number: 35
            number: 2
If an error is detected at EOF it goes into line continuation mode until the error disappears or until it isn't located at the end of the buffer. I'm not sure it's the right strategy, but it's interesting.
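The strategy can be sketched in plain Python as below. This is not the PLY-based code from the package: the toy grammar (numbers, names, calls, bracketed lists and left-to-right binary operators with no precedence) and the names `ParseError`, `parse` and `read_statement` are assumptions made only for illustration. The point is the rule in `read_statement`: an error whose offset equals the end of the buffer means "keep reading", and any earlier offset is a real error.

```python
import re


class ParseError(Exception):
    def __init__(self, pos):
        super().__init__(f"syntax error at offset {pos}")
        self.pos = pos


TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/(),])")


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ParseError(pos)
        tokens.append((match.group(1), match.start(1)))
        pos = match.end()
    return tokens


def parse(text):
    """Toy recursive-descent parser; errors carry the offending offset."""
    text = text.rstrip()
    tokens = tokenize(text)
    i = 0

    def peek():
        return tokens[i][0] if i < len(tokens) else None

    def fail():
        # Running out of tokens is reported at the end of the buffer.
        raise ParseError(tokens[i][1] if i < len(tokens) else len(text))

    def take(expected=None):
        nonlocal i
        if i >= len(tokens) or (expected and tokens[i][0] != expected):
            fail()
        i += 1
        return tokens[i - 1][0]

    def arglist():
        args = []
        while peek() != ")":
            args.append(expr())
            if peek() == ",":
                take(",")
            elif peek() != ")":
                fail()
        return args

    def atom():
        tok = peek()
        if tok == "(":
            take("(")
            args = arglist()
            take(")")
            return ("group", args)
        if tok is None or not (tok[0].isalnum() or tok[0] == "_"):
            fail()
        take()
        if tok[0].isdigit():
            return ("number", int(tok))
        if peek() == "(":
            take("(")
            args = arglist()
            take(")")
            return ("function", tok, args)
        return ("name", tok)

    def expr():
        node = atom()
        while peek() in ("+", "-", "*", "/"):
            op = take()
            node = ("binop", op, node, atom())
        return node

    node = expr()
    if i < len(tokens):
        fail()
    return node


def read_statement(lines):
    """Feed lines to the parser until it succeeds or fails before EOF."""
    buffer = ""
    for line in lines:
        buffer = f"{buffer}\n{line}" if buffer else line
        try:
            return parse(buffer)
        except ParseError as err:
            if err.pos < len(buffer.rstrip()):
                raise             # error in the middle: a genuine syntax error
            # error at end of buffer: stay in continuation mode
    raise ParseError(len(buffer))
```

Note that the whole buffer is re-parsed after every added line, which is the cost of this approach.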
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Post by TMC »

At first I thought that wouldn't be right, but actually I can't think of anything wrong with that. It's much better than just counting brackets since it continues on from "x := 1 +"

I've started on extending hspeak_ast.py (with hspeak_defines.py) to produce lone .hsz's. Not handling any flow control. Just want to do enough to call a single builtin function, and then pass that to the REPL. I have no idea yet how to handle local variables in the REPL.
I also did some work on launching an external REPL in a terminal emulator.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Post by lennyhome »

There is an example in PLY that does line continuation by counting parentheses. The advantage is that you don't have to re-parse the whole line every time a continuation happens, so it would be faster.
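For comparison, a bracket-counting continuation check can be sketched like this. It is not the code of the PLY example, just a minimal illustration of the technique; `paren_depth` and `read_statement` are hypothetical names, and the sketch ignores brackets inside strings and comments.

```python
def paren_depth(line, depth=0):
    """Return the bracket nesting depth after scanning one more line."""
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return depth


def read_statement(lines):
    """Join lines until every '(' seen so far has been closed."""
    buffer, depth = [], 0
    for line in lines:
        buffer.append(line)
        depth = paren_depth(line, depth)   # only the new line is scanned
        if depth <= 0:
            break
    return "\n".join(buffer)
```

Each line is scanned exactly once, but an input such as `x := 1 +` has depth 0 and is treated as complete, so a dangling operator is not continued.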
"I have no idea yet how to handle local variables in the REPL."
I don't know if this has anything to do with what you're thinking, but I just wanted to add some thought to the byval/byref issue that I encountered earlier.

Consider this Python example:

Code: Select all

>>> a = [0, 1, 2, 3]
>>> del(a[1])
>>> a
[0, 2, 3]
Notice that although "del" looks like a regular function call, it's really a statement rather than a function, because it has to receive a pointer to the array, not the value of the array element. That's the reason I had to single out specific functions and specific arguments in my interpreter attempt.

Consider this statement in HSpeak:

Code: Select all

global variable(10 + a++, some global variable)
You want the first argument to be reduced to a number, but you want a literal of the second argument where no attempt is made to de-reference it, because you need it as the key in the "global scope" dictionary. Same goes for:

Code: Select all

variable(a, b, c, d)
This time the literal argument becomes the key in the "local scope" dictionary. That's about it.
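One way to picture "singling out specific functions and specific arguments" is a table of which argument positions must stay unevaluated. The sketch below is only an assumption about how such an interpreter could be organised: the tuple-based AST, the `LITERAL_ARGS` table and the helper names are hypothetical, and only `+` is implemented.

```python
ALL = "all"   # marker: every argument of the function is a literal name

# Hypothetical table of argument positions that must NOT be evaluated.
LITERAL_ARGS = {"variable": ALL, "globalvariable": {1}}


def declare_locals(scopes, *names):
    for name in names:
        scopes["local"][name] = 0          # the name is the dictionary key


def declare_global(scopes, ident, name):
    scopes["global"][name] = 0
    scopes["ids"][ident] = name            # ident was reduced to a number


BUILTINS = {"variable": declare_locals, "globalvariable": declare_global}


def evaluate(node, scopes):
    kind = node[0]
    if kind == "number":
        return node[1]
    if kind == "name":
        scope = scopes["local"] if node[1] in scopes["local"] else scopes["global"]
        return scope[node[1]]
    if kind == "binop" and node[1] == "+":
        return evaluate(node[2], scopes) + evaluate(node[3], scopes)
    if kind == "function":
        literal = LITERAL_ARGS.get(node[1], set())
        values = []
        for index, arg in enumerate(node[2]):
            if literal == ALL or index in literal:
                if arg[0] != "name":
                    raise TypeError(f"argument {index} of {node[1]} must be a name")
                values.append(arg[1])      # pass the name, not its value
            else:
                values.append(evaluate(arg, scopes))
        return BUILTINS[node[1]](scopes, *values)
    raise ValueError(f"unsupported node: {node!r}")
```

With this layout, ordinary functions need no entry in the table, and only the handful of declaration-like commands are special-cased.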

I got discouraged and moved on to AST generation when I realized that an interpreter and a compiler would be two separate projects with possibly only the AST in common.

Anyway, that's my story and I'm sticking to it.

----

A real life if/then:

Code: Select all

if(death count >= 1) then(
 .... > set tag(tag:save scummer, ON)
 .... > set tag(tag:save game exists, OFF)
 .... > )
if
    expression group
        binop: >=
            name: deathcount
            number: 1
    expression group
        function: settag
            name: tag:savescummer
            name: ON
        function: settag
            name: tag:savegameexists
            name: OFF
Also nested parentheses, a ',' at the end of a list and the empty list are accepted:

Code: Select all

HSpeak> (10, (4, 1,), 5, (), 4 + 2)
return
    expression group
        number: 10
        expression group
            number: 4
            number: 1
        number: 5
        expression group
        binop: +
            number: 4
            number: 2
Last edited by lennyhome on Wed Mar 04, 2020 7:11 am, edited 4 times in total.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Post by TMC »

Those AST improvements look nice.


Success! I was able to produce a hsz file containing a simple script (as printed by decompile.py from nohrio):

Code: Select all

Script dummy()
do(
  return(showvalue(1000 * 3 + 9))
)
and I imported it into a game and it worked.

I just tacked my code onto hspeak_ast.py. Here's the patch
http://tmc.castleparadox.com/ohr/hsz_compiler.patch

And here's a complete copy of the source -- only hspeak_ast.py is changed, but I included my hspeak_defines.py too, and also a test_compile.sh script which packages the out.hsz file produced by hspeak_ast.py into an .hs file and also runs it through decompile.py to verify it.
http://tmc.castleparadox.com/ohr/hsz_compiler.tar.gz

Yes, an interpreter and compiler don't share much aside from the AST, but building the AST is almost a whole HS compiler. And the overlap is even higher if it's an "optimising" compiler which evaluates constant expressions (which HSpeak does do, but nothing more).
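As a small illustration of evaluating constant expressions on the AST, here is a Python sketch. It uses a hypothetical tuple-based AST rather than the node classes in hspeak_ast.py, handles only three operators, and ignores HamsterSpeak's actual integer overflow behaviour.

```python
import operator

# Operators this sketch knows how to fold; anything else is left alone.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return a copy of the tree with constant sub-expressions evaluated."""
    kind = node[0]
    if kind == "binop":
        _, op, left, right = node
        left, right = fold(left), fold(right)
        if left[0] == right[0] == "number" and op in OPS:
            return ("number", OPS[op](left[1], right[1]))
        return ("binop", op, left, right)
    if kind == "function":
        return ("function", node[1], [fold(arg) for arg in node[2]])
    return node   # numbers and names are already as simple as they get


# showvalue(1000 * 3 + 9), as in the decompiled script above
tree = ("function", "showvalue",
        [("binop", "+",
          ("binop", "*", ("number", 1000), ("number", 3)),
          ("number", 9))])
folded = fold(tree)
```

Here `folded` is the same call with a single number argument, so the code generator would emit one constant instead of two arithmetic operations.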

The problem with local variables in a REPL is that I was planning on packaging up each line of input into an .hsz file and running it through the interpreter. Of course that means you can't save local variables. Either I need to drop that idea, or... I could use nonlocal variables instead and store the variables in the parent scope between executions. But even that would still need some changes to the interpreter, so I might as well go with a less convoluted solution.

Turning variable names into variable references (ID numbers) has to happen relatively late if "variable()" is allowed to appear after a variable is used, so it becomes necessary to keep unresolved tokens in the AST whenever variables are used.
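A two-pass scheme is one way to do that late resolution. The Python sketch below is only an assumption about how it could look: the tuple-based AST, the `("local", id)` node and the function names are hypothetical, and IDs are simply assigned in declaration order.

```python
def collect_locals(node, table):
    """Pass 1: record every name declared by variable(...)."""
    if node[0] == "function":
        if node[1] == "variable":
            for arg in node[2]:
                table.setdefault(arg[1], len(table))
        else:
            for arg in node[2]:
                collect_locals(arg, table)
    elif node[0] == "binop":
        collect_locals(node[2], table)
        collect_locals(node[3], table)
    return table


def resolve(node, table):
    """Pass 2: replace ('name', x) with ('local', id) where x was declared."""
    kind = node[0]
    if kind == "name" and node[1] in table:
        return ("local", table[node[1]])
    if kind == "binop":
        return ("binop", node[1], resolve(node[2], table), resolve(node[3], table))
    if kind == "function" and node[1] != "variable":
        return ("function", node[1], [resolve(arg, table) for arg in node[2]])
    return node   # numbers, undeclared names and the declarations themselves


def resolve_script(statements):
    table = {}
    for statement in statements:
        collect_locals(statement, table)
    return [resolve(statement, table) for statement in statements], table
```

Names that are not found in the table stay as `("name", ...)` nodes, so globals, constants and script names can be resolved by a later step.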
Last edited by TMC on Wed Mar 04, 2020 2:10 pm, edited 4 times in total.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Post by lennyhome »

"I just tacked my code onto hspeak_ast.py"
At a glance it's almost exactly what I would have done myself in 6 months or so probably. Very nice. I'll be examining it more closely.

You can have a REPL-local scope of variables. You just need to be able to send in a compiled mini-script for immediate execution and to be able to read a return value from the mini-script. I don't know if the engine currently allows that, but that would be the only change needed to fake an interpreter.
Last edited by lennyhome on Wed Mar 04, 2020 5:09 pm, edited 1 time in total.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Post by TMC »

I need to implement immediate execution of scripts anyway, so I'll be doing that.

I forgot to say: I won't be doing any more work on that hsz compiler for a while, at least a week, so I hope you'll take the code and integrate it into what you've got, and feel free to make changes/extend it. I'm too busy, and when I have more time I'll work on the engine side of implementing a REPL instead.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Post by lennyhome »

I've picked up your code and roughly added it to my package. I'm kinda busy too and besides, we need to take much more time to do things or people may think it was easy.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Post by TMC »

Sensible reasoning. And I'm meant to be working on a game for a contest right now, but couldn't help myself.
Last edited by TMC on Thu Mar 05, 2020 9:26 am, edited 1 time in total.
lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Post by lennyhome »

I've updated my package here (same link as before) and it would be a good time to get it fresh because I did some major refactoring and I've integrated your code.

I've added "baconthulhu.txt" which is the result of an attempt to batch compile every script in "baconthulhu.hss". It should give you a good idea of where we're standing. The performance of interpreted Python is adequate at around 1k lines per second.

Your code generator is in "hspeak_gen.py". You can run/test it via "hspeak_ast.py" like:

Code: Select all

HSpeak> noop(), 2 + 3     
root
  function: noop
  binop: +
    number: 2
    number: 3
toHSZ: 110 bytes
The result of the code generation isn't written anywhere, it just calls it and prints errors.
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Post by TMC »

Wow, a lot of progress here.
Could you slap a license on it? (BTW, we're going to attempt to relicense the OHRRPGCE to something more permissive than the GPL)

Is the "WARNING: 2 shift/reduce conflicts" message significant?

"if(...)\nthen(...)" gets converted to "if(...),then(...)" which produces a syntax error. Of course this could be fixed by adding a check for ')' here:

Code: Select all

                s1 = s1.rstrip()
                if s1[-1] != ',' and s1[-1] != '(':
                    s1 += ','
However it's quite common to see people put a comma between a ")" and the next token, even before "then", "else" and "do", so I think it'd be better to keep that. Not just in the case of two statements (which your parser allows): "show textbox(1), wait for textbox".
I think it's reasonable to disallow "then,()" (which HSpeak currently allows) but allow "then\n()", but it might be a pain to distinguish them. Doesn't really matter.
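To make the behaviour concrete, here is a standalone Python sketch of newline-to-comma joining. It assumes the quoted fragment runs once per line while lines are being joined; the surrounding code is not shown in this thread, so `join_lines` and its `check_close_paren` flag are hypothetical.

```python
def join_lines(lines, check_close_paren=False):
    """Join source lines, inserting ',' where a line ends without a separator."""
    stripped = [line.rstrip() for line in lines if line.strip()]
    out = ""
    for s1 in stripped[:-1]:
        last = s1[-1]
        skip = last == "," or last == "("
        if check_close_paren and last == ")":
            skip = True            # the extra check for ')' discussed above
        if not skip:
            s1 += ","
        out += s1
    return out + (stripped[-1] if stripped else "")


original = join_lines(["if(x)", "then(y)"])
with_check = join_lines(["if(x)", "then(y)"], check_close_paren=True)
```

`original` is `if(x),then(y)`, the form that triggers the syntax error, while `with_check` is `if(x)then(y)`. The check also drops the separator between two statements on consecutive lines whenever the first one ends in `)`.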

HamsterSpeak used to be practically comma-insensitive (you could add extra commas between any tokens) until a couple of years ago (version 3Se). Now a lot of extra commas are disallowed, though I want to tighten it even more (look at testgame/parser_tests.hss if you're interested. But please don't take it as authoritative). I'm surprised you have the following rule (HSpeak doesn't allow it):

Code: Select all

    expression_group : '(' ',' ')'
It does allow x(1,) though, which you also have; I wasn't sure whether to disallow that one. Better not to change the laxness without a good reason.

"if(...)else(...)" is also valid in HS. And so is a "do" block not following a for/while/switch. They are useful because they can contain "break" and "continue" statements. BTW, "break" and "continue" take an optional argument: the number of scopes to break out of. Amusingly, it can be an expression.

(And, I see I was totally wrong to talk about associating 'if' and 'then' block using AST manipulations rather than simply defining production rules)

1k lines per second? That would be pretty slow, but it's much faster than that for me (maybe you were talking about a previous version?): 1.1s for baconthulhu.hss (vs 0.42s for HSpeak). Excluding plotscr.hsd and scancode.hsi (note: both plotscr.hsd and baconthulhu.hss include scancode.hsi), it's 0.60s for 4237 lines, so 7kLoC/s. That's fine. I see about half of the time is spent in yacc, about a quarter in lex. I guess there's not much room to speed it up while sticking to PLY, maybe the only real way to do so is to handle whitespace in the lexer somehow rather than in the parser, to get rid of the name_concat production rules and reduce the amount the parser needs to do. Similarly, HSpeak actually removes all commas and newlines in the lexer, which is cheating but means there's far less to parse.
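Handling the whitespace inside names in the lexer could look something like the sketch below. It uses Python's `re` module directly instead of PLY, and the token set and the `lex` function are assumptions for illustration only. It is a toy: a real lexer would also have to deal with keywords, digits after a space, strings and comments.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<number>\d+)
  | (?P<name>[A-Za-z_][\w:]*(?:[ \t]+[A-Za-z_][\w:]*)*)
  | (?P<op>:=|>=|<=|==|<>|[-+*/<>(),])
  | (?P<skip>[ \t\r\n]+)
""", re.VERBOSE)


def lex(text):
    """Return (kind, value) pairs with spaces inside names already removed."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "skip":
            continue
        value = match.group()
        if kind == "name":
            value = re.sub(r"[ \t]+", "", value)   # "death count" -> "deathcount"
        tokens.append((kind, value))
    return tokens


tokens = lex("set tag(tag:save scummer, ON)")
```

With names arriving as single tokens, the parser would no longer need a production rule to concatenate name fragments.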

Another example: On a copy of Entrepreneur's scripts (307kLoC including plotscr.hsd - the size of the scripts has since been reduced), HSpeak takes 22.5s and hspeak_tld.py takes 42.7s. (It took HSpeak just 2.6s to lex.)
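For reference, the throughput figures implied by these timings can be worked out as follows. The sketch treats 307kLoC as 307,000 lines, which is an approximation of the rounded figure quoted above.

```python
def lines_per_second(lines, seconds):
    return lines / seconds


# baconthulhu.hss without the include files, Python/PLY parser
baconthulhu_rate = lines_per_second(4237, 0.60)

# Entrepreneur's scripts, treating "307kLoC" as 307,000 lines
hspeak_rate = lines_per_second(307_000, 22.5)
ply_rate = lines_per_second(307_000, 42.7)

# How many times longer the Python/PLY parser takes on the larger input
slowdown = 42.7 / 22.5

print(round(baconthulhu_rate), round(hspeak_rate), round(ply_rate), round(slowdown, 1))
```

So the Python/PLY parser stays at roughly 7k lines per second on both inputs, and takes a little under twice as long as HSpeak on the larger one.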
Last edited by TMC on Thu Mar 12, 2020 12:11 pm, edited 8 times in total.