Indent tool for HSpeak
I wish a reliable parser generator for Freebasic existed. I've found this in the forums, but it's work in progress.
Maybe you could set up a regular flex/bison project in C and then use it as an external library. It would save you from having to call an external process and it wouldn't add any new dependency to the engine. You could probably recycle the parsing rules of the Python/PLY prototype.
Or you could just embed Lua for which Freebasic already has bindings and an example, but that would be too obvious. I mean, every other engine does that.
Last edited by lennyhome on Sat Feb 29, 2020 11:36 am, edited 2 times in total.
But I actually prefer to run the REPL in a separate process in a terminal emulator so that we get a separate window with text editing, scrollback, readable fonts, etc. all for free.
There's no technical problem using a parser generator that produces C and linking it in; C and FB code modules can be mixed relatively freely (FB strings are annoying to pass around, but C strings are no problem). FB users just try to do everything in FB because they don't know or dislike C. There's a very good C-to-FB header translator, fbfrog. Distributing Python programs is a pain, but MicroPython should be able to run Python code produced by PLY and is very small. James and I are both Python fans, we certainly would rewrite HSpeak in Python and wouldn't even consider using C, C++ or FB. I do worry about performance though; MicroPython is slower than CPython which is much slower than Euphoria. You're right that we avoid adding new dependencies, but I don't care too much about dependencies for editing games.
Speaking of lua, the plan is to embed a VM for a mature scripting language anyway, and translate HS to that language, so that it can be extended easily into a modern language. For a long time I was planning to embed lua (I did start on HS-to-lua translation), but there are so many things I dislike about lua that I'm probably going to use Squirrel instead. I considered MicroPython/pycopy, but it's just not designed for embedding in other programs as a scripting VM. (Suggestions welcome.)
Last edited by TMC on Sat Feb 29, 2020 1:16 pm, edited 6 times in total.
"there are so many things I dislike about lua"
The 1-based array indices.
To put things in perspective, if I run Crypt of Baconthulhu (which is my favorite game) on my Raspberry Pi 3 with "-z 3 -s", it uses 30% of one CPU core. A simple pure interpreter written in flex/bison could run on a separate thread, synchronize with the engine via just one mutex and condition variable, and you could make it close to HSpeak.
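Below is a minimal sketch of the "one mutex/cond" handoff. For brevity it uses Python's threading module rather than C, and all names (FrameHandoff, engine_tick, script_wait) are hypothetical. The engine and the interpreter thread strictly take turns, so script commands never run at the same time as the engine's own frame logic.

```python
import threading


class FrameHandoff:
    """Engine and script-interpreter threads take turns using one mutex plus
    one condition variable (threading.Condition wraps both)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.turn = "engine"   # whose turn it is to run
        self.done = False      # set once the script has finished

    def engine_tick(self):
        """Engine, once per frame: let the script run until it yields."""
        with self.cond:
            if self.done:
                return
            self.turn = "script"
            self.cond.notify_all()
            self.cond.wait_for(lambda: self.turn == "engine")

    def script_begin(self):
        """Interpreter thread: block until the engine's first tick."""
        with self.cond:
            self.cond.wait_for(lambda: self.turn == "script")

    def script_wait(self):
        """Interpreter, on a wait command: hand control back for one frame."""
        with self.cond:
            self.turn = "engine"
            self.cond.notify_all()
            self.cond.wait_for(lambda: self.turn == "script")

    def script_end(self):
        """Interpreter: the script returned, so never wait again."""
        with self.cond:
            self.done = True
            self.turn = "engine"
            self.cond.notify_all()


def demo(frames):
    handoff, log = FrameHandoff(), []

    def script():
        handoff.script_begin()
        for step in range(frames):
            log.append(f"script step {step}")
            handoff.script_wait()
        handoff.script_end()

    worker = threading.Thread(target=script)
    worker.start()
    for frame in range(frames + 1):
        log.append(f"engine frame {frame}")
        handoff.engine_tick()
    worker.join()
    return log
```

demo(2) strictly alternates engine frames and script steps. A real engine would call engine_tick() once per frame, and the interpreter would call script_wait() wherever a script waits.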
I presented the Python/PLY compiler idea instead of the above because I'm very afraid of committing to anything that requires touching the engine, but you can certainly do it.
Anyway, it's nice to know that things are moving on the scripting front. I'll make some improvements to the AST generator as soon as I can get back into "that state of mind" and maybe it will be useful for something eventually.
It would be nice to have the option of a REPL as a drop-down console within Game's window. Usability would inevitably be worse though. It could still spawn a separate process behind the scenes.
Another consideration is that I plan to replace the script debugger, which is horrible in every way. I want to move the debugger to a separate process/program for the same reasons. It needs to be able to display a lot of text readably (eg source code) with a good UI. The IPC to query and modify interpreter state would be tricky. Of course it could include a REPL too. I'm actually seriously considering writing it in JS and running it in the user's web browser, embedding a trivial HTTP server in Game. That's one way to avoid dependencies! Alternatives are to extend HamsterWhisper or, most realistically, to use/adapt an existing debugger for the underlying VM (eg Zerobrane for Lua... which is one of the big advantages of using Lua. Squirrel only has Eclipse and VS-based debuggers, yeech)
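If the debugger UI ran in a web browser, the engine side could be quite small. This Python sketch uses the standard library's http.server to serve a JSON snapshot on localhost. The /state endpoint and the contents of interpreter_state() are made-up placeholders, not anything Game currently exposes. Game itself is written in FB, so this only shows the rough shape of the protocol.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def interpreter_state():
    """Placeholder snapshot; a real engine would fill this in from its
    script interpreter (all field names here are made up)."""
    return {"scripts": [{"id": 1, "name": "dummy", "line": 3}],
            "globals": {"deathcount": 0}}


class DebugHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/state":
            self.send_error(404)
            return
        body = json.dumps(interpreter_state()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # don't spam the game's stderr with request logs


def start_debug_server(port=0):
    """Listen on localhost only, in a background thread; port 0 = any free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), DebugHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The browser-side JS would poll (or later push to) endpoints like this one. Modifying interpreter state would need POST handlers on top.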
1-based array indices aren't my favourite, but that detail doesn't matter at all when you're translating to Lua, since it only affects table literal syntax; you can use 0-based arrays by typing 5 extra characters. But it would be an uphill battle to prevent all of Lua's many weird semantics from creeping into HS. And it's impractical to disallow "1" + 2 = 3 without either patching the Lua VM or requiring the upcoming Lua 5.4, which removes that misfeature.
Lots of decisions to make. Good to rapidly prototype this stuff before committing to anything.
Last edited by TMC on Sat Feb 29, 2020 11:49 pm, edited 5 times in total.
I've implemented some form of automatic line continuation. So now you can do this:
Code: Select all
HSpeak> a(
.... > 10, 40
.... > , 35 + 2
.... > )
return
  function: a
    number: 10
    number: 40
    binop: +
      number: 35
      number: 2
If an error is detected at EOF it goes into line continuation mode until the error disappears or until it isn't located at the end of the buffer. I'm not sure it's the right strategy, but it's interesting.
At first I thought that wouldn't be right, but actually I can't think of anything wrong with that. It's much better than just counting brackets since it continues on from "x := 1 +"
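The "error at EOF means keep reading" rule can be made concrete with a self-contained Python sketch. It uses a toy recursive-descent parser for a tiny subset: numbers, names, calls, groups, + - * /, and an optional "name :=" prefix. This is not lennyhome's PLY grammar. Only needs_more_input() illustrates the strategy: keep prompting if the parse fails exactly at the end of the buffer, otherwise report the error.

```python
import re

TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|:=|[-+*/(),]")
NAME_RE = re.compile(r"[A-Za-z_]\w*")


class ParseError(Exception):
    def __init__(self, pos, msg):
        super().__init__(msg)
        self.pos = pos  # character offset; len(source) means "ran out of input"


def tokenize(src):
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise ParseError(pos, f"bad character {src[pos]!r}")
        tokens.append((m.group(), pos))
        pos = m.end()
    return tokens


class ToyParser:
    def __init__(self, src):
        self.src, self.toks, self.i = src, tokenize(src), 0

    def peek(self):
        return self.toks[self.i][0] if self.i < len(self.toks) else None

    def fail(self, msg):
        # Errors past the last token are reported at the end of the buffer.
        at_end = self.i >= len(self.toks)
        raise ParseError(len(self.src) if at_end else self.toks[self.i][1], msg)

    def parse(self):
        tok = self.peek()
        if (tok is not None and NAME_RE.fullmatch(tok)
                and self.i + 1 < len(self.toks) and self.toks[self.i + 1][0] == ":="):
            self.i += 2  # skip "name :="
        self.expr()
        if self.peek() is not None:
            self.fail("unexpected token")

    def expr(self):
        self.term()
        while self.peek() in ("+", "-", "*", "/"):
            self.i += 1
            self.term()

    def term(self):
        tok = self.peek()
        if tok is None:
            self.fail("unexpected end of input")
        if tok.isdigit():
            self.i += 1
        elif NAME_RE.fullmatch(tok):
            self.i += 1
            if self.peek() == "(":  # function call
                self.i += 1
                self.args()
        elif tok == "(":  # expression group
            self.i += 1
            self.args()
        else:
            self.fail(f"unexpected {tok!r}")

    def args(self):
        # Just after '(': comma-separated exprs, optional trailing comma, then ')'.
        while self.peek() != ")":
            self.expr()
            if self.peek() == ",":
                self.i += 1
            elif self.peek() != ")":
                self.fail("expected ',' or ')'")
        self.i += 1


def needs_more_input(src):
    """Continue reading lines only if the parse error is at the very end."""
    try:
        ToyParser(src).parse()
    except ParseError as err:
        return err.pos == len(src)
    return False
```

needs_more_input("x := 1 +") is True, which is the case that plain bracket counting misses. needs_more_input("a(,)") is False, because that error is in the middle of the buffer.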
I've started on extending hspeak_ast.py (with hspeak_defines.py) to produce lone .hsz files. It doesn't handle any flow control; I just want to do enough to call a single builtin function, and then pass that to the REPL. I have no idea yet how to handle local variables in the REPL.
I also did some work on launching an external REPL in a terminal emulator.
There is an example in PLY that does line continuation by counting parentheses. The advantage is that you don't have to re-parse the whole line every time a continuation happens, so it would be faster.
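A paren-counting continuation check can be kept incremental, so each new line is scanned once instead of re-parsing the whole buffer. This is a hypothetical helper, not the PLY example itself. For simplicity it ignores parentheses inside strings or comments.

```python
class ParenCounter:
    """Tracks bracket depth across REPL input lines (hypothetical helper).

    Only the newly entered line is scanned on each call, so the work per line
    doesn't grow with the size of the buffer. Simplification: parentheses in
    strings or comments are counted too."""

    def __init__(self):
        self.depth = 0
        self.lines = []

    def feed(self, line):
        """Add one line of input; return True if more input is needed."""
        self.lines.append(line)
        for ch in line:
            if ch == "(":
                self.depth += 1
            elif ch == ")":
                self.depth -= 1
        return self.depth > 0

    def text(self):
        """The accumulated input, ready to hand to the parser once."""
        return "\n".join(self.lines)
```

ParenCounter().feed("x := 1 +") returns False, so unlike the EOF-error approach it would not continue that line.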
"I have no idea yet how to handle local variables in the REPL."
I don't know if this has anything to do with what you're thinking, but I just wanted to add some thoughts about the byval/byref issue that I encountered earlier.
Consider this Python example:
Code: Select all
>>> a = [0, 1, 2, 3]
>>> del(a[1])
>>> a
[0, 2, 3]
Notice that although "del" looks like a regular function call, it's really a statement, because it has to receive a reference to the list element (the list plus the index), not the value of the element. That's the reason I had to single out specific functions and specific arguments in my interpreter attempt.
Consider this statement in HSpeak:
Code: Select all
global variable(10 + a++, some global variable)
You want the first argument to be reduced to a number, but you want the second argument as a literal, with no attempt to dereference it, because you need it as the key in the "global scope" dictionary. The same goes for:
Code: Select all
variable(a, b, c, d)
This time the literal arguments become keys in the "local scope" dictionary. That's about it.
I got discouraged and moved on to AST generation when I realized that an interpreter and a compiler would be two separate projects with possibly only the AST in common.
Anyway, that's my story and I'm sticking to it.
----
A real life if/then:
Code: Select all
if(death count >= 1) then(
.... > set tag(tag:save scummer, ON)
.... > set tag(tag:save game exists, OFF)
.... > )
if
  expression group
    binop: >=
      name: deathcount
      number: 1
  expression group
    function: settag
      name: tag:savescummer
      name: ON
    function: settag
      name: tag:savegameexists
      name: OFF
Also nested parentheses, a ',' at the end of a list and the empty list are accepted:
Code: Select all
HSpeak> (10, (4, 1,), 5, (), 4 + 2)
return
  expression group
    number: 10
    expression group
      number: 4
      number: 1
    number: 5
    expression group
    binop: +
      number: 4
      number: 2
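Going back to the byval/byref issue: it can be modelled as a per-command table of argument positions that are passed as literal names instead of being evaluated. This Python toy works on tuple-shaped AST nodes. The LITERAL_ARGS table, the node shapes and the builtin helpers are assumptions for illustration. "a++" from the example is simplified to a plain variable read.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
ALL = object()  # marker: every argument of this command is a literal name

# Hypothetical table of by-reference (literal name) argument positions.
LITERAL_ARGS = {
    "global variable": {1},  # global variable(id expression, name)
    "variable": ALL,         # variable(a, b, c, d)
}


def _global_variable(scope, gid, name):
    scope["globals"][name] = gid  # the literal name is the dictionary key
    return 0


def _variable(scope, *names):
    for name in names:
        scope["locals"].setdefault(name, 0)
    return 0


BUILTINS = {"global variable": _global_variable, "variable": _variable}


def eval_node(node, scope):
    kind = node[0]
    if kind == "number":
        return node[1]
    if kind == "name":  # an ordinary by-value variable read
        return scope["locals"][node[1]]
    if kind == "binop":
        return OPS[node[1]](eval_node(node[2], scope), eval_node(node[3], scope))
    if kind == "call":
        func, args = node[1], node[2]
        literal = LITERAL_ARGS.get(func, frozenset())
        values = []
        for i, arg in enumerate(args):
            if literal is ALL or i in literal:
                if arg[0] != "name":
                    raise SyntaxError(f"argument {i + 1} of {func} must be a name")
                values.append(arg[1])  # by reference: keep the name itself
            else:
                values.append(eval_node(arg, scope))  # by value
        return BUILTINS[func](scope, *values)
    raise ValueError(f"unknown node {kind!r}")
```

In this toy, global variable(10 + a, some global variable) evaluates only the first argument. It then records the second one, unevaluated, as a key in the globals dictionary.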
Last edited by lennyhome on Wed Mar 04, 2020 7:11 am, edited 4 times in total.
Those AST improvements look nice.
Success! I was able to produce a hsz file containing a simple script (as printed by decompile.py from nohrio):
Code: Select all
Script dummy()
do(
  return(showvalue(1000 * 3 + 9))
)
and I imported it into a game and it worked.
I just tacked my code onto hspeak_ast.py. Here's the patch
http://tmc.castleparadox.com/ohr/hsz_compiler.patch
And here's a complete copy of the source -- only hspeak_ast.py is changed, but I included my hspeak_defines.py too, and also a test_compile.sh script which packages the out.hsz file produced by hspeak_ast.py into an .hs file and also runs it through decompile.py to verify it.
http://tmc.castleparadox.com/ohr/hsz_compiler.tar.gz
Yes, an interpreter and compiler don't share much aside from the AST, but building the AST is almost a whole HS compiler. And the overlap is even higher if it's an "optimising" compiler which evaluates constant expressions (which HSpeak does do, but nothing more).
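Constant folding of the kind HSpeak does can be sketched as a bottom-up pass over the AST. This Python sketch uses made-up tuple node shapes. The 32-bit wraparound is an assumption about HS integer semantics that a real compiler would have to match exactly. Division is left alone because its rounding would also need to match the engine.

```python
import operator

# "/" is deliberately not folded: its rounding would have to match the engine.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def wrap32(n):
    """Assumption: HS integers are 32-bit signed and wrap on overflow."""
    return (n + 2**31) % 2**32 - 2**31


def fold(node):
    """Return an equivalent AST with constant sub-expressions pre-computed.
    Node shapes (hypothetical): ("number", n), ("name", s),
    ("binop", op, lhs, rhs), ("call", func, [args])."""
    kind = node[0]
    if kind == "binop":
        op, lhs, rhs = node[1], fold(node[2]), fold(node[3])
        if op in OPS and lhs[0] == "number" and rhs[0] == "number":
            return ("number", wrap32(OPS[op](lhs[1], rhs[1])))
        return ("binop", op, lhs, rhs)
    if kind == "call":
        return ("call", node[1], [fold(arg) for arg in node[2]])
    return node  # numbers and names are already as simple as they get
```

Folding the dummy script's showvalue(1000 * 3 + 9) leaves showvalue(3009). Anything involving a variable name is kept as a binop.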
The problem with local variables in a REPL is that I was planning on packaging up each line of input into an .hsz file and running it through the interpreter. Of course that means you can't save local variables. Either I need to drop that idea, or... I could use nonlocal variables instead and store the variables in the parent scope between executions. But even that would still need some changes to the interpreter, so I might as well go with a less convoluted solution.
Turning variable names into variable references (ID numbers) has to happen relatively late if "variable()" is allowed to appear after a variable is used, so it becomes necessary to keep unresolved tokens in the AST wherever variables are used.
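If declarations may follow use, one option is two passes over each script. The first pass collects every variable() declaration wherever it appears. The second rewrites name tokens into local variable IDs, leaving anything unresolved as a name for later stages (globals, constants, or an error). Here is a Python sketch with hypothetical node shapes.

```python
def resolve_locals(body):
    """Map local variable names to IDs in a script body (list of statements).

    Pass 1 collects every variable(...) declaration, wherever it appears, so a
    declaration after first use still counts. Pass 2 rewrites ("name", s) to
    ("local", id). Unknown names stay as ("name", s) for later stages."""
    ids = {}

    def collect(node):
        if node[0] == "call":
            if node[1] == "variable":
                for arg in node[2]:
                    ids.setdefault(arg[1], len(ids))
            for arg in node[2]:
                collect(arg)
        elif node[0] == "binop":
            collect(node[2])
            collect(node[3])

    def rewrite(node):
        kind = node[0]
        if kind == "name":
            return ("local", ids[node[1]]) if node[1] in ids else node
        if kind == "binop":
            return ("binop", node[1], rewrite(node[2]), rewrite(node[3]))
        if kind == "call" and node[1] != "variable":
            return ("call", node[1], [rewrite(arg) for arg in node[2]])
        return node  # numbers, and declarations, which keep their literal names

    for stmt in body:
        collect(stmt)
    return [rewrite(stmt) for stmt in body], ids
```

A script that uses x before variable(x) still gets x resolved to local 0, while an undeclared y is left for the global/constant lookup.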
Last edited by TMC on Wed Mar 04, 2020 2:10 pm, edited 4 times in total.
"I just tacked my code onto hspeak_ast.py"
At a glance it's almost exactly what I would have done myself, in 6 months or so probably. Very nice. I'll be examining it more closely.
You can have a REPL-local scope of variables. You just need to be able to send in a compiled mini-script for immediate execution and to be able to read a return value from the mini-script. I don't know if the engine currently allows that, but that would be the only change needed to fake an interpreter.
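The REPL-local scope idea can be sketched as a session object that owns a dictionary of variables and passes it to each mini-script it runs. The engine entry point here (fake_engine_execute) is a stand-in. The engine doesn't necessarily have such a call yet, and the toy mini-scripts are tuple lists rather than compiled .hsz data. Returning the value of the last statement is a choice made for this toy.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, scope):
    kind = node[0]
    if kind == "number":
        return node[1]
    if kind == "name":
        return scope.get(node[1], 0)  # toy choice: unknown variables read as 0
    if kind == "binop":
        return OPS[node[1]](evaluate(node[2], scope), evaluate(node[3], scope))
    raise ValueError(f"unknown node {kind!r}")


def fake_engine_execute(mini_script, scope):
    """Stand-in for the engine running a compiled mini-script immediately.
    Statements are ("assign", name, expr) or a bare expression; the value of
    the last statement is handed back to the REPL."""
    result = 0
    for stmt in mini_script:
        if stmt[0] == "assign":
            scope[stmt[1]] = result = evaluate(stmt[2], scope)
        else:
            result = evaluate(stmt, scope)
    return result


class ReplSession:
    def __init__(self, execute=fake_engine_execute):
        self.execute = execute
        self.scope = {}  # REPL-local variables, kept between inputs

    def submit(self, mini_script):
        return self.execute(mini_script, self.scope)
```

Because the dictionary lives in the session rather than in any one mini-script, "x := 1" followed by "x + 2" gives 3.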
Last edited by lennyhome on Wed Mar 04, 2020 5:09 pm, edited 1 time in total.
I need to implement immediate execution of scripts anyway, so I'll be doing that.
I forgot to say: I won't be doing any more work on that hsz compiler for a while, at least a week, so I hope you'll take the code and integrate it into what you've got, and feel free to make changes/extend it. I'm too busy, and when I have more time I'll work on the engine side of implementing a REPL instead.
I've updated my package here (same link as before) and it would be a good time to get it fresh because I did some major refactoring and I've integrated your code.
I've added "baconthulhu.txt" which is the result of an attempt to batch compile every script in "baconthulhu.hss". It should give you a good idea of where we're standing. The performance of interpreted Python is adequate at around 1k lines per second.
Your code generator is in "hspeak_gen.py". You can run/test it via "hspeak_ast.py" like:
Code: Select all
HSpeak> noop(), 2 + 3
root
  function: noop
  binop: +
    number: 2
    number: 3
toHSZ: 110 bytes
The result of the code generation isn't written anywhere; it just calls it and prints errors.
Wow, a lot of progress here.
Could you slap a license on it? (BTW, we're going to attempt to relicense the OHRRPGCE to something more permissive than the GPL)
Is the sig[...] "WARNING: 2 shift/reduce conflicts"
"if(...)\nthen(...)" gets converted to "if(...),then(...)" which produces a syntax error. Of course this could be fixed by adding a check for ')' here:
Code: Select all
# append a comma at the line break unless the line already ends in ',' or '('
s1 = s1.rstrip()
if s1[-1] != ',' and s1[-1] != '(':
    s1 += ','
However, it's quite common to see people put a comma between a ")" and the next token, even before "then", "else" and "do", so I think it'd be better to keep that. And not just in the case of two statements (which your parser allows): "show textbox(1), wait for textbox".
I think it's reasonable to disallow "then,()" (which HSpeak currently allows) but allow "then\n()", but it might be a pain to distinguish them. Doesn't really matter.
HamsterSpeak used to be practically comma-insensitive (you could add extra commas between any tokens) until a couple of years ago (version 3Se). Now a lot of extra commas are disallowed, though I want to tighten it even more (look at testgame/parser_tests.hss if you're interested, but please don't take it as authoritative). I'm surprised you have the following rule (HSpeak doesn't allow it):
Code: Select all
expression_group : '(' ',' ')'
It does allow x(1,) though, which you also have; I wasn't sure whether to disallow that one. Better not to change the laxness without a good reason.
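The line-joining rule under discussion can be modelled in a few lines of Python. join_lines() is a hypothetical reconstruction of the rule, not lennyhome's actual code. The no_comma_after_paren flag shows the effect of adding the ')' check.

```python
def join_lines(lines, no_comma_after_paren=False):
    """Join source lines into one string, turning each line break into a comma
    unless the text so far already ends with ',' or '(' (and also ')' when
    no_comma_after_paren is set)."""
    stoppers = ",()" if no_comma_after_paren else ",("
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines add nothing
        if out and out[-1] not in stoppers:
            out += ","
        out += line
    return out
```

With the default rule, "if(a)" followed by "then(b)" becomes "if(a),then(b)", which the grammar then has to accept. An argument list closed by a ")" on its own line comes out as "a(10, 40,)", the x(1,) case.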
"if(...)else(...)" is also valid in HS. And so is a "do" block not following a for/while/switch. They are useful because they can contain "break" and "continue" statements. BTW, "break" and "continue" take an optional argument: the number of scopes to break out of. Amusingly, it can be an expression.
(And, I see I was totally wrong to talk about associating 'if' and 'then' block using AST manipulations rather than simply defining production rules)
1k lines per second? That would be pretty slow, but it's much faster than that for me (maybe you were talking about a previous version?): 1.1s for baconthulhu.hss (vs 0.42s for HSpeak). Excluding plotscr.hsd and scancode.hsi (note: both plotscr.hsd and baconthulhu.hss include scancode.hsi), it's 0.60s for 4237 lines, so 7kLoC/s. That's fine. I see about half of the time is spent in yacc, about a quarter in lex. I guess there's not much room to speed it up while sticking to PLY, maybe the only real way to do so is to handle whitespace in the lexer somehow rather than in the parser, to get rid of the name_concat production rules and reduce the amount the parser needs to do. Similarly, HSpeak actually removes all commas and newlines in the lexer, which is cheating but means there's far less to parse.
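Handling whitespace and commas in the lexer, roughly as HSpeak does, could look like this Python sketch. It uses the standard re module, not PLY. Spaces inside names are removed, so "death count" lexes as the single NAME token "deathcount". Commas and newlines produce no tokens at all, which also means the parser can no longer reject misplaced commas. The token set is a simplified assumption, and '#' comments and strings are not handled.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_](?:[A-Za-z0-9_ ]|:(?!=))*)
  | (?P<OP>:=|>=|<=|==|<>|[-+*/()<>])
  | (?P<SKIP>[ \t,\n]+)
""", re.VERBOSE)


def lex(src):
    """Return (kind, value) tokens; commas, spaces and newlines are dropped."""
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind == "NAME":
            out.append(("NAME", m.group().replace(" ", "")))  # "set tag" -> "settag"
        elif kind == "NUMBER":
            out.append(("NUMBER", int(m.group())))
        elif kind == "OP":
            out.append(("OP", m.group()))
        # SKIP produces no token, so the parser never sees commas or newlines
        pos = m.end()
    return out
```

With this lexer the parser no longer needs name_concat-style rules, because every multi-word name arrives as one token.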
Another example: on a copy of Entrepreneur's scripts (307kLoC including plotscr.hsd; the scripts have since been reduced in size), HSpeak takes 22.5s and hspeak_tld.py takes 42.7s. (It took HSpeak just 2.6s to lex.)
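For reference, here is what the timings quoted above work out to. The 307kLoC figure is rounded, and all timings come from one machine, so these numbers are only indicative.

```python
def lines_per_second(lines, seconds):
    return lines / seconds


# Figures quoted in this post; 307_000 is a rounded line count.
print(f"baconthulhu.hss minus includes: {lines_per_second(4237, 0.60):,.0f} lines/s")
print(f"Entrepreneur, HSpeak:           {lines_per_second(307_000, 22.5):,.0f} lines/s")
print(f"Entrepreneur, hspeak_tld.py:    {lines_per_second(307_000, 42.7):,.0f} lines/s")
print(f"hspeak_tld.py vs HSpeak time:   {42.7 / 22.5:.2f}x")
print(f"HSpeak time spent lexing:       {2.6 / 22.5:.0%}")
```

That is roughly 7.1k, 13.6k and 7.2k lines per second. The Python parser takes a bit under twice as long as HSpeak on the large script set.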
Last edited by TMC on Thu Mar 12, 2020 12:11 pm, edited 8 times in total.