Indent tool for HSpeak


lennyhome
Slime Knight
Posts: 115
Joined: Fri Feb 14, 2020 6:07 am

Post by lennyhome »

I've been toying with this idea for the last couple of days. It's not ready yet (if ever) but the last part of this post is kind of a demo of what it does up to now. The spacing, the indentation and the coloring are all automatic.

I have no idea if a tool like this already exists. I did it mostly as an excuse to study Python and the Ply library but if anybody is interested I will publish the source code.

I had to use underscores to represent spaces at the beginning of the line because of the forum; they shouldn't be there.

----

script, create armada, begin
____if (armada) then (
________set slice x(armada, 0)
________set slice y(armada, 0)
____) else (
________armada := create container
____)
____variable(sl, row, col)
____for (row, 0, 3) do (
________for (col, 0, 7) do (
____________sl := load walkabout sprite(2 + col)
____________set parent(sl, armada)
____________set sprite frame(sl, 4)
____________place sprite(sl, col * 25, row * 25)
________)
____)
____refit armada
____direction := 1
____start speed += 1
____movespeed := start speed
____enemy fire rate += 1
end
script, create bullet layers, begin
____friendly bullets := create container(320, 200)
____enemy bullets := create container(320, 200)
end
script, setup score, begin
____show string at(1, 0, 0)
____$1 = ""
____append number(1, score)
end
script, update game, begin
____user input
____move armada
____update each enemy
____update shooter
____update bullets
end
script, user input, begin
____if (key is pressed(key:ESC)) then (
________playing := false
____)
____if (key is pressed(key:left)) then (
________set slice x(shooter, slice x(shooter) -- 4)
____)
____if (key is pressed(key:right)) then (
________set slice x(shooter, slice x(shooter) + 4)
____)
____if (keyval(key:space) >> 1) then (
________fire bullet
____)
end
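
For readers who want to experiment, here is a minimal sketch of the indentation step only. It is not lennyhome's tool and does no spacing or coloring: it strips each line and re-indents it by counting `(`/`)` and `begin`/`end`. The name `indent_hspeak` is made up for illustration, and the sketch assumes there are no parentheses or `#` inside string literals and no identifiers containing the words "begin" or "end".

```python
import re

# Tokens that change nesting depth, and closers that start a line.
DEPTH_TOKEN = re.compile(r"\(|\)|\bbegin\b|\bend\b")
LEADING_CLOSER = re.compile(r"\s*(?:\)|end\b)")

def indent_hspeak(source, width=4):
    """Re-indent HamsterSpeak-like text by tracking nesting depth."""
    depth = 0
    out = []
    for raw in source.splitlines():
        line = raw.strip()
        code = line.split("#", 1)[0]  # ignore comments when counting
        # Closers at the very start of a line dedent that line itself.
        leading, rest = 0, code
        while True:
            m = LEADING_CLOSER.match(rest)
            if not m:
                break
            leading += 1
            rest = rest[m.end():]
        here = max(depth - leading, 0)
        out.append(" " * (width * here) + line if line else "")
        # Everything on the line affects the depth of the following lines.
        for tok in DEPTH_TOKEN.findall(code):
            depth += 1 if tok in ("(", "begin") else -1
        depth = max(depth, 0)
    return "\n".join(out)
```

A depth that does not return to zero at the end of a script is also a cheap hint for tracking down "mismatched parentheses" errors.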
TMC
Metal King Slime
Posts: 4308
Joined: Sun Apr 10, 2011 9:19 am

Post by TMC »

Very nice! That's amusing, because I did the same thing long ago -- I wrote a Python program for indenting and syntax highlighting HamsterSpeak in order to get better at Python! It might have been the first significant thing I wrote in Python. It outputted HTML and used pure Python without any libraries. I don't want to discourage you. You'd probably be able to print errors much better than I did, because of PLY.

Over the years I suggested my indenter to a few people to figure out "mismatched parentheses" compile errors they were struggling with, but it was extremely obscure. I really should have put it on the wiki. If you can do better than me, it definitely would be useful to people. And since HamsterWhisper is written in Python, one of our autoindent tools could be included in it. Funny, I never thought of that before...
Last edited by TMC on Sat Feb 22, 2020 11:44 am, edited 2 times in total.

Post by lennyhome »

Ok, so, using Ply, the following is the whole lexer I wrote. I believe it covers most of the syntax, except for some rare cases such as where "else if" is written with 2 spaces in the middle.

I'm not sure how far I want to go but Ply offers much more and it's surprisingly fast. It would definitely be useful if added to Hamster Whisper for syntax highlighting. I dare say it could even be evolved into a compiler and a formal definition for the language.

Code: Select all

reserved = {
	'else if': 'ELSE_IF',
	'case': 'CASE',
	'plotscript': 'PLOTSCRIPT',
	'exit': 'EXIT',
	'for': 'FOR',
	'switch': 'SWITCH',
	'while': 'WHILE',
	'do': 'DO',
	'else': 'ELSE',
	'end': 'END',
	'then': 'THEN',
	'script': 'SCRIPT',
	'if': 'IF',
}

tokens = [
    'LESS_EQUAL', 'PLUS_EQUAL', 'MINUS_MINUS',
    'ASSIGN', 'NUMBER', 'STRING_PLUS',
    'EQUAL_EQUAL', 'BOOL_AND', 'MINUS_EQUAL',
    'EXP_EXP', 'STRING_EQUAL', 'LESS_MORE',
    'SHIFT_LEFT', 'NAME', 'MORE_EQUAL',
    'BOOL_OR', 'COMMENT', 'SHIFT_RIGHT',
    'STRING',
] + list(reserved.values())

literals = (
    '+', '-', '*', '/',
    '<', '>', '$', '^', '@',
    ',', '(', ')', '=', ':',
    '.', '?',
)

# Tokens

t_COMMENT = r'\#.*'
t_PLUS_EQUAL = r'\+='
t_MINUS_EQUAL = r'-='
t_STRING_PLUS = r'\$\+'
t_STRING_EQUAL = r'\$='
t_MINUS_MINUS = r'--'
t_EXP_EXP = r'\^\^'
t_EQUAL_EQUAL = r'=='
t_LESS_MORE = r'<>'
t_SHIFT_RIGHT = r'>>'
t_SHIFT_LEFT = r'<<'
t_LESS_EQUAL = r'<='
t_MORE_EQUAL = r'>='
t_ASSIGN = r':='
t_BOOL_AND = r'&&'
t_BOOL_OR = r'\|\|'
t_NUMBER = r'-?\d+'
t_STRING = r'\"([^\\\n]|(\\.))*?\"'

def t_NAME(t):
    r'[a-zA-Z_][a-zA-Z0-9_ ]*'
    t.value = t.value.rstrip()
    t.type = reserved.get(t.value, 'NAME')
    return t

def t_newline(t):
    r'\n'
    t.lexer.lineno += 1

def t_error(t):
    print("Illegal character '%s' at line %d" % (t.value[0], t.lineno))
    t.lexer.skip(1)

t_ignore = ' \t'

# Build the lexer
import ply.lex as lex
lexer = lex.lex()
Last edited by lennyhome on Sat Feb 22, 2020 7:21 pm, edited 1 time in total.

Post by TMC »

Interesting, does it read globals to find t_COMMENT, etc?

Source code for my tool is at http://tmc.castleparadox.com/format-hs.py and you can try it out here.
Last edited by TMC on Sat Feb 22, 2020 11:38 pm, edited 1 time in total.

Post by lennyhome »

Yes. PLY is meant to be like flex/yacc, but instead of pre-processing the .y files to .c files, you first define some stuff, then import the library; it does reflection on the module and builds the parser.
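
To illustrate what "reflection on the module" means, here is a rough standard-library sketch, not PLY's actual implementation. A hypothetical `build_lexer` helper looks through a namespace for names starting with `t_` and joins their regexes into one scanner. A class (`Rules`) stands in for the module, and the patterns are sorted longest first, which mirrors PLY's documented ordering of string-defined rules. PLY's real discovery also handles rule functions, `t_ignore`, literals and error reporting, none of which is shown here.

```python
import re

def build_lexer(namespace):
    """Collect every 't_NAME = regex' string in a namespace into one scanner."""
    rules = {
        name[2:]: pattern
        for name, pattern in namespace.items()
        if name.startswith("t_") and isinstance(pattern, str)
    }
    # Longest regex first, so '+=' is tried before '+'.
    ordered = sorted(rules.items(), key=lambda kv: len(kv[1]), reverse=True)
    master = re.compile("|".join(f"(?P<{n}>{p})" for n, p in ordered))

    def scan(text):
        # Characters that match no rule are silently skipped in this sketch.
        return [(m.lastgroup, m.group()) for m in master.finditer(text)]

    return scan

class Rules:
    # Stand-in for a module's globals; these names are illustrative only.
    t_PLUS_EQUAL = r"\+="
    t_PLUS = r"\+"
    t_NUMBER = r"\d+"

scan = build_lexer(vars(Rules))
print(scan("1+=2+3"))
```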

I've tried your format-hs.py. It really does almost the exact same thing but I would really like to avoid having to write parser code by hand because I don't have the patience for that.

Post by TMC »

Yeah, my parser is a mess (at least, I assume it is, I don't have the patience to inspect the code myself) and would be a pain to extend as features continue to be added to HS, such as array subscripting, and potential changes to how commas and brackets are lexed. PLY looks pretty nice! I assume that since it uses regexes, it can handle HS's unique complete whitespace insensitivity? E.g. you can write + = instead of += if you really want.
Last edited by TMC on Sun Feb 23, 2020 4:24 am, edited 1 time in total.

Post by lennyhome »

I've noticed the "you can put one or more spaces in between two letter operators" feature. There is some code in the compiler to deal with that explicitly. It's not a problem. A rule like ":[ \t]*=" takes precedence over the literal ":" and the literal "=" because the rule is longer than the literal.
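
The effect of a rule like ":[ \t]*=" can be tried out without PLY. The sketch below uses only the standard `re` module. `TOKEN_SPEC` and `tokenize` are names invented for this example, and only a handful of token kinds are covered. Python's `re` alternation takes the first alternative that matches rather than the longest, so the order of `TOKEN_SPEC` plays the role that rule precedence plays in PLY.

```python
import re

# Two-character operators (with optional whitespace inside) come before
# the single-character fallbacks, so ': =' is one ASSIGN token.
TOKEN_SPEC = [
    ("ASSIGN",     r":[ \t]*="),
    ("PLUS_EQUAL", r"\+[ \t]*="),
    ("NUMBER",     r"\d+"),
    ("NAME",       r"[a-zA-Z_][a-zA-Z0-9_ ]*"),
    ("LITERAL",    r"[:=+]"),
    ("SKIP",       r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    """Return (kind, value) pairs, normalising operators and names."""
    tokens = []
    for m in MASTER.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "NAME":
            value = value.rstrip()           # names may swallow a trailing space
        elif kind in ("ASSIGN", "PLUS_EQUAL"):
            value = "".join(value.split())   # ': =' becomes ':='
        tokens.append((kind, value))
    return tokens

print(tokenize("death count : = 10"))
print(tokenize("x + = 1"))
```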

I've also noticed that the Lexer class you've used must have been derived from PLY because strangely enough the token class has the same members (.type, .value, .lineno, ...).

I've started writing some yacc rules and I hope to get some sort of stand-alone interpreter eventually, but mostly I'm having fun refreshing my memory of the last time I used flex/bison, which was probably 150 years ago.

----

After a bit of hacking on the yacc rules for the (future) interpreter I got this:

Code: Select all

HSpeak> death count := 0
Statement assign: [None, 'death count', ':=', 0]
HSpeak> write global(1003, get up stairs x(current map))
Function call: [None, 'get up stairs x', '(', 'current map', ')']
Function call: [None, 'write global', '(', [1003, None], ')']
Result: [None, None]
HSpeak>
It can also solve simple expressions and store/recall global variables. No looping constructs yet. I know it's very not right, but I think I could make it do something useful eventually.

----

And here is some code. It requires Python 3, but PLY is already included so nothing else should be needed. You can run it and feed it one line at a time and see how it breaks it down. It's still very rough but it's interesting.
Last edited by lennyhome on Mon Feb 24, 2020 1:34 am, edited 2 times in total.

Post by TMC »

That's the start of something very cool! And the code is very simple. Amused to see you have a unary minus operator, which HS sadly doesn't have. (I'd really like to allow - instead of -- too, but any attempt to allow that would probably make a mess of the syntax... syntax conditional on a directive?).

zzo38 wrote a HamsterSpeak interpreter in node.js which ran compiled .hs files by producing and eval'ing JavaScript code. It was a messy hack but worked well enough that I could write implementations of a few basic script commands, and create a commandline program to execute HamsterSpeak scripts, including our HS testcases (actually, that's the only thing I ever ran through it). You might be interested to see which commands are needed to be able to run the (quite incomplete) runtime testcases. More relevantly, we also have tests for the HSpeak parser.

However, I'd be more interested in a HS REPL than a utility for executing scripts. I've wanted a REPL for HS for a long time, but adding a REPL to Game, piping to HSpeak to parse input line by line and hand it back to Game in suitable form seems terribly difficult. Putting the REPL itself in HamsterWhisper seems far more reasonable!

No, the Token class wasn't part of the Lexer class which I got here. I wrote everything else, including Token, myself. Those are just obvious member names to use!
Last edited by TMC on Mon Feb 24, 2020 3:22 pm, edited 2 times in total.

Post by lennyhome »

Part of the motivation would be the fact that Euphoria is going to eventually bit-rot. The ARM version hasn't been touched in 10 years and has never been made official. Also the compiler, although it's just one file, is really complex.

I need to take a look at the AST and bytecode generation part and get an idea of how hard it would be to reproduce it. Would it be possible to emit bytecode straight from the parser rules? I have no idea yet, but if I get something done I'll post it here for sure.

----

I've tweaked the rules to let the parser take care of the two letter operators instead of the lexer and as a result you can do this:

Code: Select all

HSpeak> a b : = 10 + 4 + 8 / 2
Number assign: [None, 'ab', ':', '=', 18]
HSpeak> a b
Return value: [None, [18]]
HSpeak> a b + = 234
HSpeak> a b
Return value: [None, [252]]
HSpeak> a b - = 251
HSpeak> a b
Return value: [None, [1]]
HSpeak>
With or without spaces in the middle.
Last edited by lennyhome on Tue Feb 25, 2020 1:59 am, edited 1 time in total.

Post by TMC »

Was it helpful to handle the whitespace in the parser rather than using regexes in the lexer? (We try to preserve whitespace and capitalisation in identifiers when printing error messages).
It's not *actually* necessary to support whitespace in the middle of an operator token, with the exception of $ s = "foo" and $ s + "bar" (not a normal operator, but lexing that is quite a special case in the code). I've never seen anyone actually put whitespace in any of the other operators, so it's a (mis)feature that's easily dropped. Not allowing "x - -3" could even be a good thing.

Actually, there's an ARMv6hf build of 4.1.0 (2015) here and it looks like ARM support is in the master branch... but what is or isn't official is always a question with euphoria.

Yeah... Euphoria development is very slow and is already a pain to have as a dependency. But an even better reason to rewrite it is that the language is limiting and the code is hard to read, write and extend. It's too complex and could easily be half the length if rewritten. The fact that Euphoria doesn't let you pass a reference to an object to a function -- it's strictly pass by value (CoW) -- is inefficient and inconvenient for manipulating ASTs. Not to mention the lack of object.member syntax and of structures which aren't sequences (though Euphoria now allows preprocessors and I was seriously considering using one that converts object.member to object[MEMBER]).

James once suggested rewriting HSpeak but it was too hard to justify. Actually, James already rewrote HSpeak once (version 1 was written in QuickBasic! The fact that it was < 2000 LoC in QB and 5300 LoC in Eu is pretty damning!). If a serious effort to rewrite starts, I would be very enthusiastic to help! I want to add a *lot* of new features to HS, and adding them to the current code will be painful.

An .hs file is a lumped (like .rpg) collection of files: an .hsz file for each script plus some other metadata. .hsz files don't actually contain bytecode, they contain serialised ASTs plus a table of string constants. The .hsz file format is easy to serialise and deserialise (possibly excepting the strings). So you would output while walking the AST anyway.

However, beware that some AST transformations before serialising are necessary. For example "else" becomes a child of "if" and "elseif" gets converted to "else(if". There are also two "hardcoded macros": "tracevalue" and "assert". I suppose you *could* do all that from within parsing rules, as long as you can backtrack and rewrite already parsed parts of the AST.
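
As a rough illustration of the "else" and "elseif" rewrite described above, here is a sketch in Python. The `Node` class and the flat if/then/elseif/else sibling layout are assumptions made for this example; they are not HSpeak's real data structures, and the two hardcoded macros are not handled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)

def _fold_if(siblings, i):
    """Consume the if/elseif node at index i plus its then/else followers."""
    if_node = Node("if", list(siblings[i].children))
    i += 1
    if i < len(siblings) and siblings[i].kind == "then":
        if_node.children.append(siblings[i])
        i += 1
    if i < len(siblings):
        nxt = siblings[i]
        if nxt.kind == "else":
            # 'else' becomes a child of the 'if' it belongs to.
            if_node.children.append(nxt)
            i += 1
        elif nxt.kind == "elseif":
            # 'elseif' is rewritten as 'else(if ...)'.
            inner, i = _fold_if(siblings, i)
            if_node.children.append(Node("else", [inner]))
    return if_node, i

def nest_if_chain(siblings):
    """Fold flat if/then/elseif/else runs in a statement list into nested ifs."""
    out, i = [], 0
    while i < len(siblings):
        if siblings[i].kind == "if":
            node, i = _fold_if(siblings, i)
            out.append(node)
        else:
            out.append(siblings[i])
            i += 1
    return out
```

Doing this as a separate pass over the finished tree avoids having to backtrack inside the parsing rules.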

And you do need at least three passes to parse HS anyway. First to parse "define constant" (and also "define trigger"**), since constants can be declared after use in top-level declarations; second to parse other top-level declarations (e.g. "script", "global variable"); finally to parse the body of each script. And you need to do two passes over script bodies also, as "variable" and "subscript" can occur after a variable or subscript is actually used, one of those passes could be combined with a previous pass. Note also that this means you have to parse a subscript after the parent script, because it can refer to variables in the outer scope which are declared afterwards.
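
The need for two passes over a script body can be shown with a toy example. The sketch below works on raw text with regexes rather than a real parser. The helper names (`normalise`, `declared_locals`, `undeclared_assignments`) are invented here, and it assumes identifiers compare equal once whitespace and case are stripped. It only demonstrates that a "variable" declaration appearing after its first use is still found if declarations are collected first.

```python
import re

def normalise(name):
    # Assumption: identifiers compare equal ignoring whitespace and case.
    return "".join(name.split()).lower()

def declared_locals(body):
    """Pass 1: collect every name declared with variable(...) in the body."""
    names = set()
    for m in re.finditer(r"\bvariable\s*\(([^)]*)\)", body):
        for name in m.group(1).split(","):
            names.add(normalise(name))
    return names

def undeclared_assignments(body, known=()):
    """Pass 2: list ':=' targets that no declaration covers, in order."""
    names = declared_locals(body) | {normalise(k) for k in known}
    missing = []
    for m in re.finditer(r"([A-Za-z_][A-Za-z0-9_ ]*?)\s*:=", body):
        target = normalise(m.group(1))
        if target not in names and target not in missing:
            missing.append(target)
    return missing

body = "sl := load walkabout sprite(2)\nvariable(sl, row)\ncol := 1\n"
print(undeclared_assignments(body))            # 'sl' is fine, 'col' is not
print(undeclared_assignments(body, ["col"]))   # e.g. 'col' is a known global
```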

**You'll be surprised that "script" and "plotscript" aren't reserved words and don't have any builtin meaning in HSpeak. They're declared by "define trigger". However since this is a (probably misguided) implementation detail only, it can be trivially changed, there's no requirement that any other HS compiler processes "define trigger" instead of hardcoding. But we will add more script types/triggers in future.
Last edited by TMC on Tue Feb 25, 2020 4:25 am, edited 5 times in total.

Post by lennyhome »

Was it helpful to handle the whitespace in the parser rather than using regexes in the lexer
Makes it simpler but also generates a shift/reduce conflict because + is declared left associative, so I put them back.
there's an ARMv6hf build of 4.1.0 (2015) here
Yes, that's the only one that works and can compile itself on the Raspberry Pi. If you want to see something interesting, open "be_callc.c" of the ARM version.
beware that some AST transformations before serialising are necessary
PLY's manual suggests a very simple way to generate ASTs, so that shouldn't be a problem. I'm just trying to avoid it.

My concern right now is with byref/byval parameter passing in function calls. The way I've set it up is that parameters are always passed byval except in some cases:

Code: Select all

def p_statement_global_variable(p):
    "statement : GLOBAL_VARIABLE '(' expression ',' name_concat ')'"

def p_function_call(p):
    "expression : name_concat '(' expression_list ')'"
The first rule intercepts function calls to "global variable" specifically and passes the name of the variable as the second parameter. The second rule is the default function call, which passes everything by value. Is that how it's supposed to work?

After all I'm trying to write a compiler for a language and a virtual machine I know almost nothing about. What could possibly go wrong?

----

On second thought, I can answer myself with a no. Anyway, I've updated the package with what I've written so far and I believe that somebody who actually knows what he's doing could make good use of it.

----

I don't know how much further I want to go down this line because it is way too much work for me, but I've had another thought which could be helpful. It's about the define blocks like:

Code: Select all

define function, begin
0,noop,0                    # no operation
1,wait,1,1                  # wait(cycles)
2,waitforall,0              # wait for script-related walking&panning to stop
3,waitforhero,1,0           # wait for hero to stop moving
4,waitfornpc,1,0            # wait for npc to stop moving
...
Instead of trying to parse them, one could just turn them into Python code with some string manipulation like:

Code: Select all

hspeak_function = {
    "noop": (0, 0),
    "wait": (1, 1, 1),
    "waitforall": (2, 0),
    "waitforhero": (3, 1, 0),
    "waitfornpc": (4, 1, 0),
....
And import them directly into the compiler as a Python module.
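
A minimal sketch of that string manipulation follows. It builds the dictionary directly instead of writing out a Python source file. `parse_define_block` is a hypothetical name, and the sketch assumes every field after the command name is a plain integer, as in the excerpt above; anything else in the real file would need extra handling.

```python
def parse_define_block(text):
    """Turn 'id,name,argcount,defaults...' lines into {name: (id, argcount, ...)}."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.lower().startswith("define ") or line == "end":
            continue                          # skip the block header/footer
        fields = [f.strip() for f in line.split(",")]
        ident, name, rest = fields[0], fields[1], fields[2:]
        table[name] = (int(ident), *[int(x) for x in rest])
    return table

sample = """define function, begin
0,noop,0                    # no operation
1,wait,1,1                  # wait(cycles)
3,waitforhero,1,0           # wait for hero to stop moving
end"""
hspeak_function = parse_define_block(sample)
print(hspeak_function["wait"])
```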
Last edited by lennyhome on Thu Feb 27, 2020 12:37 am, edited 2 times in total.

Post by TMC »

The interpreter is looking not that far off functional! I'd like to take it further even if you're stopping here.
lennyhome wrote:Makes it simpler but also generates a shift/reduce conflict because + is declared left associative, so I put them back.
To be honest, I've never studied or used LR parsers, though I've always wanted to learn. I've only written/used hand-written operator-precedence and recursive-descent, and parsing-expression-grammar (PEG) parsers. (PEG parsers (eg reloadbasic.py) are for lazy people who don't want to deal with rule conflicts and don't care about bad performance due to unlimited backtracking.)

I notice that you have a rule "name_concat : name_concat ':' NAME" to allow : in the middle of an identifier because of the conflict with :=. But it doesn't allow : at the beginning or end of an identifier... oh wait... I thought it would, but HSpeak doesn't accept that either, usually (depending on what follows), and throws weird error messages. Well, I (re)wrote HSpeak's lexer, and it has nasty exceptions (especially to handle -), so I am to blame for that mess. Actually, HSpeak does have a number of lexer and parser bugs. It was even worse before the lexer was rewritten!

So switching to a strictly defined lexer would be a relief. But is there another way to handle characters like : and - in identifiers without extra productions?

We do also allow a lot of other characters in identifiers, in fact anything that isn't explicitly disallowed. I know people use ! and ? and & and % in script names too. I was planning to restrict the allowable characters further.
The first rule intercepts function calls to "global variable" specifically and passes the name of the variable as the second parameter. The second rule is the default function call, which passes everything by value. Is that how it's supposed to work?
Looks good to me.

Though "statement : '$' name_concat '=' STRING" is wrong, after the $ is an arbitrary expression*, which evaluates to the string ID, not a variable name.

* With the exception that $...="..." and $...+"...", which are themselves expressions returning the string ID, aren't allowed.
Instead of trying to parse them, one could just turn them into Python code with some string manipulation
I see you've updated your upload to include a script to convert plotscr.hsd into Python. Thanks, this will save me some time! I needed something just like this for a couple reasons: I want to change the "define function" syntax, and I'm not doing it by hand, and relatedly I want to generate a FreeBasic include file from the "define function" block.
Last edited by TMC on Thu Feb 27, 2020 1:24 pm, edited 2 times in total.

Post by lennyhome »

I'd like to take it further even if you're stopping here.
You're welcome to, because I need time to familiarize myself with the engine.

For now I was able to unlump an .hs file. I'll look up those file names in the compiler source code and figure out what functions I need to import from it.
Last edited by lennyhome on Thu Feb 27, 2020 8:02 pm, edited 1 time in total.

Post by TMC »

Hmm, you want to read an .hs file? So that the REPL can call existing scripts?
BTW, .hs files do not currently contain lists of declared global variables, but that's something I want to add.

A Python library to read .hs (and .rpg) files already exists:
https://bitbucket.org/rbv/nohrio/src/ma ... scripts.py
https://rpg.hamsterrepublic.com/ohrrpgce/Nohrio
Here's a simple script decompiler commandline utility: https://bitbucket.org/rbv/nohrio/src/ma ... compile.py
nohrio is written in Python2 and built on Numpy. Unfortunately it's forked in two. David no longer works on it, so I need to take over maintenance and merge our forks, which is a daunting task. I've put it off for 7 years :/
scripts.py doesn't depend on the rest of nohrio, except for the unlumper, so it can be pulled out, ported to Python 3, and reused.

I should add a protocol for sending compiled script snippets to ohrrpgce-game for execution and passing back return values!
Last edited by TMC on Thu Feb 27, 2020 10:08 pm, edited 3 times in total.

Post by lennyhome »

I'm still unsure where I want to go but I tried the AST stuff as suggested in the PLY manual and I got this real life example:

Code: Select all

HSpeak> v := read map block(x, y, layer:base) + read map block(x, y, layer:item) * 256 + read map block(x, y, layer:fog) * 65536 + read map block(x, y, layer:chalk) * 16777216
ROOT
assign number
    name: v
    binop: +
        binop: +
            binop: +
                function: readmapblock
                    name: x
                    name: y
                    name: layer:base
                binop: *
                    function: readmapblock
                        name: x
                        name: y
                        name: layer:item
                    number: 256
            binop: *
                function: readmapblock
                    name: x
                    name: y
                    name: layer:fog
                number: 65536
        binop: *
            function: readmapblock
                name: x
                name: y
                name: layer:chalk
            number: 16777216
It's really quite nice. I'll update my sources shortly.
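
For anyone curious how a dump like the one above can be produced, here is a small sketch. It does not come from lennyhome's package: the `Node` class and the `dump` function are assumptions made up for illustration, and the tree is built by hand rather than by parser rules.

```python
class Node:
    """A generic AST node: a kind, an optional value, and child nodes."""
    def __init__(self, kind, value=None, children=()):
        self.kind = kind
        self.value = value
        self.children = list(children)

def dump(node, depth=0):
    """Return the tree as indented lines, four spaces per nesting level."""
    label = node.kind if node.value is None else f"{node.kind}: {node.value}"
    lines = ["    " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines

# Hand-built tree for: v := read map block(x, y) * 256
call = Node("function", "readmapblock", [Node("name", "x"), Node("name", "y")])
tree = Node("assign number", children=[
    Node("name", "v"),
    Node("binop", "*", [call, Node("number", 256)]),
])
print("\n".join(dump(tree)))
```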
Hmm, you want to read an .hs file?
No. I just wanted to see the file names to cross-reference them with the compiler source code.
Last edited by lennyhome on Fri Feb 28, 2020 5:13 am, edited 2 times in total.