I have an idea for how to handle commas and newlines, and () vs begin/end, although handling both of those at once adds a lot of extra complication because of the necessary commas around begin/end. It won't be 100% the same as HSpeak, but stricter (which is a good thing), and probably close enough for 95+% of games.
But maybe I'm wasting time figuring out convoluted ways to replicate HSpeak's strange lexing/comma handling if I'll end up writing a custom lexer for PLY (or modifying PLY's lex) or switching to lark, which has a "contextual lexer":
"The contextual lexer communicates with the parser, and uses the parser's lookahead prediction to narrow its choice of tokens. So at each point, the lexer only matches the subgroup of terminals that are legal at that parser state, instead of all of the terminals. It’s surprisingly effective at resolving common terminal collisions, and allows to parse languages that LALR(1) was previously incapable of parsing."
Which sounds very convenient: in some places newlines could be treated like commas, and in others they could be ignored.
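For comparison, here is a rough sketch of the hand-written-lexer route, using only the standard library. It is not lark's contextual lexer: it never consults the parser and only looks at the previous token. The rule it uses (a newline is ignored after a comma, an opening paren, an operator, or at the start of input, and otherwise becomes a COMMA token) is an assumption made up for illustration, not HSpeak's actual behaviour, and the token names are hypothetical. The class exposes input() and token() and returns objects with type, value and lineno attributes, which as far as I know is all that PLY's yacc asks of a custom lexer.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<NEWLINE>\n)
  | (?P<SKIP>[ \t]+)
  | (?P<COMMA>,)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<OP>[-+*/])
""", re.VERBOSE)


class Token:
    """Minimal token object: type, value and lineno attributes."""

    def __init__(self, type, value, lineno):
        self.type = type
        self.value = value
        self.lineno = lineno

    def __repr__(self):
        return "Token(%r, %r, %d)" % (self.type, self.value, self.lineno)


class NewlineAwareLexer:
    # Assumed rule: after these token types (or at the very start, where
    # prev is None) a newline separates nothing, so it is dropped.
    CONTINUES = {None, "COMMA", "LPAREN", "OP"}

    def __init__(self):
        self.input("")

    def input(self, text):
        self.text = text
        self.pos = 0
        self.lineno = 1
        self.prev = None  # type of the last token returned

    def token(self):
        """Return the next token, or None at end of input."""
        while self.pos < len(self.text):
            m = TOKEN_RE.match(self.text, self.pos)
            if m is None:
                raise SyntaxError("line %d: unexpected character %r"
                                  % (self.lineno, self.text[self.pos]))
            self.pos = m.end()
            kind = m.lastgroup
            if kind == "SKIP":
                continue
            if kind == "NEWLINE":
                line = self.lineno
                self.lineno += 1
                if self.prev in self.CONTINUES:
                    continue  # newline ignored in this context
                # Otherwise the newline acts as a separator.
                self.prev = "COMMA"
                return Token("COMMA", "\n", line)
            self.prev = kind
            return Token(kind, m.group(), self.lineno)
        return None


def token_types(text):
    """Helper for experimenting: list the token types for a string."""
    lexer = NewlineAwareLexer()
    lexer.input(text)
    types = []
    tok = lexer.token()
    while tok is not None:
        types.append(tok.type)
        tok = lexer.token()
    return types


print(token_types("foo(1 +\n 2,\n 3)\nbar"))
```

The obvious weakness is that it can only look backwards: a newline just before a closing paren still turns into a comma, which is exactly the kind of case where knowing what the parser expects next would help.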
I was disappointed and very skeptical about how PLY (and yacc) handles syntax error reporting, using p_error and 'error' symbols. What I really want, and which PLY doesn't provide, is a description of what it was expecting to see at the point of the error, like HSpeak provides in many of its error messages.
But on actually trying out writing 'error' rules, it's not as bad as I thought.
However, it seems that to explain exactly what's wrong (e.g. an extra comma inside an "if()") rather than just printing an error like "Expected condition after IF", it's necessary to add a rule containing 'error' at the exact location of the error. But I think I can combine p_error, to point out the token where the error occurred, with error rules, to describe the general context. A lot of rules might be needed, but for comparison HSpeak has roughly 180 different warning and error messages.
Also, it's possible to examine the parser's symbol stack which might be helpful.
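Along the same lines, the parser's tables might be enough to build an "expected ..." message. The sketch below is plain Python and doesn't import PLY: it takes an LALR-style action table shaped as {state: {terminal: action}} and lists the terminals that have an entry in a given state. The function name, the friendly-name mapping and the sample table are all made up for illustration. My understanding is that PLY 3.x keeps a table of this shape in parser.action and the current state stack in parser.statestack, but those are undocumented internals, so treat that as an assumption to check against the PLY source. States with default reductions may also make the list incomplete.

```python
def describe_expected(action, state, found, names=None):
    """Build an 'Expected ..., found ...' message for one parser state.

    action: mapping of state -> {terminal name: action}, i.e. the shape
            of an LALR action table (assumed, see the note above).
    state:  the state the parser was in when the error occurred.
    found:  text describing the offending token.
    names:  optional mapping from terminal names to friendlier text.
    """
    names = names or {}
    # The 'error' pseudo-terminal is for recovery, not something to type.
    expected = sorted(t for t in action.get(state, {}) if t != "error")
    pretty = [names.get(t, t) for t in expected]
    if not pretty:
        return "Unexpected %s" % found
    if len(pretty) == 1:
        wanted = pretty[0]
    else:
        wanted = ", ".join(pretty[:-1]) + " or " + pretty[-1]
    return "Expected %s, found %s" % (wanted, found)


# Hand-built stand-in for a real table: state 4 is "just after IF (".
sample_action = {4: {"NAME": 7, "NUMBER": 8, "LPAREN": 9, "error": 10}}
friendly = {"LPAREN": "'('", "NAME": "a name", "NUMBER": "a number"}

print(describe_expected(sample_action, 4, "','", friendly))
```

Inside p_error this would presumably be called with the parser's own table and its top-of-stack state, and the result combined with the context supplied by the error rules.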
Interestingly, lark takes a completely different approach to error reporting, based on pattern matching. But error recovery isn't even mentioned in lark's documentation (which is sparse compared to PLY's).
https://github.com/lark-parser/lark/blo ... ng_lalr.py