Multistate Lexer

Multistate lexers switch lexical analysis modes "on the fly". This is useful, for example, when a grammar combines several languages whose token sets differ, including conflicting tokens that are present in both grammars.

Please note that this mode can significantly reduce the performance of lexical analysis. If you only need to parse the contents of an already matched token further (for example, ASCII sequences inside strings), it is recommended to post-process that token instead of using a multistate lexer.
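As a rough illustration of post-processing an existing token, the sketch below decodes a few escape sequences from the raw value of a string token. The decodeStringToken() helper and the set of supported escapes are assumptions made for this example; they are not part of phplrt.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of phplrt): decodes escape sequences
// in the raw value of an already matched double-quoted string token.
function decodeStringToken(string $raw): string
{
    // Drop the surrounding quotes of the token value.
    $body = substr($raw, 1, -1);

    $decoded = preg_replace_callback(
        // A backslash followed by "xNN" (hex) or by any single character.
        '/\\\\(x[0-9A-Fa-f]{2}|.)/s',
        static function (array $m): string {
            $seq = $m[1];

            // Hex form "\xNN" becomes the byte with that code.
            if ($seq[0] === 'x' && strlen($seq) === 3) {
                return chr((int) hexdec(substr($seq, 1)));
            }

            // Known single-character escapes; anything else is kept as-is.
            $map = ['n' => "\n", 't' => "\t", '"' => '"', '\\' => '\\'];

            return $map[$seq] ?? $m[0];
        },
        $body
    );

    // preg_replace_callback() returns null on a PCRE error.
    return $decoded ?? $body;
}
```

The lexer still produces a single string token; the decoding happens afterwards on that token's value, so no second lexer state is needed.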

Before creating a multistate lexer, create a separate lexer for each "language". These lexers are then combined into a single multistate lexer, where each one is registered under its own state name.

$lexer = new \Phplrt\Lexer\Multistate([
    'html' => new \Phplrt\Lexer\Lexer([ ... ]),
    'php' => new \Phplrt\Lexer\Lexer([ ... ])
]);

After this, you should configure the transition rules.

For example:

  • If the <?php token is found by the html lexer...
    • Then we switch to the php lexer.
  • If the ?> token is found by the php lexer...
    • Then we switch back to the html lexer (the previous state).

$lexer = new \Phplrt\Lexer\Multistate(
    states: [
        'html' => new \Phplrt\Lexer\Lexer([ ... ]),
        'php' => new \Phplrt\Lexer\Lexer([ ... ])
    ],
    transitions: [
        'html' => [ 'T_PHP_OPEN'  => 'php'  ],
        'php'  => [ 'T_PHP_CLOSE' => 'html' ],
    ]
);
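
To make the role of the transition table more concrete, here is a conceptual, self-contained sketch of a state-switching lexer loop. It is not phplrt's implementation: the lexMultistate() function, the T_TEXT and T_CODE token names, and all patterns are assumptions chosen for illustration only.

```php
<?php

declare(strict_types=1);

// Conceptual sketch only: this is NOT how phplrt is implemented.
// Each state maps hypothetical token names to PCRE patterns.
$states = [
    'html' => ['T_PHP_OPEN' => '<\?php', 'T_TEXT' => '(?:(?!<\?php).)+'],
    'php'  => ['T_PHP_CLOSE' => '\?>', 'T_CODE' => '(?:(?!\?>).)+'],
];

// Same shape as the transition table above: state => [token => next state].
$transitions = [
    'html' => ['T_PHP_OPEN'  => 'php'],
    'php'  => ['T_PHP_CLOSE' => 'html'],
];

/**
 * @return list<array{0: string, 1: string}> pairs of [token name, value]
 */
function lexMultistate(string $source, array $states, array $transitions, string $state): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($source);

    while ($offset < $length) {
        $matched = false;

        // Only the patterns of the current state are tried.
        foreach ($states[$state] as $name => $pattern) {
            // \G anchors the match at the current offset.
            if (preg_match('/\G(?:' . $pattern . ')/s', $source, $m, 0, $offset) === 1) {
                $tokens[] = [$name, $m[0]];
                $offset += strlen($m[0]);
                // Switch state if this token has a transition rule.
                $state = $transitions[$state][$name] ?? $state;
                $matched = true;
                break;
            }
        }

        if (!$matched) {
            throw new RuntimeException("Unexpected input at offset $offset in state $state");
        }
    }

    return $tokens;
}

$tokens = lexMultistate('<b><?php echo 1; ?></b>', $states, $transitions, 'html');
```

Because only the current state's patterns are tried at each position, the same text can produce different tokens in the html and php states, which is what allows conflicting tokens to coexist.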