Unknown Token

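The unknown token represents any fragment of the input that does not match any of the defined token patterns. By default, lexing stops with an error (an UnrecognizedTokenException) when such a fragment is encountered.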

$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
);

foreach ($lexer->lex('42 unknown') as $token) {
    echo $token->getName() . "\n";
}

// T_DIGIT
// Uncaught Phplrt\Lexer\Exception\UnrecognizedTokenException: Syntax error, unrecognized " unknown"
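The following is a minimal sketch of how a regex-based lexer can end up with an unknown token. It is not phplrt's implementation. The toyLex() function, its [name, value] token pairs and the use of \RuntimeException are assumptions made for illustration, and unlike phplrt it emits no T_EOI token. Input that no pattern matches is collected into a single T_UNKNOWN span. That span is then passed to a handler, which throws by default.

```php
<?php

/**
 * Hypothetical toy lexer (not the phplrt API).
 *
 * @param array<string, string> $tokens    token name => regex body without delimiters
 * @param callable|null         $onUnknown receives a ['T_UNKNOWN', value] pair and returns
 *                                         a pair to keep, or null to drop it
 * @return list<array{0: string, 1: string}> list of [name, value] pairs
 */
function toyLex(string $input, array $tokens, ?callable $onUnknown = null): array
{
    // Default behaviour: abort lexing on the first unknown span
    $onUnknown ??= static function (array $token): ?array {
        throw new \RuntimeException(\sprintf('Syntax error, unrecognized "%s"', $token[1]));
    };

    $result = [];
    $unknown = '';

    // Pass any pending unknown characters to the handler as one token
    $flush = static function () use (&$unknown, &$result, $onUnknown): void {
        if ($unknown !== '') {
            $handled = $onUnknown(['T_UNKNOWN', $unknown]);
            if ($handled !== null) {
                $result[] = $handled;
            }
            $unknown = '';
        }
    };

    $offset = 0;
    while ($offset < \strlen($input)) {
        foreach ($tokens as $name => $pattern) {
            // \G anchors the match at the current offset
            if (\preg_match('/\G(?:' . $pattern . ')/', $input, $m, 0, $offset) === 1 && $m[0] !== '') {
                $flush();
                $result[] = [$name, $m[0]];
                $offset += \strlen($m[0]);
                continue 2; // resume scanning right after the match
            }
        }

        // No pattern matches at this offset: extend the unknown span by one byte
        $unknown .= $input[$offset];
        $offset++;
    }

    $flush(); // handle a trailing unknown span, if any

    return $result;
}
```

Calling toyLex('42 unknown', ['T_DIGIT' => '\d+']) throws on the trailing " unknown" span, much like the example above.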

Behaviour Change

You can override this behavior by passing a handler in the onUnknownToken argument of the lexer constructor.

$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: new \Phplrt\Lexer\Config\ThrowErrorHandler(),
);

The default handler is Phplrt\Lexer\Config\ThrowErrorHandler. If you specify Phplrt\Lexer\Config\PassthroughHandler instead, unknown tokens are returned just like any other token.

$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: new \Phplrt\Lexer\Config\PassthroughHandler(),
);

foreach ($lexer->lex('42 unknown 23') as $token) {
    echo $token->getName() . ' with value (' . $token->getValue() . ")\n";
}

// T_DIGIT with value (42)
// T_UNKNOWN with value ( unknown )
// T_DIGIT with value (23)
// T_EOI with value (\0)

Alternatively, use the Phplrt\Lexer\Config\NullHandler handler to skip such tokens.

$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: new \Phplrt\Lexer\Config\NullHandler(),
);

foreach ($lexer->lex('42 unknown 23') as $token) {
    echo $token->getName() . ' with value (' . $token->getValue() . ")\n";
}

// T_DIGIT with value (42)
// T_DIGIT with value (23)
// T_EOI with value (\0)
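The passthrough and null strategies differ only in what the handler returns for an unknown token: the token itself, or nothing. The sketch below models this with plain closures over a hypothetical, already tokenized array of [name, value] pairs, so it runs without phplrt. The names applyHandler, $passthrough and $skip are illustrative and are not part of the library.

```php
<?php

/**
 * Hypothetical post-processing step (not the phplrt API): every T_UNKNOWN
 * pair is handed to $handler, which returns a pair to keep or null to drop it.
 *
 * @param list<array{0: string, 1: string}> $tokens
 * @return list<array{0: string, 1: string}>
 */
function applyHandler(array $tokens, callable $handler): array
{
    $result = [];

    foreach ($tokens as $token) {
        if ($token[0] === 'T_UNKNOWN') {
            $token = $handler($token);
        }
        if ($token !== null) {
            $result[] = $token;
        }
    }

    return $result;
}

// Behaves like PassthroughHandler: return the unknown token as-is
$passthrough = static fn (array $token): ?array => $token;

// Behaves like NullHandler: drop the unknown token
$skip = static fn (array $token): ?array => null;

$tokens = [['T_DIGIT', '42'], ['T_UNKNOWN', ' unknown '], ['T_DIGIT', '23']];

echo implode(', ', array_column(applyHandler($tokens, $passthrough), 0)), "\n"; // T_DIGIT, T_UNKNOWN, T_DIGIT
echo implode(', ', array_column(applyHandler($tokens, $skip), 0)), "\n";        // T_DIGIT, T_DIGIT
```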

You can also implement your own handler.

use Phplrt\Contracts\Lexer\TokenInterface;
use Phplrt\Contracts\Source\ReadableInterface;
use Phplrt\Lexer\Config\HandlerInterface;

$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: new class implements HandlerInterface {
        public function handle(ReadableInterface $source, TokenInterface $token): ?TokenInterface
        {
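            $content = $token->getValue();

            // Re-tag unrecognized input that starts with "<?php" as an embedded PHP fragment
            if (\str_starts_with($content, '<?php')) {
                return new \Phplrt\Lexer\Token\Composite('T_PHP_LANGUAGE_INJECTION', ...);
            }

            // Any other unknown token is returned unchanged
            return $token;
        }
    }
);

foreach ($lexer->lex('...') as $token) {
    // ...
}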
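At its core, a custom handler is a function that maps an unknown token to a replacement token, or to null. The sketch below applies the same decision as the anonymous class above to a hypothetical array-based [name, value] token format, so it runs without phplrt. Only the T_PHP_LANGUAGE_INJECTION name comes from the example above; everything else is illustrative.

```php
<?php

/**
 * Hypothetical custom handler (not the phplrt API): re-tags unknown spans
 * that start with "<?php" and leaves every other unknown span unchanged.
 */
$handler = static function (array $token): ?array {
    $value = $token[1];

    if (\str_starts_with($value, '<?php')) {
        return ['T_PHP_LANGUAGE_INJECTION', $value];
    }

    return $token;
};

$unknownTokens = [
    ['T_UNKNOWN', '<?php echo 1;'],
    ['T_UNKNOWN', '???'],
];

foreach ($unknownTokens as $token) {
    echo $handler($token)[0], "\n";
}
// T_PHP_LANGUAGE_INJECTION
// T_UNKNOWN
```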

Renaming

In some cases, you may need to give unknown tokens a different name (by default they are named T_UNKNOWN). To do this, pass the desired name in the unknown constructor argument.

$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: new \Phplrt\Lexer\Config\PassthroughHandler(),
    unknown: 'WTF_IS_THAT',
);

foreach ($lexer->lex('42 unknown 23') as $token) {
    echo $token->getName() . ' with value (' . $token->getValue() . ")\n";
}

// T_DIGIT with value (42)
// WTF_IS_THAT with value ( unknown )
// T_DIGIT with value (23)
// T_EOI with value (\0)