Unknown Token
The unknown token represents any input that matches none of the defined token patterns. By default, the lexer stops with a lexing error when it encounters such input.
$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
);

foreach ($lexer->lex('42 unknown') as $token) {
    echo $token->getName() . "\n";
}
// T_DIGIT
// Uncaught Phplrt\Lexer\Exception\UnrecognizedTokenException: Syntax error, unrecognized " unknown"
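With the default behavior, the error surfaces as an ordinary exception, so it can be caught around the loop that consumes the tokens. The following is a minimal sketch rather than a recommended pattern. It assumes the library is installed through Composer with the autoloader at vendor/autoload.php (the path is an assumption), and it relies only on the exception class named in the output above.

```php
<?php

use Phplrt\Lexer\Exception\UnrecognizedTokenException;
use Phplrt\Lexer\Lexer;

// Assumption: Composer's autoloader lives next to this script.
require __DIR__ . '/vendor/autoload.php';

$lexer = new Lexer(
    tokens: ['T_DIGIT' => '\d+'],
);

try {
    // Tokens are consumed inside the try block, so an error raised
    // while iterating is caught below.
    foreach ($lexer->lex('42 unknown') as $token) {
        echo $token->getName() . "\n";
    }
} catch (UnrecognizedTokenException $e) {
    // getMessage() is available on every PHP exception.
    echo 'Lexing failed: ' . $e->getMessage() . "\n";
}
```

Tokens recognized before the failure are still printed, because the loop has already processed them when the exception is thrown.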
Behavior Change
You can override this behavior by passing an appropriate handler as the onUnknownToken argument of the lexer constructor.
$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: new \Phplrt\Lexer\Config\ThrowErrorHandler(),
);
The default handler is Phplrt\Lexer\Config\ThrowErrorHandler. If you specify Phplrt\Lexer\Config\PassthroughHandler instead, unknown tokens are returned just like any other token.
$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: new \Phplrt\Lexer\Config\PassthroughHandler(),
);

foreach ($lexer->lex('42 unknown 23') as $token) {
    echo $token->getName() . ' with value (' . $token->getValue() . ")\n";
}
// T_DIGIT with value (42)
// T_UNKNOWN with value ( unknown )
// T_DIGIT with value (23)
// T_EOI with value (\0)
Alternatively, use Phplrt\Lexer\Config\NullHandler to skip such tokens.
$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: new \Phplrt\Lexer\Config\NullHandler(),
);

foreach ($lexer->lex('42 unknown 23') as $token) {
    echo $token->getName() . ' with value (' . $token->getValue() . ")\n";
}
// T_DIGIT with value (42)
// T_DIGIT with value (23)
// T_EOI with value (\0)
You can also override this behavior with your own handler by implementing Phplrt\Lexer\Config\HandlerInterface.
use Phplrt\Contracts\Lexer\TokenInterface;
use Phplrt\Contracts\Source\ReadableInterface;
use Phplrt\Lexer\Config\HandlerInterface;

$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: new class implements HandlerInterface {
        public function handle(ReadableInterface $source, TokenInterface $token): ?TokenInterface
        {
            $content = $token->getValue();

            // Replace unknown input that starts with "<?php" with a dedicated token.
            if (\str_starts_with($content, '<?php')) {
                return new \Phplrt\Lexer\Token\Composite('T_PHP_LANGUAGE_INJECTION', ...);
            }

            // Otherwise return the unknown token unchanged.
            return $token;
        }
    }
);

foreach ($lexer->lex('...') as $token) { ... }
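A custom handler can also keep state, for example to report everything that was skipped once lexing is finished. The sketch below is illustrative only. The collector and its $seen property are hypothetical names, the autoloader path is an assumption, and the code assumes that returning null from handle() drops the token, as the NullHandler behavior and the nullable return type suggest.

```php
<?php

use Phplrt\Contracts\Lexer\TokenInterface;
use Phplrt\Contracts\Source\ReadableInterface;
use Phplrt\Lexer\Config\HandlerInterface;
use Phplrt\Lexer\Lexer;

// Assumption: Composer's autoloader lives next to this script.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical handler: remembers every unknown fragment, then drops it.
$collector = new class implements HandlerInterface {
    /** @var list<string> */
    public array $seen = [];

    public function handle(ReadableInterface $source, TokenInterface $token): ?TokenInterface
    {
        $this->seen[] = $token->getValue();

        // Assumption: returning null skips the token, as NullHandler does.
        return null;
    }
};

$lexer = new Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: $collector,
);

// Only recognized tokens reach this loop.
foreach ($lexer->lex('42 unknown 23') as $token) {
    echo $token->getName() . "\n";
}

// Report the skipped fragments after lexing has finished.
foreach ($collector->seen as $fragment) {
    echo 'Skipped unrecognized input: "' . $fragment . "\"\n";
}
```

Keeping the handler in a variable instead of creating it inline is what makes the collected fragments accessible after the loop.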
Renaming
In some cases you may need to rename unknown tokens. To do this, use the unknown constructor argument.
$lexer = new Phplrt\Lexer\Lexer(
    tokens: ['T_DIGIT' => '\d+'],
    onUnknownToken: new \Phplrt\Lexer\Config\PassthroughHandler(),
    unknown: 'WTF_IS_THAT',
);

foreach ($lexer->lex('42 unknown 23') as $token) {
    echo $token->getName() . ' with value (' . $token->getValue() . ")\n";
}
// T_DIGIT with value (42)
// WTF_IS_THAT with value ( unknown )
// T_DIGIT with value (23)
// T_EOI with value (\0)