DefaultTokenizer
public struct DefaultTokenizer : Tokenizer
A simple tokenizer that uses all terminals in a grammar for tokenization.
Terminals may not overlap partially.
If two terminals, ab and bc, exist and abc is tokenized, the tokenizer will not find an occurrence of the second terminal.
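The effect of partially overlapping terminals can be illustrated with a small standalone sketch. It is not the DefaultTokenizer implementation: scanNonOverlapping is a hypothetical helper, terminals are modeled as plain strings, and unmatched characters are skipped here instead of raising an error. It only shows why a scan that consumes each match never reports bc in abc.

```swift
// Illustrative only: a left-to-right scan that consumes each match.
// This is an assumption-laden sketch, not the library's algorithm.
func scanNonOverlapping(
    _ word: String,
    terminals: [String]
) -> [(terminal: String, range: Range<String.Index>)] {
    var matches: [(terminal: String, range: Range<String.Index>)] = []
    var position = word.startIndex
    while position < word.endIndex {
        let rest = word[position...]
        // Take the first non-empty terminal that matches at the current position.
        if let terminal = terminals.first(where: { !$0.isEmpty && rest.hasPrefix($0) }) {
            let end = word.index(position, offsetBy: terminal.count)
            matches.append((terminal: terminal, range: position..<end))
            position = end // consumed characters are never revisited
        } else {
            // Nothing matches here; this sketch simply skips one character.
            position = word.index(after: position)
        }
    }
    return matches
}

let found = scanNonOverlapping("abc", terminals: ["ab", "bc"]).map { $0.terminal }
print(found) // ["ab"]
```

Once ab has consumed the first two characters, only c remains, so bc can no longer match.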
-
Creates a new tokenizer using a grammar in Chomsky normal form.
Declaration
Swift
public init(grammar: Grammar)
Parameters
grammar
Grammar specifying the rules with which a string should be tokenized.
-
Tokenizes the given word and returns a sequence of possible tokens for each unit of the string.
For a grammar
A -> a | A B
B -> a | B b
and a string “ab”, the tokenizer generates the tokenization
[[a], [b]]
Throws
A syntax error if the word could not be tokenized according to the rules of the recognized language
Declaration
Swift
public func tokenize(_ word: String) throws -> [[(terminal: Terminal, range: Range<String.Index>)]]
Parameters
word
Word which should be tokenized
Return Value
Tokenization of the word
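The shape of the value returned by tokenize(_:) can be explored with a minimal standalone sketch. It assumes a String stand-in for the library's Terminal type and uses a hand-built value that mirrors the documented result [[a], [b]] for “ab”. SketchTerminal, SketchTokenization, and coveredText are hypothetical names introduced only for illustration.

```swift
// Stand-in for the library's Terminal type (an assumption for this sketch).
typealias SketchTerminal = String
typealias SketchTokenization = [[(terminal: SketchTerminal, range: Range<String.Index>)]]

// Returns, for each unit of the word, the substrings covered by its candidate tokens.
func coveredText(of tokenization: SketchTokenization, in word: String) -> [[String]] {
    return tokenization.map { candidates in
        candidates.map { String(word[$0.range]) }
    }
}

let word = "ab"
let middle = word.index(after: word.startIndex)

// Hand-built value with one candidate token per unit, as in [[a], [b]].
let tokenization: SketchTokenization = [
    [(terminal: "a", range: word.startIndex..<middle)],
    [(terminal: "b", range: middle..<word.endIndex)],
]

let texts = coveredText(of: tokenization, in: word)
print(texts) // [["a"], ["b"]]
```

The outer array has one entry per unit of the string, and each inner array lists the tokens considered possible for that unit together with the range they cover in the original word.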