DefaultTokenizer

public struct DefaultTokenizer : Tokenizer

A simple tokenizer which uses all terminals in a grammar for tokenization.

Terminals may not overlap partially. If two terminals, ab and bc, exist and the string abc is tokenized, the tokenizer will not find an occurrence of the second terminal, bc.
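The overlap restriction can be illustrated with a small sketch. This is not the implementation of DefaultTokenizer. It assumes a simple greedy left-to-right scan over plain strings, and the function name greedyScan is hypothetical. It only shows how a partially overlapping terminal can be missed once the characters it shares with another terminal have been consumed.

```swift
// Hypothetical stand-in, NOT the library's implementation: a greedy
// left-to-right scan over a fixed list of terminal strings.
func greedyScan(_ input: String, terminals: [String]) -> [String]? {
    var found: [String] = []
    var rest = Substring(input)
    while !rest.isEmpty {
        // Take the first non-empty terminal that matches at the current position.
        guard let match = terminals.first(where: { !$0.isEmpty && rest.hasPrefix($0) }) else {
            return nil // no terminal starts here, so the scan fails
        }
        found.append(match)
        // Consume the matched characters; they cannot be reused by another terminal.
        rest = rest.dropFirst(match.count)
    }
    return found
}

// "ab" consumes the shared "b", so "bc" is never found in "abc".
let overlapping = greedyScan("abc", terminals: ["ab", "bc"]) // nil
// Without a shared character, both terminals are found.
let disjoint = greedyScan("abbc", terminals: ["ab", "bc"])   // ["ab", "bc"]
```

In this sketch the scan gives up once no terminal matches the remaining input. How DefaultTokenizer itself reports that situation is described under tokenize(_:).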

  • Creates a new tokenizer using a grammar in Chomsky normal form

    Declaration

    Swift

    public init(grammar: Grammar)

    Parameters

    grammar

    Grammar specifying the rules with which a string should be tokenized.

  • Tokenizes the given word and returns a sequence of possible tokens for each unit of the string

    For a grammar

    A -> a | A B
    B -> a | B b
    

    and a string “ab”

    The tokenizer generates the tokenization

    [[a], [b]]
    

    Throws

    A syntax error if the word could not be tokenized according to the rules of the recognized language

    Declaration

    Swift

    public func tokenize(_ word: String) throws -> [[(terminal: Terminal, range: Range<String.Index>)]]

    Parameters

    word

    Word which should be tokenized

    Return Value

    Tokenization of the word
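The shape of the return value can be explored without the library. The following is a minimal sketch under stated assumptions: Terminal is replaced by a String typealias, Tokenization and coveredText(of:in:) are hypothetical names, and the sample value is built by hand to mirror the documented result [[a], [b]] rather than produced by tokenize(_:). Each outer element corresponds to one unit of the word, and each inner element is one candidate token with the range it covers.

```swift
// Stand-in for the library's Terminal type (an assumption for this sketch).
typealias Terminal = String
// Hypothetical alias matching the documented return type of tokenize(_:).
typealias Tokenization = [[(terminal: Terminal, range: Range<String.Index>)]]

// Maps each unit's candidate list to the substrings its ranges cover.
func coveredText(of tokenization: Tokenization, in word: String) -> [[String]] {
    return tokenization.map { candidates in
        candidates.map { String(word[$0.range]) }
    }
}

let word = "ab"
let middle = word.index(after: word.startIndex)
// Hand-built value mirroring the documented tokenization [[a], [b]].
let sample: Tokenization = [
    [(terminal: "a", range: word.startIndex..<middle)],
    [(terminal: "b", range: middle..<word.endIndex)],
]
let texts = coveredText(of: sample, in: word) // [["a"], ["b"]]
```

With the real API, the hand-built sample would be replaced by the result of try tokenizer.tokenize(word) inside a do/catch block, since the method throws when the word cannot be tokenized.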