Transformer
public struct Transformer<Element, Device> : LayerType, Codable where Element : RandomizableType, Device : DeviceType
Transformer as introduced by Attention Is All You Need.
The transformer model shares an embedding matrix between the encoder and decoder and reuses the embedding weights to compute the decoder output distribution. Outputs of the transformer are normalized using log softmax.
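To make the weight tying and log-softmax output described above concrete, here is a minimal, framework-free sketch. The names and values (embeddingMatrix, decoderState, the small dimensions) are purely illustrative and do not correspond to the library's internal implementation.
Swift
import Foundation

// Framework-free sketch of embedding weight tying (illustrative values only).
let vocabSize = 4
let hiddenDim = 3

// Embedding matrix with shape [vocabSize, hiddenDim]; the same weights are
// reused as the output projection of the decoder.
let embeddingMatrix: [[Double]] = [
    [ 0.2, -0.1,  0.5],
    [-0.3,  0.8,  0.1],
    [ 0.7,  0.0, -0.4],
    [ 0.1,  0.6,  0.3]
]
let outputBias = [Double](repeating: 0, count: vocabSize)

// One decoder output position with shape [hiddenDim].
let decoderState: [Double] = [0.4, -0.2, 0.9]

// logits[v] = <embeddingMatrix[v], decoderState> + outputBias[v]
let logits = (0 ..< vocabSize).map { v -> Double in
    var dot = outputBias[v]
    for (w, x) in zip(embeddingMatrix[v], decoderState) {
        dot += w * x
    }
    return dot
}

// Log-softmax normalization, matching the normalized output distribution
// described above.
let maxLogit = logits.max()!
let logSumExp = maxLogit + log(logits.map { exp($0 - maxLogit) }.reduce(0, +))
let logProbs = logits.map { $0 - logSumExp }
print(logProbs)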
-
Declaration
Swift
public typealias Outputs = Tensor<Element, Device>
-
Token embedding shared between the encoder and the decoder; its weights are also reused to compute the output distribution.
Declaration
Swift
public var embedding: Embedding<Element, Device>
-
Positional encoding applied to the embedded inputs.
Declaration
Swift
public var positionalEncoding: PositionalEncoding<Element, Device>
-
Dropout layer used to regularize the model.
Declaration
Swift
public var dropout: Dropout<Element, Device>
-
Stack of transformer encoder layers.
Declaration
Swift
public var encoder: TransformerEncoder<Element, Device>
-
Stack of transformer decoder layers.
Declaration
Swift
public var decoder: TransformerDecoder<Element, Device>
-
Bias added when computing the decoder output distribution from the shared embedding weights.
Declaration
Swift
public var outputBias: Tensor<Element, Device>
-
Declaration
Swift
public var parameters: [Tensor<Element, Device>] { get }
-
Declaration
Swift
public var parameterPaths: [WritableKeyPath<Self, Tensor<Element, Device>>] { get }
-
Creates a new transformer following the architecture described in Attention Is All You Need.
Declaration
Swift
public init(encoderLayers: Int, decoderLayers: Int, vocabSize: Int, hiddenDim: Int, heads: Int, keyDim: Int, valueDim: Int, forwardDim: Int, dropout: Float = 0.1)
Parameters
encoderLayers
Number of encoder layers
decoderLayers
Number of decoder layers
vocabSize
Number of tokens in the vocabulary of the transformer
hiddenDim
Size of transformer layer outputs
heads
Number of attention heads in multi-head attention layers
keyDim
Size of key vectors in multi-head attention layers
valueDim
Size of value vectors in multi-head attention layers
forwardDim
Size of activations in pointwise feed-forward layers
dropout
Dropout rate
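As an illustration, the base configuration from the paper could be instantiated as follows. This is a sketch: the vocabulary size is a placeholder that depends on the task, and the CPU device type is assumed here.
Swift
// Sketch: hyperparameters of the "base" model from Attention Is All You Need.
// vocabSize and the CPU device type are placeholders, not prescribed values.
let model = Transformer<Float, CPU>(
    encoderLayers: 6,
    decoderLayers: 6,
    vocabSize: 32000,
    hiddenDim: 512,
    heads: 8,
    keyDim: 64,
    valueDim: 64,
    forwardDim: 2048,
    dropout: 0.1
)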
-
Computes the outputs of the decoder given the inputs for the encoder and decoder.
Declaration
Parameters
inputs
Tuple containing:
- Padded encoder inputs using -1 as the padding token.
- Padded decoder inputs using -1 as the padding token.
Return Value
Batch of sequences of log-softmax normalized distributions over the vocabulary of the transformer with shape [batchSize, seqlen, vocabDim]
-
Greedily decodes the most probable sequence of output symbols given a sequence of input tokens
Declaration
Swift
public func callAsFunction(inputSequence: [Int32], startToken: Int32, endToken: Int32, maxLength: Int) -> [Int32]
Parameters
inputSequence
Input tokens
startToken
First token to feed into the decoder. Subsequent tokens are generated autoregressively.
endToken
Token that ends decoding (end-of-sequence marker)
maxLength
Maximum length of the decoded sequence. If no endToken occurs within maxLength tokens, decoding is aborted.
Return Value
Most probable output sequence determined by greedy decoding.
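A usage sketch for greedy decoding, reusing the model instance from the initializer example above. The token ids are placeholders and would normally come from the vocabulary the model was trained with.
Swift
// Placeholder token ids; real values depend on the model's vocabulary.
let startToken: Int32 = 1
let endToken: Int32 = 2
let inputSequence: [Int32] = [5, 17, 42, 8]

// Greedily decode up to 50 output tokens.
let decoded = model(
    inputSequence: inputSequence,
    startToken: startToken,
    endToken: endToken,
    maxLength: 50
)
// decoded ends at the first endToken, or is truncated after maxLength tokens.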