TransformerDecoderBlock
public struct TransformerDecoderBlock<Element, Device> : LayerType, Codable where Element : RandomizableType, Device : DeviceType
Transformer decoder layer consisting of a self-attention, an encoder attention, and a pointwise feed forward layer, as introduced in "Attention Is All You Need".
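The three sublayers are applied in sequence: self-attention over the decoder inputs, attention over the encoder outputs, then the pointwise feed forward layer. The following dependency-free sketch shows only the order of operations; attend and feedForward are hypothetical stand-ins, and residual connections, normalization, dropout, and masking inside the sublayers are omitted.

// Conceptual order of operations in the decoder block.
// attend takes (queries, keys, values); all types are simplified to
// [[Double]] ([maxLen, hiddenDim]) for illustration.
func decoderBlockForward(
    decoderInput: [[Double]],
    encoderStates: [[Double]],
    attend: ([[Double]], [[Double]], [[Double]]) -> [[Double]],
    feedForward: ([[Double]]) -> [[Double]]
) -> [[Double]] {
    // Self-attention: queries, keys, and values all come from the decoder input.
    let selfAttended = attend(decoderInput, decoderInput, decoderInput)
    // Encoder attention: queries come from the decoder state,
    // keys and values from the encoder outputs.
    let crossAttended = attend(selfAttended, encoderStates, encoderStates)
    // Pointwise feed forward, applied to each position independently.
    return feedForward(crossAttended)
}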
-
Multi-head self-attention applied to the decoder inputs.
Declaration
Swift
public var selfAttention: MultiHeadAttention<Element, Device>
-
Multi-head attention over the encoder outputs, using queries derived from the decoder state.
Declaration
Swift
public var encoderAttention: MultiHeadAttention<Element, Device>
-
Pointwise feed forward layer applied to each position independently.
Declaration
Swift
public var pointwiseFeedForward: PointwiseFeedForward<Element, Device>
-
Declaration
Swift
public var parameters: [Tensor<Element, Device>] { get }
-
Declaration
Swift
public var parameterPaths: [WritableKeyPath<Self, Tensor<Element, Device>>] { get }
-
Creates a Transformer decoder layer consisting of a self-attention, an encoder attention, and a pointwise feed forward layer, as introduced in "Attention Is All You Need". A usage sketch follows the parameter list.
Declaration
Swift
public init(hiddenDim: Int, forwardDim: Int, heads: Int, keyDim: Int, valueDim: Int, dropout: Float = 0.1)
Parameters
hiddenDim
Last dimension of inputs and outputs
forwardDim
Size of the inner (hidden) layer of the pointwise feed forward layer
heads
Number of attention heads
keyDim
Size of key and query vectors within multi-head attention layer
valueDim
Size of value vectors within multi-head attention layer
dropout
Rate of dropout applied within the self-attention and pointwise feed forward layers
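For illustration, a block might be created like this. The hyperparameter values below are arbitrary examples; choosing keyDim = valueDim = hiddenDim / heads mirrors the convention of "Attention Is All You Need" but is not required by this initializer.

import DL4S

// Example hyperparameters (arbitrary): a 512-dimensional model with 8 heads.
let block = TransformerDecoderBlock<Float, CPU>(
    hiddenDim: 512,   // last dimension of inputs and outputs
    forwardDim: 2048, // inner dimension of the pointwise feed forward layer
    heads: 8,         // number of attention heads
    keyDim: 64,       // size of key and query vectors per head
    valueDim: 64,     // size of value vectors per head
    dropout: 0.1      // dropout rate within the sublayers
)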
-
Applies multi-head self-attention, multi-head encoder attention, and a pointwise feed forward layer to the inputs
Declaration
Parameters
inputs
Layer input with shape [batchSize, maxLen, hiddenSize], encoder outputs with shape [batchSize, maxLen, hiddenSize], and masks for the encoder and decoder states, each broadcastable to [batchSize, heads, queryCount, keyCount], with entries of 1 for all elements that should be blocked.
Return Value
Result of layer operations with shape [batchSize, maxLen, hiddenSize]
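Since mask entries of 1 mark blocked elements, a causal mask for the decoder's self-attention sets every entry above the diagonal to 1 so that position i cannot attend to positions j > i. A minimal, dependency-free sketch (the helper is hypothetical; in practice the result would be wrapped in a tensor broadcastable to [batchSize, heads, queryCount, keyCount]):

// Builds a causal mask; entry 1 blocks attention from query position i
// to key position j whenever j > i, matching the convention above.
func causalMask(maxLen: Int) -> [[Float]] {
    (0 ..< maxLen).map { i in
        (0 ..< maxLen).map { j in j > i ? Float(1) : Float(0) }
    }
}

// causalMask(maxLen: 4):
// [[0, 1, 1, 1],
//  [0, 0, 1, 1],
//  [0, 0, 0, 1],
//  [0, 0, 0, 0]]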