TransformerDecoderBlock
public struct TransformerDecoderBlock<Element, Device> : LayerType, Codable where Element : RandomizableType, Device : DeviceType
Transformer decoder layer consisting of a self-attention, an encoder attention, and a pointwise feed forward layer, as introduced by Attention Is All You Need.
Multi-head self-attention over the decoder inputs.
Declaration
Swift
public var selfAttention: MultiHeadAttention<Element, Device>
Multi-head attention over the encoder outputs.
Declaration
Swift
public var encoderAttention: MultiHeadAttention<Element, Device>
Pointwise feed forward layer applied after the attention layers.
Declaration
Swift
public var pointwiseFeedForward: PointwiseFeedForward<Element, Device>
Declaration
Swift
public var parameters: [Tensor<Element, Device>] { get }
Declaration
Swift
public var parameterPaths: [WritableKeyPath<`Self`, Tensor<Element, Device>>] { get }
Creates a Transformer decoder layer consisting of a self-attention, an encoder attention, and a pointwise feed forward layer, as introduced by Attention Is All You Need.
Declaration
Swift
public init(hiddenDim: Int, forwardDim: Int, heads: Int, keyDim: Int, valueDim: Int, dropout: Float = 0.1)
Parameters
hiddenDim: Last dimension of inputs and outputs
forwardDim: Size of value vectors within pointwise feed forward layer
heads: Number of attention heads
keyDim: Size of key and query vectors within multi-head attention layer
valueDim: Size of value vectors within multi-head attention layer
dropout: Dropout rate for dropout applied within self-attention and pointwise feed forward layer
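As a hedged sketch of using this initializer: the dimension values below are illustrative assumptions (sizes typical for a base Transformer), not values prescribed by this reference, and the example assumes the DL4S module is imported.

```swift
import DL4S

// Illustrative dimensions (assumptions, not from the reference):
// model width 512, feed forward size 2048, 8 heads, 64-dim keys/values.
let decoderBlock = TransformerDecoderBlock<Float, CPU>(
    hiddenDim: 512,    // last dimension of inputs and outputs
    forwardDim: 2048,  // size of value vectors in the pointwise feed forward layer
    heads: 8,          // number of attention heads
    keyDim: 64,        // size of key and query vectors
    valueDim: 64,      // size of value vectors
    dropout: 0.1       // dropout within self-attention and feed forward
)
```

Because the block conforms to LayerType and Codable, it can be composed with other layers and serialized like any other DL4S layer.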
Applies multi-head self-attention, multi-head attention over the encoder outputs, and a pointwise feed forward layer to the inputs.
Declaration
Parameters
inputs: Layer input with shape [batchSize, maxLen, hiddenSize], encoder outputs with shape [batchSize, maxLen, hiddenSize], and masks broadcastable to [batchSize, heads, queryCount, keyCount] with entries of 1 for all elements that should be blocked, for the encoder and decoder states respectively.
Return Value
Result of layer operations with shape [batchSize, maxLen, hiddenSize]
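A hedged sketch of a forward pass follows. The tuple labels (decoderInput, encoderStates, encoderMask, decoderMask) are an assumption inferred from the parameter description above, not a signature confirmed by this reference, and all shapes and values are illustrative.

```swift
import DL4S

// Assumed illustrative sizes.
let batchSize = 16
let maxLen = 32

let decoderBlock = TransformerDecoderBlock<Float, CPU>(
    hiddenDim: 512, forwardDim: 2048, heads: 8, keyDim: 64, valueDim: 64
)

// Zero tensors stand in for real activations; shapes follow the reference.
let decoderInput = Tensor<Float, CPU>(repeating: 0, shape: [batchSize, maxLen, 512])
let encoderOutput = Tensor<Float, CPU>(repeating: 0, shape: [batchSize, maxLen, 512])

// Masks hold 1 at positions that should be blocked and are broadcastable
// to [batchSize, heads, queryCount, keyCount]; an all-zero mask blocks nothing.
let mask = Tensor<Float, CPU>(repeating: 0, shape: [batchSize, 1, maxLen, maxLen])

// Assumed invocation; the result has shape [batchSize, maxLen, hiddenSize].
let output = decoderBlock((
    decoderInput: decoderInput,
    encoderStates: encoderOutput,
    encoderMask: mask,
    decoderMask: mask
))
```

In practice the decoder mask would additionally block future positions (a causal mask) so each position can only attend to earlier ones.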