TransformerDecoderBlock

public struct TransformerDecoderBlock<Element, Device> : LayerType, Codable where Element : RandomizableType, Device : DeviceType

Transformer decoder layer consisting of self-attention, encoder attention, and a pointwise feed forward layer, as introduced in "Attention Is All You Need".

  • Multi-head self-attention layer applied to the decoder input.

    Declaration

    Swift

    public var selfAttention: MultiHeadAttention<Element, Device>
  • Multi-head attention layer that attends over the encoder output.

    Declaration

    Swift

    public var encoderAttention: MultiHeadAttention<Element, Device>
  • Pointwise feed forward layer of the block.

    Declaration

    Swift

    public var pointwiseFeedForward: PointwiseFeedForward<Element, Device>
  • Declaration

    Swift

    public var parameters: [Tensor<Element, Device>] { get }
  • Declaration

    Swift

    public var parameterPaths: [WritableKeyPath<`Self`, Tensor<Element, Device>>] { get }
  • Creates a Transformer decoder layer consisting of self-attention, encoder attention, and a pointwise feed forward layer, as introduced in "Attention Is All You Need".

    Declaration

    Swift

    public init(hiddenDim: Int, forwardDim: Int, heads: Int, keyDim: Int, valueDim: Int, dropout: Float = 0.1)

    Parameters

    hiddenDim

    Last dimension of inputs and outputs

    forwardDim

    Size of the hidden (intermediate) vectors within the pointwise feed forward layer

    heads

    Number of attention heads

    keyDim

    Size of key and query vectors within multi-head attention layer

    valueDim

    Size of value vectors within multi-head attention layer

    dropout

    Dropout rate applied within the self-attention and pointwise feed forward layers
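
    The parameters above can be related to the base-model sizes reported in "Attention Is All You Need" (hidden size 512, feed forward size 2048, 8 heads, key and value size 64). The following is a minimal sketch in plain Swift, not library code: `DecoderBlockConfig` and `concatenatedValueDim` are hypothetical names that only mirror the initializer's parameter labels, and the commented-out call assumes a device type named `CPU` that conforms to `DeviceType`.

    ```swift
    // Hypothetical helper, not part of the library: it mirrors the
    // initializer's parameter labels so the sizes can be checked in isolation.
    struct DecoderBlockConfig {
        var hiddenDim: Int
        var forwardDim: Int
        var heads: Int
        var keyDim: Int
        var valueDim: Int
        var dropout: Float = 0.1

        /// Width of the concatenated per-head value outputs
        /// (in the paper's formulation this equals the hidden size).
        var concatenatedValueDim: Int { return heads * valueDim }
    }

    // Base-model sizes from "Attention Is All You Need".
    let base = DecoderBlockConfig(hiddenDim: 512, forwardDim: 2048, heads: 8, keyDim: 64, valueDim: 64)

    // With the documented initializer this would read as follows
    // (assumption: `CPU` is an available device type):
    // let block = TransformerDecoderBlock<Float, CPU>(
    //     hiddenDim: base.hiddenDim, forwardDim: base.forwardDim,
    //     heads: base.heads, keyDim: base.keyDim, valueDim: base.valueDim,
    //     dropout: base.dropout)
    ```

    In the paper the key and value sizes are chosen as the hidden size divided by the number of heads. The initializer takes them as separate arguments, so other combinations can be passed as well.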

  • Applies multi-head self-attention, encoder attention, and a pointwise feed forward layer to the inputs

    Declaration

    Swift

    public func callAsFunction(_ inputs: (decoderInput: Tensor<Element, Device>, encoderOutput: Tensor<Element, Device>, encoderMask: Tensor<Element, Device>, decoderMask: Tensor<Element, Device>)) -> Tensor<Element, Device>

    Parameters

    inputs

    A tuple of the decoder input with shape [batchSize, maxLen, hiddenSize], the encoder output with shape [batchSize, maxLen, hiddenSize], and the encoder and decoder masks. Each mask must be broadcastable to [batchSize, heads, queryCount, keyCount]; an entry of 1 marks an element that should be blocked.

    Return Value

    Result of layer operations with shape [batchSize, maxLen, hiddenSize]
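
    The mask convention described for `inputs` (1 marks a blocked element) can be illustrated with a small sketch. It uses plain Swift arrays in place of tensors. The function names `causalMask` and `paddingMask` are hypothetical, and converting the arrays into `Tensor` values is deliberately left out. It assumes the usual decoder setup in which a query position must not attend to later key positions, and in which padded encoder positions are hidden.

    ```swift
    // Sketch only: plain Swift arrays stand in for tensors.
    // Convention taken from the parameter description: 1 = blocked, 0 = visible.

    /// Causal (look-ahead) mask with shape [queryCount, keyCount].
    /// Query position q may only attend to key positions k <= q.
    func causalMask(size: Int) -> [[Float]] {
        return (0..<size).map { q -> [Float] in
            (0..<size).map { k -> Float in k > q ? 1 : 0 }
        }
    }

    /// Padding mask with shape [batchSize, keyCount].
    /// Positions at or beyond a sequence's true length are blocked.
    func paddingMask(lengths: [Int], maxLen: Int) -> [[Float]] {
        return lengths.map { length -> [Float] in
            (0..<maxLen).map { k -> Float in k >= length ? 1 : 0 }
        }
    }

    // Decoder mask for a target sequence of 4 positions.
    let decoderMask = causalMask(size: 4)

    // Encoder mask for a batch of two source sequences with 3 and 2 real tokens.
    let encoderMask = paddingMask(lengths: [3, 2], maxLen: 4)

    for row in decoderMask { print(row) }
    ```

    Under standard trailing-dimension broadcasting, a [queryCount, keyCount] mask already broadcasts to [batchSize, heads, queryCount, keyCount]. A [batchSize, keyCount] padding mask would first need to be reshaped to [batchSize, 1, 1, keyCount].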