TransformerEncoderBlock
public struct TransformerEncoderBlock<Element, Device> : LayerType, Codable where Element : RandomizableType, Device : DeviceType
Transformer encoder layer consisting of a self-attention layer and a pointwise feed forward layer, as introduced in "Attention Is All You Need" (Vaswani et al., 2017).
-
The multi-head self-attention layer applied to the block's inputs.
Declaration
Swift
public var selfAttention: MultiHeadAttention<Element, Device>
-
The pointwise feed forward layer applied to the outputs of the self-attention layer.
Declaration
Swift
public var pointwiseFeedForward: PointwiseFeedForward<Element, Device>
-
Trainable parameters of the layer, collected from the self-attention and pointwise feed forward sublayers.
Declaration
Swift
public var parameters: [Tensor<Element, Device>] { get }
-
Writable key paths to all trainable parameters of the layer, which allow updated parameter values to be written back (for example by an optimizer).
Declaration
Swift
public var parameterPaths: [WritableKeyPath<`Self`, Tensor<Element, Device>>] { get }
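Since the key paths are writable, an optimizer or a hand-rolled training loop can write updated parameter values back into the layer through them. Below is a minimal sketch of a manual update step; the Tensor(repeating:shape:) initializer and the zero placeholder gradients are assumptions made for illustration, real gradients would come from backpropagation.

import DL4S

// Toy encoder block with arbitrary small dimensions (illustrative only).
var model = TransformerEncoderBlock<Float, CPU>(
    hiddenDim: 8, forwardDim: 16, heads: 2, keyDim: 4, valueDim: 4
)

// Placeholder gradients, one per parameter; in practice these come
// from backpropagating a loss.
let gradients = model.parameters.map {
    Tensor<Float, CPU>(repeating: 0, shape: $0.shape)
}

// Write the updated tensors back into the layer through the key paths.
for (path, gradient) in zip(model.parameterPaths, gradients) {
    model[keyPath: path] = model[keyPath: path] - gradient
}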
-
Creates a Transformer encoder layer consisting of a self-attention layer and a pointwise feed forward layer, as introduced in "Attention Is All You Need" (Vaswani et al., 2017).
Declaration
Swift
public init(hiddenDim: Int, forwardDim: Int, heads: Int, keyDim: Int, valueDim: Int, dropout: Float = 0.1)
Parameters
hiddenDim
Last dimension of inputs and outputs
forwardDim
Size of value vectors within pointwise feed forward layer
heads
Number of attention heads
keyDim
Size of key and query vectors within multi-head attention layer
valueDim
Size of value vectors within multi-head attention layer
dropout
Dropout rate applied within the self-attention and pointwise feed forward layers
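For illustration, a minimal construction sketch; all dimensions below are arbitrary assumptions chosen for the example, only the dropout rate of 0.1 is a documented default.

import DL4S

// Hypothetical dimensions, chosen only for illustration.
let block = TransformerEncoderBlock<Float, CPU>(
    hiddenDim: 64,   // last dimension of inputs and outputs
    forwardDim: 256, // value vector size in the pointwise feed forward layer
    heads: 4,        // number of attention heads
    keyDim: 16,      // size of key and query vectors
    valueDim: 16,    // size of value vectors
    dropout: 0.1     // documented default
)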
-
Applies multi-head self-attention followed by a pointwise feed forward layer to the inputs.
Declaration
Swift
public func callAsFunction(_ inputs: (Tensor<Element, Device>, Tensor<Element, Device>)) -> Tensor<Element, Device>
Parameters
inputs
Layer input with shape [batchSize, maxLen, hiddenSize], paired with a padding mask broadcastable to [batchSize, heads, queryCount, keyCount] whose entries are 1 for all elements that should be blocked.
Return Value
Result of layer operations with shape [batchSize, maxLen, hiddenSize]
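A usage sketch of the forward pass, reusing the block from the construction example above; the concrete shapes and the Tensor(repeating:shape:) initializer are assumptions made for illustration.

// batchSize = 2, maxLen = 10, hiddenSize = 64 (matching hiddenDim above).
let input = Tensor<Float, CPU>(repeating: 0, shape: [2, 10, 64])

// Padding mask broadcastable to [batchSize, heads, queryCount, keyCount];
// entries of 1 mark positions that should be blocked.
let mask = Tensor<Float, CPU>(repeating: 0, shape: [2, 1, 1, 10])

// Result has the same shape as the input: [2, 10, 64].
let output = block((input, mask))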