MultiHeadAttention
public struct MultiHeadAttention<Element, Device> : LayerType, Codable where Element : RandomizableType, Device : DeviceType
Multi-Head Attention Layer following Attention Is All You Need.
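For reference, the underlying computation from the paper, where d_k corresponds to keyDim:

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O, \qquad \mathrm{head}_i = \mathrm{Attention}(Q W_i^Q,\ K W_i^K,\ V W_i^V)

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V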
-
Matrix multiplied with queries before dot product attention
Declaration
Swift
public var qDense: Tensor<Element, Device>
-
Matrix multiplied with keys before dot product attention
Declaration
Swift
public var kDense: Tensor<Element, Device>
-
Matrix multiplied with values before dot product attention
Declaration
Swift
public var vDense: Tensor<Element, Device>
-
Matrix multiplied with result from dot product attention layer
Declaration
Swift
public var fc: Tensor<Element, Device>
-
Scaled dot product attention layer applied to the projected queries, keys and values
Declaration
Swift
public var attn: ScaledDotProductAttention<Element, Device>
-
Layer normalization applied to the output of the residual connection
Declaration
Swift
public var norm: LayerNorm<Element, Device>
-
Dropout layer applied to the attended values
Declaration
Swift
public var dropout: Dropout<Element, Device>
-
Number of attention heads
Declaration
Swift
public let heads: Int
-
Dimensionality of query and key vectors
Declaration
Swift
public let keyDim: Int
-
Dimensionality of value vectors
Declaration
Swift
public let valueDim: Int
-
Last dimension of keys, queries and values before matrix multiplication
Declaration
Swift
public let hiddenDim: Int
-
Declaration
Swift
public var parameters: [Tensor<Element, Device>] { get }
-
Declaration
Swift
public var parameterPaths: [WritableKeyPath<Self, Tensor<Element, Device>>] { get }
-
Creates a Multi-Head Attention Layer following Attention Is All You Need.
Declaration
Swift
public init(heads: Int, hiddenDim: Int, keyDim: Int, valueDim: Int, dropout: Float = 0.1)
Parameters
heads
Number of attention heads
hiddenDim
Last dimension of keys, queries and values
keyDim
Last dimension of keys
valueDim
Intermediate last dimension of values
dropout
Dropout rate
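For illustration, a minimal construction sketch. The element and device types (Float, CPU) are placeholders; with 8 heads and a model dimension of 512, each head would typically use keyDim = valueDim = 512 / 8 = 64:
Swift
// Sketch: 8 heads over a 512-dimensional model; per-head key/query and
// value dimensions of 64 (512 / 8), with the default dropout rate.
let attention = MultiHeadAttention<Float, CPU>(
    heads: 8,
    hiddenDim: 512,
    keyDim: 64,
    valueDim: 64,
    dropout: 0.1
)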
-
Computes multi-head scaled dot product attention using the provided query, key and value vectors as well as the provided mask.
Additionally applies dropout, a residual connection and layer normalization.
Declaration
Swift
public func callAsFunction(_ inputs: (q: Tensor<Element, Device>, k: Tensor<Element, Device>, v: Tensor<Element, Device>, mask: Tensor<Element, Device>?)) -> Tensor<Element, Device>
Parameters
inputs
Tuple containing queries of shape [batchSize, queryCount, hiddenDim], keys of shape [batchSize, keyCount, hiddenDim] and values of shape [batchSize, valueCount, hiddenDim], as well as an optional mask that may be used to prevent attention to certain elements, e.g. padding positions or future timesteps. The mask must be broadcastable to shape [batchSize, heads, queryCount, keyCount] and contain a 1 at every position that should be blocked.
Return Value
Scaled dot product attended values after dropout, the residual connection and layer normalization have been applied
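A hedged usage sketch for self-attention, where queries, keys and values are the same tensor. The tuple labels (q, k, v, mask) and the repeating-value tensor initializer are assumptions; passing nil for the mask disables masking:
Swift
// Self-attention over a [batchSize, seqLen, hiddenDim] input.
// Tuple labels and the Tensor initializer are assumptions.
let batchSize = 16
let seqLen = 32
let x = Tensor<Float, CPU>(repeating: 0, shape: [batchSize, seqLen, 512])
let output = attention((q: x, k: x, v: x, mask: nil))
// output shape: [batchSize, seqLen, 512]
// For causal attention, pass a mask broadcastable to
// [batchSize, heads, queryCount, keyCount] with 1s at blocked positions
// (i.e. above the diagonal).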