TransformerEncoderBlock
public struct TransformerEncoderBlock<Element, Device> : LayerType, Codable where Element : RandomizableType, Device : DeviceType
Transformer encoder layer consisting of a self-attention layer and a pointwise feed forward layer, as introduced in "Attention Is All You Need" (Vaswani et al., 2017).
-
The multi-head self-attention layer applied to the block's inputs.
Declaration
Swift
public var selfAttention: MultiHeadAttention<Element, Device>
-
The pointwise feed forward layer applied to the outputs of the self-attention layer.
Declaration
Swift
public var pointwiseFeedForward: PointwiseFeedForward<Element, Device>
-
Trainable parameters of the layer, collected from the self-attention and pointwise feed forward sublayers.
Declaration
Swift
public var parameters: [Tensor<Element, Device>] { get }
-
Writable key paths to all trainable parameters of the layer, which allow updated parameter values to be written back (for example by an optimizer).
Declaration
Swift
public var parameterPaths: [WritableKeyPath<`Self`, Tensor<Element, Device>>] { get }
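Since the key paths are writable, an optimizer or a hand-rolled training loop can write updated parameter values back into the layer through them. Below is a minimal sketch of a manual update step; the Tensor(repeating:shape:) initializer and the zero placeholder gradients are assumptions made for illustration, real gradients would come from backpropagation.

import DL4S

// Toy encoder block with arbitrary small dimensions (illustrative only).
var model = TransformerEncoderBlock<Float, CPU>(
    hiddenDim: 8, forwardDim: 16, heads: 2, keyDim: 4, valueDim: 4
)

// Placeholder gradients, one per parameter; in practice these come
// from backpropagating a loss.
let gradients = model.parameters.map {
    Tensor<Float, CPU>(repeating: 0, shape: $0.shape)
}

// Write the updated tensors back into the layer through the key paths.
for (path, gradient) in zip(model.parameterPaths, gradients) {
    model[keyPath: path] = model[keyPath: path] - gradient
}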
-
Creates a Transformer encoder layer consisting of a self-attention layer and a pointwise feed forward layer, as introduced in "Attention Is All You Need" (Vaswani et al., 2017).
Declaration
Swift
public init(hiddenDim: Int, forwardDim: Int, heads: Int, keyDim: Int, valueDim: Int, dropout: Float = 0.1)
Parameters
hiddenDim
Last dimension of inputs and outputs
forwardDim
Size of value vectors within pointwise feed forward layer
heads
Number of attention heads
keyDim
Size of key and query vectors within multi-head attention layer
valueDim
Size of value vectors within multi-head attention layer
dropout
Dropout rate applied within the self-attention and pointwise feed forward layers
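For illustration, a minimal construction sketch; all dimensions below are arbitrary assumptions chosen for the example, only the dropout rate of 0.1 is a documented default.

import DL4S

// Hypothetical dimensions, chosen only for illustration.
let block = TransformerEncoderBlock<Float, CPU>(
    hiddenDim: 64,   // last dimension of inputs and outputs
    forwardDim: 256, // value vector size in the pointwise feed forward layer
    heads: 4,        // number of attention heads
    keyDim: 16,      // size of key and query vectors
    valueDim: 16,    // size of value vectors
    dropout: 0.1     // documented default
)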
-
Applies multi-head self-attention followed by a pointwise feed forward layer to the inputs.
Declaration
Swift
public func callAsFunction(_ inputs: (Tensor<Element, Device>, Tensor<Element, Device>)) -> Tensor<Element, Device>
Parameters
inputs
Layer input with shape [batchSize, maxLen, hiddenSize], paired with a padding mask broadcastable to [batchSize, heads, queryCount, keyCount] whose entries are 1 for all elements that should be blocked.
Return Value
Result of layer operations with shape [batchSize, maxLen, hiddenSize]
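A usage sketch of the forward pass, reusing the block from the construction example above; the concrete shapes and the Tensor(repeating:shape:) initializer are assumptions made for illustration.

// batchSize = 2, maxLen = 10, hiddenSize = 64 (matching hiddenDim above).
let input = Tensor<Float, CPU>(repeating: 0, shape: [2, 10, 64])

// Padding mask broadcastable to [batchSize, heads, queryCount, keyCount];
// entries of 1 mark positions that should be blocked.
let mask = Tensor<Float, CPU>(repeating: 0, shape: [2, 1, 1, 10])

// Result has the same shape as the input: [2, 10, 64].
let output = block((input, mask))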