# Attention

class ashpy.layers.attention.Attention(filters)

Bases: tensorflow.python.keras.engine.training.Model

Attention Layer from Self-Attention GAN [1].

First we extract features from the previous layer:

$f(x) = W_f x$
$g(x) = W_g x$
$h(x) = W_h x$
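As a rough NumPy sketch of this step, assume the input feature map is flattened to $$N = H \cdot W$$ locations with one row per location, so each projection is a plain matrix product. The reduced channel count `C // 8` for $$f$$ and $$g$$ is an assumption borrowed from [1], not something this page states about the layer, and the random weights are placeholders for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 10, 10, 64       # spatial size and channels of the input feature map
N = H * W                  # number of spatial locations
C_bar = C // 8             # reduced channels for f and g (assumption, following [1])

# Flattened feature map: one row per location.
x = rng.standard_normal((N, C))

# Placeholder weights; in a real layer these would be learned.
W_f = rng.standard_normal((C, C_bar))
W_g = rng.standard_normal((C, C_bar))
W_h = rng.standard_normal((C, C))

# Each projection is applied independently at every location.
f = x @ W_f   # shape (N, C_bar)
g = x @ W_g   # shape (N, C_bar)
h = x @ W_h   # shape (N, C)

print(f.shape, g.shape, h.shape)
```

With this row-per-location layout the products are written `x @ W` rather than `W x`; the two forms are transposes of each other.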

Then we calculate the importance matrix from the scores $$s_{ij}$$, as defined in [1]:

$s_{ij} = f(x_i)^T g(x_j)$
$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N}\exp(s_{ij})}$

$$\beta_{j,i}$$ indicates the extent to which the model attends to the $$i^{th}$$ location when synthesizing the $$j^{th}$$ region.
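The normalization can be sketched in NumPy as a softmax over $$i$$ for each fixed $$j$$. This assumes the scores are $$s_{ij} = f(x_i)^T g(x_j)$$ as in [1]. The small sizes and random features are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

N, C_bar = 5, 4                        # illustrative sizes
f = rng.standard_normal((N, C_bar))    # row i holds f(x_i)
g = rng.standard_normal((N, C_bar))    # row j holds g(x_j)

# s[i, j] = f(x_i)^T g(x_j)
s = f @ g.T

# Softmax over i (axis 0) for each fixed j. Subtracting the column maximum
# only improves numerical stability; it does not change the result.
e = np.exp(s - s.max(axis=0, keepdims=True))
beta = (e / e.sum(axis=0, keepdims=True)).T   # beta[j, i]

# Each row j of beta is a distribution over the locations i.
print(np.allclose(beta.sum(axis=1), 1.0))  # True
```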

Then we calculate the output of the attention layer $$(o_1, ..., o_N) \in \mathbb{R}^{C \times N}$$:

$o_j = \sum_{i=1}^{N} \beta_{j,i} h(x_i)$

Finally we combine the (scaled) attention and the input to get the final output of the layer:

$y_i = \gamma o_i + x_i$

where $$\gamma$$ is initialized as 0.
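Putting the steps together, the following is a minimal NumPy sketch of the whole computation, not the layer's actual implementation. The function name `self_attention`, the row-per-location layout, the score definition $$s_{ij} = f(x_i)^T g(x_j)$$ from [1], and the `C // 8` projection width are all assumptions made for illustration. It shows why initializing $$\gamma$$ to 0 makes the layer start out as the identity.

```python
import numpy as np


def self_attention(x, W_f, W_g, W_h, gamma):
    """Hypothetical helper: x is an (N, C) flattened feature map."""
    f, g, h = x @ W_f, x @ W_g, x @ W_h
    s = f @ g.T                                    # s[i, j] = f(x_i)^T g(x_j)
    e = np.exp(s - s.max(axis=0, keepdims=True))   # stable softmax over i
    beta = (e / e.sum(axis=0, keepdims=True)).T    # beta[j, i]
    o = beta @ h                                   # o[j] = sum_i beta[j, i] * h(x_i)
    return gamma * o + x                           # residual combination


rng = np.random.default_rng(0)
N, C = 16, 8
x = rng.standard_normal((N, C))
W_f = rng.standard_normal((C, C // 8))
W_g = rng.standard_normal((C, C // 8))
W_h = rng.standard_normal((C, C))

y0 = self_attention(x, W_f, W_g, W_h, gamma=0.0)
y1 = self_attention(x, W_f, W_g, W_h, gamma=0.5)

print(np.allclose(y0, x))   # True: with gamma = 0 the output equals the input
print(y1.shape)             # (16, 8): the shape is preserved for any gamma
```

Starting from $$\gamma = 0$$ means the surrounding network initially sees only the input features, and the attention term is blended in as $$\gamma$$ is learned.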

Examples

• Direct Usage:

import tensorflow as tf
from ashpy.layers.attention import Attention

x = tf.ones((1, 10, 10, 64))

# instantiate attention layer as model
attention = Attention(64)

# evaluate passing x
output = attention(x)

# the output shape is
# the same as the input shape
print(output.shape)

• Inside a Model:

def MyModel():
    inputs = tf.keras.layers.Input(shape=[None, None, 64])
    attention = Attention(64)
    return tf.keras.Model(inputs=inputs, outputs=attention(inputs))

x = tf.ones((1, 10, 10, 64))
model = MyModel()
output = model(x)

print(output.shape)

(1, 10, 10, 64)

[1] Self-Attention Generative Adversarial Networks. https://arxiv.org/abs/1805.08318

__init__(filters)

Build the Attention Layer.

Parameters: filters (int) – Number of filters of the input tensor. It should preferably be a multiple of 8.

Return type: None

call(inputs, training=False)

Perform the computation.

Parameters:

• inputs (tf.Tensor) – Inputs for the computation.
• training (bool) – Selects training or evaluation mode.

Returns: tf.Tensor – Output Tensor.