A Dive Into Multihead Attention, Self-Attention and Cross-Attention

Machine Learning Studio
In this video, I will first give a recap of Scaled Dot-Product Attention and then dive into Multihead Attention. After that, we will look at two different ways of using the attention mechanism: Self-Attention and Cross-Attention.


Solution of the exercise:
We have
X: T1xd
Y: T2xd

We build Q from Y, so Q will be
Q: T2xd

And we build K and V from X, therefore,
K: T1xd
V: T1xd

Then QK^T (the compatibility matrix) will be
QK^T: T2xT1

And the final output Z will be Z = Softmax(QK^T / sqrt(d)) V, with
Z: T2xd
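
If you want to check these shapes numerically, here is a minimal NumPy sketch (not from the video; the sizes T1, T2, d and the random projection matrices are just placeholders standing in for learned weights):

import numpy as np

T1, T2, d = 5, 3, 8

X = np.random.randn(T1, d)   # source sequence (provides K and V)
Y = np.random.randn(T2, d)   # target sequence (provides Q)

# random stand-ins for the learned projection matrices
Wq = np.random.randn(d, d)
Wk = np.random.randn(d, d)
Wv = np.random.randn(d, d)

Q = Y @ Wq                   # (T2, d)
K = X @ Wk                   # (T1, d)
V = X @ Wv                   # (T1, d)

scores = Q @ K.T / np.sqrt(d)                    # compatibility matrix, (T2, T1)
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
Z = weights @ V                                  # (T2, d)

print(scores.shape, Z.shape)                     # (3, 5) and (3, 8), i.e. (T2, T1) and (T2, d)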