A Dive Into Multihead Attention, Self-Attention and Cross-Attention
22.3K views · 1 year ago
In this video, I will first give a recap of Scaled Dot-Product Attention, and then dive into Multihead Attention. After that, we will see two different ways of using the attention mechanism: Self-Attention and Cross-Attention.
Solution of the exercise:
We have
X: T1 × d
Y: T2 × d
We build Q from Y, so Q will be
Q: T2 × d
and we build K and V from X, therefore
K: T1 × d
V: T1 × d
Then the compatibility matrix QK^T will be
QK^T: T2 × T1
and the final output Z = Softmax(QK^T / sqrt(d)) · V will be
Z: T2 × d
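The shape bookkeeping above can be checked numerically. The following is a minimal NumPy sketch of single-head cross-attention, assuming random projection matrices W_Q, W_K, W_V of size d × d (these names and the concrete sizes T1 = 5, T2 = 3, d = 8 are illustrative, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
T1, T2, d = 5, 3, 8  # illustrative sizes

X = rng.standard_normal((T1, d))  # source sequence: provides keys/values
Y = rng.standard_normal((T2, d))  # target sequence: provides queries

# Learned projections (random here, just for the shape check)
W_Q, W_K, W_V = (rng.standard_normal((d, d)) for _ in range(3))

Q = Y @ W_Q                      # (T2, d)
K = X @ W_K                      # (T1, d)
V = X @ W_V                      # (T1, d)

scores = Q @ K.T / np.sqrt(d)    # compatibility matrix, (T2, T1)

# Row-wise softmax over the T1 source positions
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

Z = weights @ V                  # final output, (T2, d)

print(Q.shape, K.shape, scores.shape, Z.shape)
```

Running it confirms the derivation: scores is (T2, T1) and the output Z is (T2, d), one output vector per query position.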
Published on 1402/01/27.