Notes on the LR update rules in Mikolov's word2vec implementation


LR model for predicting a context word

context word

P(y=1| v_{context}\cdot v_{target}) = \frac{1}{1 + \exp(- v_{context}\cdot v_{target}) }

negative sampling word

P(y=0 | v_{context}\cdot v_{target}) = 1 - P(y=1 | v_{context}\cdot v_{target})
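As a quick numeric sketch of the two probabilities (the toy vectors and variable names below are made up for illustration), a true context pair and a negative sample share one sigmoid and the probabilities sum to 1:

```python
import math

def sigmoid(x):
    # P(y=1 | v_c . v_t) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

# toy vectors, chosen only for illustration
v_context = [0.2, -0.1, 0.4]
v_target = [0.5, 0.3, -0.2]

dot = sum(c * t for c, t in zip(v_context, v_target))
p_context = sigmoid(dot)      # probability the pair is a true context pair
p_negative = 1.0 - p_context  # probability the pair is a negative sample
```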

y=1 => vector update rule from maximizing the likelihood of a context word

\begin{eqnarray}
L &=& \log P(y=1) = -\log\left(1 + \exp(-v_c \cdot v_t)\right) \\
\frac{\partial L}{\partial v_c} &=&
-\frac{\partial}{\partial v_c}\log\left(1 + \exp(-v_c \cdot v_t)\right) \\
&=& -\frac{1}{1 + \exp(-v_c \cdot v_t)} \exp(-v_c \cdot v_t) \times (-v_t) \\
&=& \frac{\exp(-v_c \cdot v_t)}{1 + \exp(-v_c \cdot v_t)}\, v_t \\
&=& (1 - P(y=1))\, v_t \\
\frac{\partial L}{\partial v_t} &=& (1 - P(y=1))\, v_c
\end{eqnarray}
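The closed form (1 - P(y=1)) v_t can be sanity-checked against a central-difference gradient of L = -log(1 + exp(-v_c·v_t)); the vectors and helper name below are illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_lik_pos(v_c, v_t):
    # L = log P(y=1) = -log(1 + exp(-v_c . v_t))
    dot = sum(a * b for a, b in zip(v_c, v_t))
    return -math.log(1.0 + math.exp(-dot))

v_c = [0.2, -0.1, 0.4]
v_t = [0.5, 0.3, -0.2]
dot = sum(a * b for a, b in zip(v_c, v_t))

# closed-form gradient from the derivation: (1 - P(y=1)) v_t
grad = [(1.0 - sigmoid(dot)) * t for t in v_t]

# numeric check with central differences, one coordinate at a time
eps = 1e-6
for i in range(len(v_c)):
    hi = list(v_c); hi[i] += eps
    lo = list(v_c); lo[i] -= eps
    num = (log_lik_pos(hi, v_t) - log_lik_pos(lo, v_t)) / (2 * eps)
    assert abs(num - grad[i]) < 1e-8
```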

y=0 => vector update rule from maximizing the likelihood of a negative sampling word

\begin{eqnarray}
L &=& \log P(y=0) = \log\left(1 - P(y=1)\right) \\
\frac{\partial L}{\partial v_c} &=&
\frac{\partial}{\partial v_c}\log\left(1 - P(y=1)\right) \\
&=&
\frac{1}{1 - P(y=1)} \times (-1) \times \frac{\partial P(y=1)}{\partial v_c} \\
&=&
\frac{1 + \exp(-v_c \cdot v_t)}{\exp(-v_c \cdot v_t)}
\times (-1) \times \left(-\left(1 + \exp(-v_c \cdot v_t)\right)^{-2}\right) \exp(-v_c \cdot v_t) \times (-v_t) \\
&=&
-\frac{1}{1 + \exp(-v_c \cdot v_t)}\, v_t \\
&=&
-P(y=1)\, v_t \\
\frac{\partial L}{\partial v_t} &=& -P(y=1)\, v_c
\end{eqnarray}
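The same finite-difference check works for the negative-sample case, where the closed-form gradient is -P(y=1) v_t (vectors and helper name are again illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_lik_neg(v_c, v_t):
    # L = log P(y=0) = log(1 - P(y=1)) for a negative sample
    dot = sum(a * b for a, b in zip(v_c, v_t))
    return math.log(1.0 - sigmoid(dot))

v_c = [0.2, -0.1, 0.4]
v_t = [0.5, 0.3, -0.2]
dot = sum(a * b for a, b in zip(v_c, v_t))

# closed-form gradient from the derivation: -P(y=1) v_t
grad = [-sigmoid(dot) * t for t in v_t]

# numeric check with central differences
eps = 1e-6
for i in range(len(v_c)):
    hi = list(v_c); hi[i] += eps
    lo = list(v_c); lo[i] -= eps
    num = (log_lik_neg(hi, v_t) - log_lik_neg(lo, v_t)) / (2 * eps)
    assert abs(num - grad[i]) < 1e-8
```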

In summary:

  • label = 1 <-> context word
  • label = 0 <-> negative sampling word
\begin{eqnarray}
g_c = \frac{\partial L}{\partial v_c} &=&
(\mbox{label} - P(y=1))\, v_t \\
g_t = \frac{\partial L}{\partial v_t} &=&
(\mbox{label} - P(y=1))\, v_c
\end{eqnarray}
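The unified form maps directly onto one SGD step. The sketch below is a minimal illustration, not the original C code: the function name and the learning-rate default are assumptions, but it follows the structure of the negative-sampling update in Mikolov's word2vec, where a single scalar g = lr * (label - P(y=1)) scales both updates:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def update_pair(v_t, v_c, label, lr=0.025):
    # One SGD step on a (target, context-or-negative) pair, updating both
    # vectors in place. label is 1 for a true context word, 0 for a
    # negative sample. The old values of both vectors are used for both
    # updates (the tuple assignment reads before it writes).
    dot = sum(a * b for a, b in zip(v_t, v_c))
    g = lr * (label - sigmoid(dot))
    for i in range(len(v_t)):
        v_t[i], v_c[i] = v_t[i] + g * v_c[i], v_c[i] + g * v_t[i]
    return g
```

A positive update (label = 1) has g > 0 and nudges v_t·v_c upward, pushing P(y=1) toward 1; a negative update (label = 0) has g < 0 and pushes it toward 0.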