GloVe: Global Vectors for Word Representation (Notes)

Below is

Relation to skip-gram

Skip-gram:
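
For reference, the GloVe paper's derivation writes the skip-gram probability that word $j$ appears in the context of word $i$ as a softmax. The symbols $w_i$ (word vector), $\tilde{w}_j$ (context vector) and $V$ (vocabulary size) follow the paper and are assumed here, since these notes do not define them:

$$
Q_{ij} = \frac{\exp(w_i^{\top}\tilde{w}_j)}{\sum_{k=1}^{V}\exp(w_i^{\top}\tilde{w}_k)}
$$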

Next, train over the whole corpus:
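
The equation for this step is not preserved in the notes. A reconstruction following the GloVe paper: training minimizes the negative log-likelihood of $Q_{ij}$ (the model probability of context word $j$ given word $i$), summed over every (word, context) occurrence in the corpus:

$$
J = -\sum_{i \in \text{corpus}} \; \sum_{j \in \text{context}(i)} \log Q_{ij}
$$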

On a vast corpus, however, computing every $Q_{ij}$ is too expensive, so skip-gram resorts to approximations.
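
The GloVe paper takes a different route at this point, sketched here under the paper's definitions (assumed, since the notes omit them): $X_{ij}$ is the number of times word $j$ occurs in the context of word $i$, $X_i = \sum_k X_{ik}$, and $P_{ij} = X_{ij}/X_i$. Grouping identical $(i, j)$ terms turns the corpus objective into a weighted sum of cross-entropies $H(P_i, Q_i)$:

$$
J = -\sum_{i=1}^{V}\sum_{j=1}^{V} X_{ij}\log Q_{ij}
  = -\sum_{i=1}^{V} X_i \sum_{j=1}^{V} P_{ij}\log Q_{ij}
  = \sum_{i=1}^{V} X_i\, H(P_i, Q_i)
$$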

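However, the cross-entropy loss between the two distributions has a drawback: it puts too much weight on low-probability events.
Moreover, the $Q_{ij}$ in that objective is still expensive to normalize, so the normalization is dropped.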
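
In the GloVe paper this step replaces cross-entropy with a least-squares objective on unnormalized quantities. The definitions of $\hat{P}_{ij}$ and $\hat{Q}_{ij}$ below are taken from the paper, not from these notes:

$$
\hat{J} = \sum_{i,j} X_i\left(\hat{P}_{ij} - \hat{Q}_{ij}\right)^2,
\qquad \hat{P}_{ij} = X_{ij},
\quad \hat{Q}_{ij} = \exp(w_i^{\top}\tilde{w}_j)
$$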

The problem with dropping normalization is that $\hat{Q}$ and $\hat{P}$ can become very large, so their logarithms are compared instead.
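
Written out with the paper's unnormalized quantities $\hat{P}_{ij} = X_{ij}$ and $\hat{Q}_{ij} = \exp(w_i^{\top}\tilde{w}_j)$ (assumed here, as the notes do not state them), the log form becomes:

$$
\hat{J} = \sum_{i,j} X_i\left(\log\hat{P}_{ij} - \log\hat{Q}_{ij}\right)^2
        = \sum_{i,j} X_i\left(w_i^{\top}\tilde{w}_j - \log X_{ij}\right)^2
$$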

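This weighting is still not guaranteed to be optimal, so the count-based weight ($X_i$, the total number of context words observed for word $i$) is dropped in favor of a more general weighting function $f(X_{ij})$ of the co-occurrence count.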
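
A minimal NumPy sketch of the resulting objective, $\hat{J} = \sum_{i,j} f(X_{ij})\,(w_i^{\top}\tilde{w}_j - \log X_{ij})^2$. The notes do not say which $f$ is used; the form below, with `x_max = 100` and `alpha = 0.75`, is the one proposed in the GloVe paper and is an assumption here. The function names `glove_weight` and `weighted_log_loss` are hypothetical, and the bias terms of the full GloVe model are left out to match the derivation above.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): (x / x_max)**alpha below x_max, capped at 1 from x_max on.

    x_max and alpha are the GloVe paper's suggested values (assumed here).
    """
    x = np.asarray(x, dtype=float)
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def weighted_log_loss(W, W_ctx, X):
    """Sum of f(X_ij) * (w_i . w~_j - log X_ij)**2 over pairs with X_ij > 0.

    W:     (V, d) array of word vectors w_i
    W_ctx: (V, d) array of context vectors w~_j
    X:     (V, V) array of co-occurrence counts
    """
    X = np.asarray(X, dtype=float)
    i, j = np.nonzero(X)                      # f(0) = 0, so unseen pairs contribute nothing
    scores = np.sum(W[i] * W_ctx[j], axis=1)  # row-wise dot products w_i . w~_j
    residual = scores - np.log(X[i, j])
    return float(np.sum(glove_weight(X[i, j]) * residual ** 2))
```

Because $f(0) = 0$, only observed co-occurrences enter the sum, which avoids evaluating $\log 0$. The cap at `x_max` keeps very frequent pairs from dominating the loss.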