Neural Fine-Grained Entity Type Classification with Hierarchy-Aware Loss

Problem

For fine-grained entity type classification (FETC), the commonly used distant-supervision methods suffer from two problems: out-of-context labels and overly-specific labels.
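In the paper's example, distant supervision assigns the entity Steven Kerr three labels: {person, athlete, coach}. However, sentence S1 is relevant only to {person, coach}, S2 only to {person, athlete}, and from S3 only {person} can be inferred. In other words, a mention inside a sentence is usually relevant to a single label-path, whereas distant supervision returns every label-path of the entity.
The goal of the paper is to predict the last label on the label-path that is relevant to the entity mention. For S2, for example, the prediction should be athlete.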

Methodology

Context Representation

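A bidirectional LSTM converts each context token $c_i$ into a vector $h_i$ . Let $H=[h_1, h_2, \cdots, h_T]$ ; the final context representation $r_c$ is obtained with word-level attention over $H$ :
$$\alpha = \mathrm{softmax}(w^\top \tanh(H)), \qquad r_c = H\alpha^\top$$
where $w$ is a learned parameter vector.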
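The pooling step can be illustrated with a minimal sketch. It assumes a common word-level attention form in which each hidden vector is scored by the dot product of $w$ with $\tanh(h_i)$ and the scores are normalized by a softmax; the function names `softmax` and `attention_pool` and the toy values are hypothetical, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(H, w):
    """Pool T hidden vectors (rows of H) into one vector, weighted via w."""
    # Assumed scoring function: score_i = w . tanh(h_i)
    scores = [sum(wj * math.tanh(hj) for wj, hj in zip(w, h)) for h in H]
    alpha = softmax(scores)
    # r_c is the alpha-weighted sum of the hidden vectors.
    r_c = [sum(a * h[j] for a, h in zip(alpha, H)) for j in range(len(w))]
    return alpha, r_c


# Toy hidden states for a 3-token context (values are made up).
H = [[0.1, 0.4], [0.9, -0.2], [0.3, 0.3]]
w = [1.0, 0.5]
alpha, r_c = attention_pool(H, w)
```

The weights `alpha` sum to one, so `r_c` stays in the same range as the hidden states regardless of sentence length.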

Mention Representation

The paper represents the mention in two ways:

Averaging encoder

The first takes the mean of the word vectors of all words in the mention $[w_p, \cdots, w_t]$ :
$$r_a = \frac{1}{t-p+1}\sum_{i=p}^{t} w_i$$
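As a small illustration of the averaging encoder, the sketch below averages the vectors of a mention span element-wise. The function name `average_encoder`, the inclusive indices `p` and `t`, and the toy vectors are assumptions made for the example.

```python
def average_encoder(word_vectors, p, t):
    """Element-wise mean of the word vectors at positions p..t (inclusive)."""
    span = word_vectors[p:t + 1]
    dim = len(span[0])
    return [sum(vec[j] for vec in span) / len(span) for j in range(dim)]


# Toy 2-dimensional word vectors for a 4-word sentence (values are made up).
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
r_a = average_encoder(vectors, 1, 2)  # mention covers words 1 and 2
```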

LSTM encoder

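To capture more semantic information, the mention span is extended by one word on each side, giving $m_i^*=[w_{p-1}, w_p, \cdots, w_t, w_{t+1}]$ . Feeding this sequence to an LSTM yields $h_{p-1}, \cdots, h_{t+1}$ , and $r_l= h_{t+1}$ is taken as the LSTM representation of the mention.
Putting these together gives the feature representation $R=[r_c, r_a, r_l]$ .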
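The span extension and the final concatenation can be sketched as follows. The LSTM itself is left out; the padding token used when the mention touches a sentence boundary is an assumption (the notes do not say how that case is handled), and the names `PAD`, `extended_mention`, and `concat_features` are hypothetical.

```python
PAD = "<pad>"  # hypothetical padding token for sentence boundaries


def extended_mention(tokens, p, t):
    """Return the mention tokens[p..t] with one extra word on each side."""
    left = tokens[p - 1] if p > 0 else PAD
    right = tokens[t + 1] if t + 1 < len(tokens) else PAD
    return [left] + tokens[p:t + 1] + [right]


def concat_features(r_c, r_a, r_l):
    """Concatenate the three representations into R = [r_c, r_a, r_l]."""
    return r_c + r_a + r_l


# Made-up sentence; the mention "Steven Kerr" spans positions 1..2.
tokens = ["Coach", "Steven", "Kerr", "spoke", "today"]
span = extended_mention(tokens, 1, 2)
R = concat_features([0.1, 0.2], [0.3], [0.4, 0.5])
```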

Optimization

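A softmax layer gives the probability of each label, and the label with the highest probability is taken as the prediction:
$$p = \mathrm{softmax}(WR + b), \qquad \hat y = \arg\max_{y} p(y)$$
where $W$ can be viewed as the learned type embeddings and $b$ is the bias.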
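A minimal sketch of this classification step, assuming one row of `W` per label: compute a logit per label from the feature vector, apply a softmax, and return the most probable label. The function name `predict`, the label set, and all numbers are illustrative only.

```python
import math


def predict(W, b, R, labels):
    """Return (best_label, probabilities) for feature vector R."""
    # One logit per label: dot product of that label's row of W with R, plus bias.
    logits = [sum(wk * rk for wk, rk in zip(row, R)) + bk
              for row, bk in zip(W, b)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda k: probs[k])
    return labels[best], dict(zip(labels, probs))


labels = ["person", "athlete", "coach"]
W = [[0.2, 0.1], [0.5, -0.3], [-0.1, 0.8]]  # made-up type embeddings
b = [0.0, 0.0, 0.0]
R = [1.0, 2.0]                               # made-up feature vector
y_hat, probs = predict(W, b, R, labels)
```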
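The cross-entropy loss over $N$ training mentions is:
$$J = -\frac{1}{N}\sum_{i=1}^{N} \log p_i(y_i^*)$$
where $p_i$ is the softmax output for the $i$-th mention and $y_i^*$ is the label with the highest predicted probability among the leaf nodes of all its candidate label-paths.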
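The choice of training target can be illustrated with a short sketch: among the leaf labels proposed by distant supervision, pick the one the model currently scores highest, then take its negative log-probability. Averaging over the batch is an assumption, and the names `pseudo_truth` and `cross_entropy` are hypothetical.

```python
import math


def pseudo_truth(probs, leaf_candidates):
    """Pick the candidate leaf label with the highest predicted probability."""
    return max(leaf_candidates, key=lambda y: probs[y])


def cross_entropy(batch):
    """Mean negative log-probability of each example's pseudo-truth label."""
    total = 0.0
    for probs, leaf_candidates in batch:
        y_star = pseudo_truth(probs, leaf_candidates)
        total -= math.log(probs[y_star])
    return total / len(batch)


# Made-up model output for one mention of Steven Kerr.
probs = {"person": 0.2, "athlete": 0.5, "coach": 0.3}
leaves = ["athlete", "coach"]  # leaves of the two distant-supervision paths
target = pseudo_truth(probs, leaves)
loss = cross_entropy([(probs, leaves)])
```

Here the target is `athlete`, so the noisy `coach` path does not contribute to the loss for this sentence.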
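Intuitively, if the model cannot predict the correct label, outputting one of its ancestors is better than outputting an unrelated label. The estimated probability is therefore modified as follows:
$$p^*(\hat y) = p(\hat y) + \beta \sum_{a \in \Gamma} p(a)$$
where $\Gamma$ is the set of ancestor nodes of $\hat y$ and $\beta$ is a weighting hyperparameter.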
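A minimal sketch of this adjustment adds a share of the ancestors' probability mass to each label. The weight `beta` and the final renormalization are assumptions that the notes do not spell out; the `parent` map and the function names are hypothetical.

```python
def ancestors(label, parent):
    """Walk up the type hierarchy and collect every ancestor of label."""
    found = []
    node = parent[label]
    while node is not None:
        found.append(node)
        node = parent[node]
    return found


def hierarchy_adjusted(probs, parent, beta):
    """Add beta times the ancestors' probability to each label, then renormalize."""
    adjusted = {}
    for label, p in probs.items():
        bonus = sum(probs[a] for a in ancestors(label, parent))
        adjusted[label] = p + beta * bonus
    total = sum(adjusted.values())  # assumed renormalization step
    return {label: value / total for label, value in adjusted.items()}


# Toy hierarchy: person is the root; athlete and coach are its children.
parent = {"person": None, "athlete": "person", "coach": "person"}
probs = {"person": 0.2, "athlete": 0.5, "coach": 0.3}
adjusted = hierarchy_adjusted(probs, parent, beta=0.5)
```

Because a child's adjusted score includes its parent's probability, raising the probability of `person` also lowers the loss for an `athlete` target, so the model is penalized less for predicting an ancestor than for predicting an unrelated type.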