[Paper Notes] AS Reader: Text Understanding with the Attention Sum Reader Network

Attention Sum Reader Network

Datasets

CNN & Daily Mail

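Each article serves as a document. One entity word is removed from the document's summary, and the resulting sentence serves as the question; the removed entity word is the answer, and all entity words in the document are candidate answers. In each example, every named entity in the text is replaced with a marker such as "@entity1", and the markers are randomly shuffled.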
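The anonymization step can be sketched roughly as follows. This is only an illustration, not the dataset's actual preprocessing: the function name `anonymize`, the pre-extracted `entities` list, and the plain string matching are all assumptions made for the example.

```python
import random
import re


def anonymize(document, question, entities, seed=None):
    """Replace each entity string with a randomly assigned '@entityN' marker.

    `entities` is a hypothetical, pre-extracted list of entity strings;
    how the real dataset detects entities is not covered here.
    """
    if not entities:
        return document, question, {}

    rng = random.Random(seed)
    ids = list(range(len(entities)))
    rng.shuffle(ids)  # a fresh permutation for every example
    mapping = {ent: f"@entity{i}" for ent, i in zip(entities, ids)}

    # Try longer names first so "New York City" wins over "New York".
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in ordered))

    def substitute(text):
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    return substitute(document), substitute(question), mapping
```

Because the marker ids are reshuffled per example, a model cannot memorize facts about a particular "@entityN" and has to rely on the document itself.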

Children's Book Test (CBT)

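Twenty consecutive sentences are extracted from a children's story as the document, the 21st sentence serves as the question, and an entity word is removed from it to serve as the answer.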
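A minimal sketch of how one such example could be assembled, assuming the story is already split into sentences and the answer word is already chosen. The function name `make_cbt_example` and the `blank` placeholder string are assumptions for illustration.

```python
def make_cbt_example(sentences, answer, context_size=20, blank="XXXXX"):
    """Build (document, question, answer) from consecutive sentences.

    `sentences` is a list of already-split sentences; `answer` is the word
    to remove from the sentence that follows the context window.
    """
    if len(sentences) < context_size + 1:
        raise ValueError("need at least context_size + 1 sentences")
    document = sentences[:context_size]
    query_tokens = sentences[context_size].split()
    if answer not in query_tokens:
        raise ValueError("answer must occur in the question sentence")
    # Blank out only the first occurrence of the answer word.
    query_tokens[query_tokens.index(answer)] = blank
    return document, " ".join(query_tokens), answer
```

With 20 context sentences followed by "Then Alice opened the door .", choosing "Alice" as the answer leaves the other 20 sentences as the document and a question with the blank in Alice's place.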


Model Details


s_i is the probability that the answer to query q appears at position i in the document d.
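In the paper, s_i is obtained by a softmax over the dot products between each document token's contextual embedding and the query embedding, both produced by bidirectional GRU encoders. The sketch below shows only that last step; the vectors are hand-made placeholders rather than encoder outputs, and the name `attention_probs` is an assumption.

```python
import math


def attention_probs(token_vectors, query_vector):
    """Softmax over dot products between each token vector and the query."""
    scores = [sum(t * q for t, q in zip(tok, query_vector))
              for tok in token_vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The returned list has one probability per document position and sums to 1, so it can be read directly as the distribution s over positions.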

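Experimental Setup

Optimizer: Adam
Learning rate: 0.001, 0.0005
Loss function: -log P(a|q, d)
Embedding weight matrix initialization range: [-0.1, 0.1]
GRU weight initialization: random orthogonal matrices
GRU bias initialization: 0
Batch size: 32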
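The initialization and loss settings above could be sketched in plain Python as follows. This is an illustration only: uniform sampling within the embedding range and Gram-Schmidt as the way to get a random orthogonal matrix are assumptions, and all function names are hypothetical. A real implementation would rely on a deep learning framework's initializers.

```python
import math
import random


def uniform_matrix(rows, cols, low=-0.1, high=0.1, rng=random):
    """Embedding-style init: each entry drawn uniformly from [low, high]."""
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]


def random_orthogonal(n, rng=random):
    """Random n x n matrix with orthonormal rows, built by Gram-Schmidt."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:  # remove the components along earlier rows
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # resample in the unlikely degenerate case
            rows.append([a / norm for a in v])
    return rows


def zero_bias(n):
    """GRU biases start at 0."""
    return [0.0] * n


def nll_loss(answer_prob):
    """Loss for one example: -log P(a | q, d)."""
    return -math.log(answer_prob)
```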

Experimental Results

The figure below compares the model with other approaches.

[Figure: model comparison results]

Other Related Notes

Pointer sum attention here uses attention as a pointer over discrete tokens in the context document, and then directly sums each word's attention across all of its occurrences.

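In other words, the softmax outputs at every position where a candidate answer word occurs in the document are added together.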
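The summation can be sketched as below, assuming the per-position probabilities have already been computed. The names `attention_sum` and `predict` are assumptions for illustration.

```python
from collections import defaultdict


def attention_sum(doc_tokens, probs, candidates):
    """Sum per-position probabilities over every occurrence of each candidate."""
    totals = defaultdict(float)
    allowed = set(candidates)
    for token, p in zip(doc_tokens, probs):
        if token in allowed:
            totals[token] += p
    # Candidates that never occur in the document get probability 0.
    return {c: totals[c] for c in candidates}


def predict(doc_tokens, probs, candidates):
    """Pick the candidate with the largest summed probability."""
    scores = attention_sum(doc_tokens, probs, candidates)
    return max(scores, key=scores.get)
```

For the tokens `["@entity1", "said", "@entity2", "met", "@entity1"]` with probabilities `[0.3, 0.05, 0.4, 0.05, 0.2]`, "@entity2" holds the single most probable position, but "@entity1" wins because its two occurrences sum to roughly 0.5 against 0.4.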

This differs from how attention is used in seq2seq models (which blend words from the context into an answer representation); the use of attention here is inspired by Pointer Networks (Ptr-Nets).
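The two uses of the same attention weights can be contrasted with a small sketch. Both functions are hypothetical illustrations: the first mixes token vectors into one blended vector, the second treats the weights as a pointer that selects an input position.

```python
def blended_representation(token_vectors, probs):
    """Seq2seq-style use of attention: a weighted average of token vectors."""
    dim = len(token_vectors[0])
    return [sum(p * vec[k] for p, vec in zip(probs, token_vectors))
            for k in range(dim)]


def pointed_position(probs):
    """Pointer-style use of attention: select a position, not a mixture."""
    return max(range(len(probs)), key=probs.__getitem__)
```

The blended vector generally corresponds to no single input token, whereas the pointer result is always an actual position in the document, which is what makes summing over the occurrences of a word meaningful.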

Attentive and Impatient Readers

The paper compares its model with the Attentive Reader and discusses the differences.

It also mentions Chen et al.

It also mentions Memory Networks (MemNNs).