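Original paper (download link): Discriminative Information Retrieval for Question Answering Sentence Selection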

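Abstract

Setting of the algorithm: text-based QA, i.e. given a passage of text and a question, find the answer to the question in that passage.

A text-based QA algorithm has three main steps: 1) retrieve passages that may contain the answer; 2) rerank the candidate passages; 3) extract information and select the answer.

The algorithm in this paper mainly addresses the first step.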

Algorithm

Overall framework of the algorithm:

[Figure: overall framework of the algorithm]

Preprocessing: split the text into individual sentences; every sentence is a candidate for the first step. Let the candidate set be $D = \{p_1, \dots, p_N\}$, let the query be q, and let the scoring function be F(q, p). The goal of the IR system (the first of the three steps above) is to retrieve the top k candidates p under the objective:

$$\arg\max_{p \in D} F(q, p) \qquad (1)$$
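Objective (1) is a plain top-k selection under a scoring function. A minimal sketch follows, assuming a toy word-overlap score stands in for F(q, p); the names `retrieve_top_k` and `overlap_score` are made up for illustration and are not from the paper.

```python
import heapq

def retrieve_top_k(query, candidates, score, k):
    """Return the k (score, candidate) pairs with the highest score(query, candidate)."""
    scored = ((score(query, p), p) for p in candidates)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

def overlap_score(q, p):
    """Toy stand-in for F(q, p): number of distinct words shared by q and p."""
    return float(len(set(q.lower().split()) & set(p.lower().split())))

candidates = [
    "Philadelphia is known as the city of brotherly love",
    "Margaret Thatcher was a British prime minister",
    "The city hosted the fair",
]
top = retrieve_top_k("what is the city of brotherly love", candidates, overlap_score, k=2)
for s, p in top:
    print(s, p)
```

Note that this sketch scores every candidate in turn, which is exactly the exhaustive traversal that becomes expensive on a large corpus.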

$f_Q(q)$ is the feature vector of the query q, $f_P(p)$ is the feature vector of the candidate p, and $f(q, p)$ is the feature vector of the (query, candidate) pair, composed from $f_Q(q)$ and $f_P(p)$:

$$f(q, p) = C(f_Q(q), f_P(p)) \qquad (2)$$

A weight vector $\theta$ is trained so that the scoring function is $F(q, p) = \theta \cdot f(q, p)$, which is rewritten as $F(q, p) = t_\theta(q) \cdot f_P(p)$ (3). In this form, $t_\theta(q)$ amounts to extracting features from the query q, performing query expansion on them, and then scoring each candidate with a dot product against the candidate's features. The following describes how the features are extracted.
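To see why the rewrite in (3) is possible, here is a small numeric sketch. It assumes the pair feature is the full outer product of the query and candidate features, so that θ can be laid out as a matrix; this is a simplification for illustration, not the paper's exact composition C. All function names are hypothetical.

```python
def pair_features(fq, fp):
    """Assumed composition: f(q, p)[i][j] = fq[i] * fp[j] (full outer product)."""
    return [[a * b for b in fp] for a in fq]

def score_pairwise(theta, fq, fp):
    """theta . f(q, p), summing over every (query feature, candidate feature) pair."""
    f = pair_features(fq, fp)
    return sum(theta[i][j] * f[i][j]
               for i in range(len(fq)) for j in range(len(fp)))

def project_query(theta, fq):
    """t_theta(q): fold the query features into theta, leaving a candidate-side vector."""
    return [sum(theta[i][j] * fq[i] for i in range(len(fq)))
            for j in range(len(theta[0]))]

def score_projected(theta, fq, fp):
    """t_theta(q) . f_P(p), a plain dot product on the candidate side."""
    return sum(a * b for a, b in zip(project_query(theta, fq), fp))

theta = [[0.5, -1.0, 2.0],
         [1.5, 0.0, -0.5]]
fq = [1.0, 0.5]        # query-side features
fp = [1.0, 0.0, 1.0]   # candidate-side features

print(score_pairwise(theta, fq, fp), score_projected(theta, fq, fp))
```

Both functions give the same score, but the projected form computes the query-side vector once and then needs only a dot product per candidate.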

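Features

One entry of a feature vector f is written as a tuple "(KEY = value, weight)", so a feature vector can be viewed as a set of such tuples. Treating the feature as the key of an associative array, we write f(KEY = value) = weight. $\theta_X$ denotes the weight that the trained model $\theta$ assigns to feature X.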
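The associative-array view maps directly onto a dictionary. A minimal sketch, assuming features are stored as `(KEY, value)` tuples mapped to float weights; the helper names are illustrative only.

```python
# A feature vector: {(KEY, value): weight}
f = {
    ("QWORD", "how many"): 1.0,
    ("NE-PERSON", "Margaret Thatcher"): 1.0,
}

def weight(fv, key, value):
    """f(KEY = value); features that are absent have weight 0."""
    return fv.get((key, value), 0.0)

def dot(a, b):
    """Sparse dot product between two feature vectors (or between theta and f)."""
    return sum(w * b.get(k, 0.0) for k, w in a.items())

# theta uses the same keys, so theta_X is simply theta[X].
theta = {("QWORD", "how many"): 0.8}

print(weight(f, "QWORD", "how many"), weight(f, "LAT", "city"), dot(theta, f))
```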

1. Question features

$f_{\text{QWORD}}$: the question word. For example, if the question starts with "how many", then (QWORD = how many, 1) is added to the feature vector.

$f_{\text{LAT}}$: the lexical answer type (LAT). If the query's question word is "what" or "which", the LAT of the question is defined as the first noun phrase (NP) after the question word. For example, for "What is the city of brotherly love?" the tuple is (LAT = city, 1).

$f_{\text{NE}}$: all named entities, e.g. (NE-PERSON = Margaret Thatcher, 1).

$f_{\text{TFIDF}}$: tf-idf, e.g. [image: example tf-idf feature]
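The first two question features can be sketched as follows. This is a toy heuristic under stated assumptions: the question word is only matched at the start of the question, and the LAT is taken to be the first token after the question word that is not a copula or article, which stands in for real noun-phrase chunking. The word lists and the function name are hypothetical.

```python
import re

# Longer question words first so that "how many" wins over "how".
QUESTION_WORDS = ("how many", "how much", "what", "which",
                  "who", "when", "where", "why", "how")
# Tokens skipped when looking for the head of the first noun phrase (toy list).
SKIP_WORDS = {"is", "are", "was", "were", "the", "a", "an"}

def question_features(question):
    """Return {(KEY, value): weight} with QWORD and, for what/which, LAT."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    features = {}
    qword = None
    for candidate in QUESTION_WORDS:
        parts = candidate.split()
        if tokens[:len(parts)] == parts:
            qword = candidate
            break
    if qword is None:
        return features
    features[("QWORD", qword)] = 1.0
    if qword in ("what", "which"):
        rest = tokens[len(qword.split()):]
        lat = next((t for t in rest if t not in SKIP_WORDS), None)
        if lat is not None:
            features[("LAT", lat)] = 1.0
    return features

print(question_features("What is the city of brotherly love?"))
print(question_features("How many people live there?"))
```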

2. Passage features (i.e. candidate sentence features)

$f_{\text{BoW}}$: bag of words. Every distinct word x in the passage produces one feature [image: bag-of-words feature tuple].

$f_{\text{NETYPE}}$: named entity type. If the passage contains a person name, the feature (NETYPE = PERSON, 1) is generated.
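A sketch of the two passage features is given below. Two assumptions are made: the key name `WORD` for bag-of-words features is chosen here for illustration, and the named entities are passed in as `(text, type)` pairs produced by some external tagger, since no NER is implemented in this snippet.

```python
import re

def passage_features(passage, entities):
    """Return {(KEY, value): weight} with bag-of-words and NETYPE features.

    entities: iterable of (entity_text, entity_type) pairs from an external tagger.
    """
    features = {}
    # One feature per distinct word, regardless of how often it occurs.
    for word in set(re.findall(r"[a-z0-9']+", passage.lower())):
        features[("WORD", word)] = 1.0  # "WORD" is an assumed key name
    # One feature per entity type present in the passage.
    for _text, entity_type in entities:
        features[("NETYPE", entity_type)] = 1.0
    return features

fp = passage_features(
    "Margaret Thatcher led Britain. Thatcher resigned.",
    [("Margaret Thatcher", "PERSON")],
)
print(sorted(fp))
```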

Feature vector algorithm

1. Composition

First, Eq. (2) has to be implemented. For any query feature vector $f_Q(q) = \{(k_i = v_i, w_i)\}$ (with $w_i \le 1$) and passage feature vector $f_P(p) = \{(k_j = v_j, w_j)\}$, two operations are defined:

[image: definitions of the two operations]

Here $k_i = k_j$ means that $k_i$ and $k_j$ have the same value.
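One plausible reading of two such operations is a Cartesian-product pairing and an equality join over the feature tuples. The sketch below is an assumption about their shape, not the paper's exact definitions: the product pairs every query feature with every passage feature and multiplies the weights, while the join only fires when the two values are equal and records which keys matched.

```python
def cartesian(fq, fp):
    """Pair every query feature with every passage feature; weights multiply."""
    return {(kq, kp): wq * wp
            for kq, wq in fq.items()
            for kp, wp in fp.items()}

def join(fq, fp):
    """Keep only pairs whose values are equal; the feature records the two keys."""
    out = {}
    for (key_q, val_q), wq in fq.items():
        for (key_p, val_p), wp in fp.items():
            if val_q == val_p:
                name = (key_q, key_p)
                out[name] = out.get(name, 0.0) + wq * wp
    return out

fq = {("WORD", "city"): 0.5, ("QWORD", "what"): 1.0}
fp = {("WORD", "city"): 1.0, ("NETYPE", "LOCATION"): 1.0}

print(len(cartesian(fq, fp)), join(fq, fp))
```

The product lets the model learn associations such as a question word paired with an entity type, while the join captures direct overlap between question and passage.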

C is defined as:

[image: definition of C]

2. Projection

Definitions: [images: definitions used by the projection]

With these, $t_\theta(q)$ in Eq. (3) above can be written as:

[image: expression for $t_\theta(q)$]

At this point, all that remains is to train on (query, candidate) pairs to obtain the value of $\theta$.
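Training θ from labeled (query, candidate) pairs can be sketched with a simple linear learner. The notes do not say which learner is used, so logistic regression with plain SGD is assumed here, and the pair features below are made-up examples.

```python
import math

def dot(theta, f):
    """Sparse dot product theta . f over the features present in f."""
    return sum(w * theta.get(k, 0.0) for k, w in f.items())

def train(examples, epochs=50, lr=0.5):
    """examples: list of (pair_features, label), label 1 if the candidate answers the query."""
    theta = {}
    for _ in range(epochs):
        for f, label in examples:
            pred = 1.0 / (1.0 + math.exp(-dot(theta, f)))
            # Gradient step on the logistic loss.
            for k, w in f.items():
                theta[k] = theta.get(k, 0.0) + lr * (label - pred) * w
    return theta

# Hypothetical pair features: (query feature, passage feature) -> weight.
positive = {(("LAT", "city"), ("NETYPE", "LOCATION")): 1.0}
negative = {(("LAT", "city"), ("NETYPE", "PERSON")): 1.0}
theta = train([(positive, 1), (negative, 0)])

print(dot(theta, positive), dot(theta, negative))
```

After training, the pair seen with a positive label gets a positive weight and the other a negative one, so matching candidates score higher.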

 

 

 
