Paper reading - RankME: Reliable Human Ratings for Natural Language Generation

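Human evaluation is still the mainstream way to evaluate NLG tasks. This paper aims to improve the quality of human evaluation.
The CrowdFlower code is available, but it looks like it is all front-end page code.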

Related methods

Name	Full name
Likert	Likert scale
ME	magnitude estimation
plain ME	plain magnitude estimation
RankME	rank-based magnitude estimation

ME is introduced in this paper (see Section 3.1). The original text reads:

Rather than giving participants a fixed scale, we used the magnitude estimation paradigm, which is more suitable to capture robust or subtle differences between the relative strength of acceptability or grammaticality violations

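The ME procedure uses a Latin square design: each participant scores the sentences (any score greater than 0 is allowed), and the scores from the same participant are then normalized to the range 0 to 1.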
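To make the per-participant normalization concrete, here is a minimal sketch. The notes above only say that each person's scores are normalized to 0-1, not how. Min-max scaling is my assumption, and so are the function name `normalize_per_rater`, the input layout, and the choice to map a rater who gives identical scores everywhere to 0.5.

```python
from collections import defaultdict


def normalize_per_rater(ratings):
    """Min-max normalize ME scores separately for each rater.

    ratings: list of (rater_id, item_id, score) tuples, score > 0.
    Returns {(rater_id, item_id): value in [0, 1]}.
    Min-max scaling is an assumption; the source does not name the method.
    """
    by_rater = defaultdict(list)
    for rater, item, score in ratings:
        if score <= 0:
            raise ValueError("ME scores must be greater than 0")
        by_rater[rater].append((item, score))

    normalized = {}
    for rater, scored in by_rater.items():
        lo = min(s for _, s in scored)
        hi = max(s for _, s in scored)
        span = hi - lo
        for item, s in scored:
            # Assumption: if a rater gave every item the same score,
            # there is no spread to scale, so use the midpoint 0.5.
            normalized[(rater, item)] = (s - lo) / span if span else 0.5
    return normalized
```

Because the scaling is done per rater, someone who scores in the range 1-3 and someone who scores in the range 100-300 end up on the same 0-1 scale.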

RankME

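RankME, by contrast, has each participant give a relative ranking (RR) over all candidate sentences. The paper does not say directly how the relative ranking is done, but it does say that the method combines Continuous Scale (CS), Magnitude Estimation (ME), and Relative Assessment.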

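The closest thing to relative ranking is the last of those papers, and in that paper the RR process is simply to order the candidate sentences by quality, from best to worst.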
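As a rough illustration of how a best-to-worst ordering relates to numeric scores, the sketch below derives ranks from the scores one rater gave to a set of candidates. This is not the paper's procedure. The function name `relative_ranking`, the rule that a higher score means a better sentence, and the rule that tied scores share a rank are all my assumptions.

```python
def relative_ranking(scores):
    """Turn one rater's scores into best-to-worst ranks.

    scores: {candidate_id: score}, higher score = better (assumption).
    Returns {candidate_id: rank}, where rank 1 is the best candidate.
    Tied scores share the same rank (assumption).
    """
    # sorted() is stable, so tied candidates keep their input order.
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    prev_score = None
    prev_rank = 0
    for position, (candidate, score) in enumerate(ordered, start=1):
        # Reuse the previous rank on a tie, otherwise use the position.
        rank = prev_rank if score == prev_score else position
        ranks[candidate] = rank
        prev_score, prev_rank = score, rank
    return ranks
```

For example, `relative_ranking({"sys_a": 80, "sys_b": 120, "sys_c": 80})` puts `sys_b` first and gives `sys_a` and `sys_c` the same rank.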

However, here he gives scoring guidelines for ME, which I did not see in the original paper.