Input Method Paper Reading 1: Effects of Language Modeling and its Personalization on Touchscreen Typing Performance

Effects of Language Modeling and its Personalization on Touchscreen Typing Performance

Clearly, this paper proposes an evaluation methodology for input methods.

Problem studied in this paper: "We describe a closed-loop, smart touch keyboard (STK) evaluation system that we have implemented to solve this problem." In other words, an evaluation framework for input methods.

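What the problem covers:
Four aspects: simulating noisy user input, the decoding algorithm, the language model, and the personalized language model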

Experimental results:

Using the Enron email corpus as a personalization test set, we show for the first time at this scale that a combined spatial/language model reduces word error rate from a pre-model baseline of 38.4% down to 5.7%, and that LM personalization can improve this further to 4.6%.

LM personalization (4.6%) -----> a combined spatial/language model (5.7%) ----> a pre-model baseline (38.4%)

Related work:
1. Language Model For Text Entry ：

==> The simplest LM is a lexicon, i.e. a list of permissible words.

==> Roman letter-based phonetic input to logographic characters

==> Tanaka-Ishii studied 4 LMs: unigram, co-occurrence, MTF (move to front, which gives higher priority to recent words), and PPM (prediction by partial match)

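A brief introduction to PPM: PPM predicts the next word better than the other language models. Its approach is to linearly interpolate the unigram and n-gram models, and its performance depends heavily on the data. For example, as the amount of user text grows from 0 to 50,000 words, the average rank of the correct next word under PPM falls from 1.3 to 1.09 (lower is better).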
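The linear interpolation mentioned above can be sketched as follows. This is only an illustration of interpolating a unigram and a bigram estimate and then reading off the rank of the correct word; it is not the PPM algorithm itself, and the weight `lam`, the function names, and the toy sentence are assumptions made for the example.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram, bigram and left-context counts from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])  # how often each word has a successor
    return unigrams, bigrams, contexts


def interpolated_prob(word, prev, counts, lam=0.7):
    """P(word | prev) = lam * P_bigram + (1 - lam) * P_unigram.

    lam is a hypothetical interpolation weight, not a value from the paper.
    """
    unigrams, bigrams, contexts = counts
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    p_bi = bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni


def rank_of(word, prev, counts, lam=0.7):
    """1-based rank of `word` among all vocabulary words given `prev`."""
    vocab = counts[0]
    ranked = sorted(
        vocab, key=lambda w: interpolated_prob(w, prev, counts, lam), reverse=True
    )
    return ranked.index(word) + 1


counts = train_counts("see you soon see you later see you soon".split())
print(rank_of("soon", "you", counts))   # rank of the most likely continuation
print(rank_of("later", "you", counts))  # a less frequent continuation ranks lower
```

Averaging `rank_of` over a held-out text gives an "average rank" figure of the kind quoted above, where a value closer to 1 means the correct word is usually the top prediction.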
2. Language Model Adaptation ：

Two characteristics of a cache-based LM: 1) it is updated on the fly 2) the user cache may begin in a completely empty state

Data :

Bi-gram : Katz-smoothed [16] bigram LM trained on 114 billion words scraped from the publicly-accessible web in English

We pruned the model size using entropy pruning [29], a technique for decreasing the size of a backoff LM

Katz's back-off model : https://en.wikipedia.org/wiki/Katz%27s_back-off_model#cite_note-2

Personal dataSet: 90 Enron users with 1500 total words

Simulating smartphone input: Simulated Smart Touch Keyboard Typing

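Simplifying assumptions: 1) characters are restricted to a-z 2) spaces are always entered correctly 3) "intended" errors are not considered; the only error source modeled is 2D spatial noise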
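A minimal sketch of what such a simulation could look like under these assumptions: each intended letter is turned into a touch point drawn from a 2D Gaussian around the key center. The key grid, the row offsets, and the value of `sigma` are hypothetical choices for illustration and are not taken from the paper.

```python
import random

# Hypothetical QWERTY geometry in units of one key width; rows are staggered.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.5, 1.5]
KEY_CENTERS = {
    ch: (col + ROW_OFFSETS[row], float(row))
    for row, letters in enumerate(ROWS)
    for col, ch in enumerate(letters)
}


def simulate_taps(word, sigma=0.3, rng=None):
    """Return one noisy (x, y) touch point per letter of `word` (a-z only)."""
    rng = rng or random.Random()
    taps = []
    for ch in word:
        cx, cy = KEY_CENTERS[ch]
        # Independent Gaussian noise on each axis around the key center.
        taps.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
    return taps


def nearest_key(tap):
    """Pre-model baseline: map a touch point to the closest key center."""
    x, y = tap
    return min(
        KEY_CENTERS,
        key=lambda ch: (KEY_CENTERS[ch][0] - x) ** 2 + (KEY_CENTERS[ch][1] - y) ** 2,
    )


taps = simulate_taps("hello", sigma=0.4, rng=random.Random(0))
print("".join(nearest_key(t) for t in taps))  # may differ from "hello" because of noise
```

Mapping every tap to its nearest key, as `nearest_key` does, corresponds to typing with no model at all, which is the kind of baseline a decoder is compared against.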

Simple decoding model: A simple model decoder

Spatial Score

Combining with the Language Model:

Tap sequences to possible words:

After the candidate words are generated, the best word is obtained by computing the B Score of each candidate
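These notes do not spell out the scoring formulas, so the following is only a rough sketch of how a decoder of this kind might combine the two sources of evidence. It assumes an isotropic Gaussian spatial score per tap, a unigram LM given as a plain dictionary, candidates restricted to lexicon words with as many letters as there are taps, and a hypothetical `lm_weight`; none of these details are confirmed by the paper.

```python
import math

# Hypothetical QWERTY geometry in units of one key width.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.5, 1.5]
KEY_CENTERS = {
    ch: (col + ROW_OFFSETS[row], float(row))
    for row, letters in enumerate(ROWS)
    for col, ch in enumerate(letters)
}


def spatial_log_score(taps, word, sigma=0.5):
    """Sum of log densities of each tap under a 2D Gaussian at the key center."""
    score = 0.0
    for (x, y), ch in zip(taps, word):
        cx, cy = KEY_CENTERS[ch]
        dist_sq = (x - cx) ** 2 + (y - cy) ** 2
        score += -dist_sq / (2 * sigma**2) - math.log(2 * math.pi * sigma**2)
    return score


def decode(taps, lm_probs, sigma=0.5, lm_weight=1.0):
    """Pick the word maximizing spatial log score + lm_weight * log P_LM(word)."""
    candidates = [w for w in lm_probs if len(w) == len(taps)]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda w: spatial_log_score(taps, w, sigma)
        + lm_weight * math.log(lm_probs[w]),
    )


lm = {"the": 0.9, "thy": 0.05, "toe": 0.05}
taps = [KEY_CENTERS[c] for c in "thr"]  # the user slipped from "e" to "r"
print(decode(taps, lm))
```

Setting `lm_weight=0.0` reduces the decoder to the spatial model alone, which makes it easy to see how much the language model contributes on the same taps.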

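Choice of evaluation metrics
1. Reasons for not choosing perplexity as the metric: 1) even a large improvement in perplexity does not necessarily lead to a significant improvement on other tasks, especially in speech recognition. The paper cites "Evaluation metrics for language models", which notes that perplexity is not strongly correlated with word error rate in speech recognition, a field where WER is the dominant evaluation metric

2) perplexity is typically lexicon-dependent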

2. Extrinsic evaluation metrics: keystroke savings and WER
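As a small illustration of the WER metric in this setting: because spaces are assumed to be entered correctly, intended and decoded words line up one-to-one, so a simplified word error rate is just the fraction of mismatched words. This is a sketch under that assumption; general-purpose WER, as used in speech recognition, is based on edit distance instead.

```python
def word_error_rate(intended, decoded):
    """Fraction of positions where the decoded word differs from the intended one.

    Assumes both lists are aligned word for word (spaces are never mistyped).
    """
    if len(intended) != len(decoded):
        raise ValueError("word sequences must have the same length")
    if not intended:
        return 0.0
    errors = sum(1 for a, b in zip(intended, decoded) if a != b)
    return errors / len(intended)


print(word_error_rate("see you soon".split(), "see yoi soon".split()))
```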

Simulating correction: Simulating Word Correction

Simulating prediction: Simulating Word Prediction

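Keystroke savings: the number of key presses saved. Every correctly predicted word saves key presses (the paper counts selecting a word as one keystroke when computing KSR). Prediction uses no spatial information; it assumes all preceding words were entered correctly. The paper measures this with a keystroke savings ratio (KSR).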
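The exact formula is not reproduced in these notes, so here is a hedged sketch using the common definition of keystroke savings: one minus the ratio of keystrokes actually needed to the keystrokes needed without prediction. The prefix-based predictor, the `top_n` cutoff, and the choice to ignore spaces are assumptions for illustration only.

```python
def make_prefix_predictor(freqs):
    """Hypothetical predictor: words matching the typed prefix, most frequent first."""
    ranked = sorted(freqs, key=freqs.get, reverse=True)

    def predict(prefix):
        return [w for w in ranked if w.startswith(prefix)]

    return predict


def keystrokes_with_prediction(word, predict, top_n=3):
    """Letters typed before the word shows up in the top-n list, plus one selection tap."""
    for typed in range(len(word)):
        if word in predict(word[:typed])[:top_n]:
            return typed + 1
    return len(word)  # never predicted in time: every letter is typed


def keystroke_savings_ratio(words, predict, top_n=3):
    """(baseline keystrokes - keystrokes used) / baseline keystrokes."""
    baseline = sum(len(w) for w in words)
    used = sum(keystrokes_with_prediction(w, predict, top_n) for w in words)
    return (baseline - used) / baseline if baseline else 0.0


predict = make_prefix_predictor({"the": 10, "there": 5, "their": 3, "cat": 1})
print(keystroke_savings_ratio(["the", "there", "cat"], predict, top_n=1))
```

With `top_n=1` in this toy setup, "the" costs a single selection tap, "cat" costs one letter plus a tap, and "there" is never offered early enough to save anything.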

Personalized language model: LANGUAGE MODEL PERSONALIZATION

1） the uniform cache ：

2） exponentially-decaying :
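The two cache variants can be illustrated with one small class. This is a generic sketch, not the paper's formulation: it assumes a unigram cache in which every earlier word's weight is multiplied by a hypothetical `decay` factor on each update (so `decay=1.0` gives the uniform cache), and it assumes the cache is mixed with the background LM by simple linear interpolation with a made-up weight `lam`.

```python
from collections import defaultdict


class DecayingCache:
    """Unigram cache over the user's own words, updated on the fly."""

    def __init__(self, decay=1.0):
        self.decay = decay  # 1.0 = uniform cache, < 1.0 = exponentially decaying
        self.weights = defaultdict(float)
        self.total = 0.0

    def update(self, word):
        # Age all earlier observations, then add the new word with weight 1.
        for w in self.weights:
            self.weights[w] *= self.decay
        self.total = self.total * self.decay + 1.0
        self.weights[word] += 1.0

    def prob(self, word):
        if self.total == 0.0:
            return 0.0
        return self.weights.get(word, 0.0) / self.total


def personalized_prob(word, background_prob, cache, lam=0.2):
    """Interpolate background LM and cache; fall back to background if cache is empty."""
    if cache.total == 0.0:
        return background_prob
    return (1 - lam) * background_prob + lam * cache.prob(word)


uniform, decaying = DecayingCache(1.0), DecayingCache(0.5)
for w in ["enron", "enron", "meeting"]:
    uniform.update(w)
    decaying.update(w)
print(uniform.prob("meeting"), decaying.prob("meeting"))
```

The decaying cache gives the most recent word, "meeting", a larger share than the uniform cache does, which is the intended effect of favoring recent usage. The empty-cache fallback reflects the point above that a user cache may start completely empty.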

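Word correction does use the language model. The results confirm that adding a language model, and especially a personalized one, significantly improves performance on the word correction task.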

Result :

Word Correction:

Word Prediction:

A comparison experiment done outside the paper (n-gram model + lstm + cnn-word-prediction): the LSTM model achieves a clearly better KSR than the n-gram model (results on a news corpus)

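To summarize, the work done in this paper:

1) The authors claim to have trained a state-of-the-art LM on web-scale data, using entropy pruning

2) User input is simulated with a 2D Gaussian spatial model that mimics human noise; only spatial errors are considered

3) A simple but effective decoder (LM + spatial model) is provided, so that model performance can be evaluated inside one integrated closed loop

4) LM personalization is modeled using the Enron Corpus dataset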