[Paper Notes] Improving Transformer-based End-to-End Speech Recognition with CTC and LM Integration

Title

Improving Transformer-based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration

Link

http://www.isca-speech.org/archive/Interspeech_2019/abstracts/1938.html

Contributions

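Integrates CTC (in joint training and decoding) and a language model (in decoding) with the Transformer, improving recognition performance.
Investigates experimentally how this integration performs on large datasets.
Implements Transformer-based ASR in the open-source ESPnet toolkit.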
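As a rough illustration of decoding-time integration, here is a minimal sketch that scores finished hypotheses with a weighted sum of log-probabilities. It assumes the hybrid CTC/attention weighting with shallow LM fusion that ESPnet-style decoders use, i.e. (1 - λ)·log p_s2s + λ·log p_ctc + γ·log p_lm. The `Hypothesis` class, `fused_score`, `rescore`, and the default weights are hypothetical, and the log-probabilities are assumed to be given; a real beam search computes CTC prefix scores incrementally instead of rescoring complete hypotheses.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A finished hypothesis with per-model log-probabilities (hypothetical container)."""
    tokens: tuple
    logp_att: float  # log p_s2s(Y|X) from the Transformer decoder
    logp_ctc: float  # log p_ctc(Y|X) from the CTC branch
    logp_lm: float   # log p_lm(Y) from an external language model


def fused_score(h: Hypothesis, ctc_weight: float = 0.3, lm_weight: float = 0.5) -> float:
    """Log-linear combination: (1 - ctc_weight)*att + ctc_weight*ctc + lm_weight*lm."""
    return ((1.0 - ctc_weight) * h.logp_att
            + ctc_weight * h.logp_ctc
            + lm_weight * h.logp_lm)


def rescore(hyps, ctc_weight: float = 0.3, lm_weight: float = 0.5):
    """Return the hypotheses sorted best-first by the fused score."""
    return sorted(hyps, key=lambda h: fused_score(h, ctc_weight, lm_weight), reverse=True)


if __name__ == "__main__":
    hyps = [
        Hypothesis(("a", "b"), logp_att=-1.0, logp_ctc=-5.0, logp_lm=-4.0),
        Hypothesis(("a", "c"), logp_att=-1.2, logp_ctc=-1.5, logp_lm=-2.0),
    ]
    # Decoder only: the attention branch prefers ("a", "b").
    print(rescore(hyps, ctc_weight=0.0, lm_weight=0.0)[0].tokens)  # ('a', 'b')
    # With CTC and LM weights enabled, ("a", "c") overtakes it.
    print(rescore(hyps)[0].tokens)  # ('a', 'c')
```

With both weights at zero the decoder's favorite wins; once CTC and the LM contribute, a hypothesis they consider more plausible can overtake it, which is the intended effect of the integration.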

Highlights and Takeaways

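The paper points out two main problems with applying the Transformer to ASR:

It converges more slowly than RNN-based ASR.
It is hard to integrate with a language model.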

Jointly training the Transformer with CTC speeds up its convergence.

CTC can encourage monotonic alignment between the speech and transcription. Therefore, the attention at an early epoch with CTC appears more monotonic than that without CTC.

Because CTC keeps the speech frames and the token sequence monotonically aligned, the attention becomes more monotonic and focused early in training.
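To make "monotonic" concrete: CTC maps a frame-level path to a label sequence by merging repeated symbols and then deleting blanks, so token i can never be emitted at an earlier frame than token i-1. The plain-Python sketch below uses "-" as the blank symbol; the function names are hypothetical. It shows the collapse and, for each non-blank frame, the index of the output token that frame belongs to. Those indices are always non-decreasing, which is the monotonic alignment CTC imposes.

```python
BLANK = "-"  # assumed blank symbol for this illustration


def ctc_collapse(path, blank=BLANK):
    """Merge consecutive repeats, then drop blanks (the CTC many-to-one mapping)."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out


def frame_alignment(path, blank=BLANK):
    """Return (frame, token_index) pairs for every non-blank frame of the path."""
    alignment = []
    idx = -1
    prev = None
    for t, sym in enumerate(path):
        if sym != blank:
            if sym != prev:  # a new token starts here; a repeat continues the old one
                idx += 1
            alignment.append((t, idx))
        prev = sym
    return alignment


if __name__ == "__main__":
    path = list("--hh-e-ll-llo")
    print("".join(ctc_collapse(path)))               # hello
    print([i for _, i in frame_alignment(path)])     # [0, 0, 1, 2, 2, 3, 3, 4]
```

The blank between the two "ll" runs is what lets CTC emit a doubled letter; without it the repeats would merge into one token.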


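The paper proposes a combined loss function that joins the CTC and attention (sequence-to-sequence) objectives.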
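Hybrid CTC/attention training of the kind ESPnet uses commonly writes the combined objective as L = -α log p_ctc(Y|X) - (1 - α) log p_s2s(Y|X), with a tunable weight α in [0, 1]. The minimal sketch below assumes that form rather than reproducing the paper's exact equation. It computes -log p_ctc with the standard CTC forward algorithm and -log p_s2s as the teacher-forced decoder negative log-likelihood. The function names are hypothetical, inputs are plain lists of log-probabilities, and for simplicity both branches use the same target without an end-of-sentence token.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(log_probs, target, blank=0):
    """-log p_ctc(Y|X) via the CTC forward algorithm.

    log_probs: T x V frame-level log-probabilities (list of lists).
    target: label ids without blanks.
    """
    ext = [blank]  # blank-interleaved target: -, y1, -, y2, ..., -
    for y in target:
        ext += [y, blank]
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]               # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])   # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])   # skip a blank between distinct labels
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or the trailing blank.
    total = alpha[-1] if S == 1 else logsumexp(alpha[-1], alpha[-2])
    return -total


def s2s_nll(step_log_probs, target):
    """-log p_s2s(Y|X) under teacher forcing: sum of -log p(y_l | y_<l, X)."""
    return -sum(step_log_probs[l][y] for l, y in enumerate(target))


def joint_loss(ctc_log_probs, dec_log_probs, target, alpha=0.3, blank=0):
    """alpha * CTC loss + (1 - alpha) * attention-decoder loss."""
    return (alpha * ctc_nll(ctc_log_probs, target, blank)
            + (1 - alpha) * s2s_nll(dec_log_probs, target))
```

Setting alpha to 1 or 0 recovers pure CTC or pure attention training; intermediate values let the CTC term regularize the attention toward monotonic alignments.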

Experimental Results

As the results show, integrating CTC and the LM improves the model's recognition performance.

I keep an ongoing collection of end-to-end speech recognition papers and resources at:
https://github.com/zyascend/End-to-End-Speech-Recognition-Learning