1. Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning

  1. University of Tokyo
  2. End-to-end multitask learning with self-attention; the auxiliary task is gender classification.
    Features are first extracted from the speech spectrogram rather than hand-crafted, then fed into a CNN-BLSTM end-to-end network. A self-attention mechanism then focuses the model on emotion-salient periods. Finally, since the emotion and gender classification tasks share mutual features, gender classification is added as an auxiliary task that shares useful information with the main emotion classification task.
  3. The abstract motivates the claim that SER has attracted great attention through human-computer interaction applications, which makes it more vivid. The introduction covers, in turn, features, the advantages of spectrograms, traditional machine learning approaches (HMM, GMM, SVM), and CNN/RNN deep learning approaches.
  4. multi-headed self attention
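A minimal numpy sketch of multi-headed scaled dot-product self-attention as used in this paper; the dimensions, weight shapes, and random initialization below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product self-attention split across several heads.

    X: (T, d_model) sequence of frame features.
    Wq/Wk/Wv: (d_model, d_model) projections; Wo: output projection.
    """
    T, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    # Split the model dimension into num_heads independent heads.
    def split(M):
        return M.reshape(T, num_heads, d_head).transpose(1, 0, 2)  # (h, T, d_head)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)          # (h, T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                 # softmax over keys
    heads = weights @ Vh                                           # (h, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(T, d_model)          # re-join heads
    return concat @ Wo

rng = np.random.default_rng(0)
d_model, T = 16, 10
X = rng.standard_normal((T, d_model))
W = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_self_attention(X, *W, num_heads=4)
```

Each head attends over the whole sequence independently, letting different heads latch onto different emotion-salient frames before the outputs are concatenated.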
  5. Spectrogram extraction: utterance length is normalized to 7.5 s, zero-padding shorter clips and cutting longer ones. Hanning window of 800 samples, sampling rate 16000 Hz.
    Short-time Fourier transform.
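A sketch of this extraction pipeline in numpy, using the stated parameters (7.5 s normalization, 800-sample Hanning window, 16 kHz); the hop length of 400 samples is my assumption, since the notes only give the window size:

```python
import numpy as np

SR = 16000                   # sampling rate stated in the paper
TARGET_LEN = int(7.5 * SR)   # every utterance normalized to 7.5 s
WIN = 800                    # Hanning window length (50 ms at 16 kHz)
HOP = 400                    # hop length: an assumption, not given in the notes

def spectrogram(wave):
    """Pad/cut to 7.5 s, then take a magnitude STFT with a Hanning window."""
    if len(wave) < TARGET_LEN:
        wave = np.pad(wave, (0, TARGET_LEN - len(wave)))  # zero-pad short clips
    else:
        wave = wave[:TARGET_LEN]                          # cut long clips
    window = np.hanning(WIN)
    frames = [wave[s:s + WIN] * window
              for s in range(0, TARGET_LEN - WIN + 1, HOP)]
    # One-sided FFT magnitude per frame -> (num_frames, WIN // 2 + 1)
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

spec = spectrogram(np.random.randn(3 * SR))  # a 3 s clip gets zero-padded
```

Because every clip is padded or cut to the same length, all spectrograms share one shape, which is what lets them be batched into the CNN front end.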
  6. α and β are both set to 1.
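The α and β here weight the two task losses in the multitask objective, i.e. total loss = α·L_emotion + β·L_gender with both weights equal to 1. A minimal sketch (the cross-entropy formulation and the example probabilities are illustrative assumptions):

```python
import numpy as np

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -np.log(probs[label])

def multitask_loss(emotion_probs, emotion_label,
                   gender_probs, gender_label,
                   alpha=1.0, beta=1.0):
    # Total loss = alpha * L_emotion + beta * L_gender; here alpha = beta = 1.
    return (alpha * cross_entropy(emotion_probs, emotion_label)
            + beta * cross_entropy(gender_probs, gender_label))

# Four emotion classes (after merging EXCITED into HAPPY) and two genders.
loss = multitask_loss(np.array([0.7, 0.1, 0.1, 0.1]), 0,
                      np.array([0.6, 0.4]), 1)
```

With equal weights, the gender branch acts purely as a regularizer that shapes the shared representation rather than dominating training.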
    2019 Interspeech speech emotion recognition paper reading

Experiments

IEMOCAP: EXCITED and HAPPY are merged into HAPPY, giving four classes and 5531 samples in total.
Experimental results are compared under both 5-fold cross-validation (2018) and leave-one-session-out evaluation.
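Leave-one-session-out on IEMOCAP means holding out each of the five recorded sessions in turn; a minimal sketch, assuming samples are tagged with their session id (the data layout below is illustrative):

```python
def leave_one_session_out(samples):
    """Yield (held_out, train, test) splits, one per IEMOCAP session.

    `samples` is a list of (session_id, utterance) pairs, session ids 1..5.
    """
    sessions = sorted({s for s, _ in samples})
    for held_out in sessions:
        train = [u for s, u in samples if s != held_out]
        test = [u for s, u in samples if s == held_out]
        yield held_out, train, test

# Toy data: two utterances per session, five sessions.
data = [(s, f"utt{s}_{i}") for s in range(1, 6) for i in range(2)]
folds = list(leave_one_session_out(data))
```

Because IEMOCAP speakers never appear in more than one session, this split keeps evaluation speaker-independent, unlike a random 5-fold split.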

2. Self-attention for Speech Emotion Recognition

Related papers: