1. Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning

  1. University of Tokyo
  2. End-to-end multitask learning with self-attention; the auxiliary task is gender classification.
    Features are first extracted from the speech spectrogram rather than hand-crafted, then fed into a CNN-BLSTM end-to-end network. A self-attention mechanism then focuses the model on emotion-salient periods. Finally, since the emotion and gender classification tasks share mutual features, gender classification is added as an auxiliary task that shares useful information with the main emotion classification task.
  3. The abstract motivates the claim that SER has attracted great attention through human-computer interaction applications, which makes it more vivid. The introduction covers, in turn, features, the advantages of spectrograms, traditional machine learning approaches (HMM, GMM, SVM), and CNN/RNN deep learning approaches.
  4. multi-headed self attention
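A minimal numpy sketch of multi-headed scaled dot-product self-attention as used in this paper; the dimensions, weight shapes, and random initialization below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product self-attention split across several heads.

    X: (T, d_model) sequence of frame features.
    Wq/Wk/Wv: (d_model, d_model) projections; Wo: output projection.
    """
    T, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    # Split the model dimension into num_heads independent heads.
    def split(M):
        return M.reshape(T, num_heads, d_head).transpose(1, 0, 2)  # (h, T, d_head)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)          # (h, T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)                 # softmax over keys
    heads = weights @ Vh                                           # (h, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(T, d_model)          # re-join heads
    return concat @ Wo

rng = np.random.default_rng(0)
d_model, T = 16, 10
X = rng.standard_normal((T, d_model))
W = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_self_attention(X, *W, num_heads=4)
```

Each head attends over the whole sequence independently, letting different heads latch onto different emotion-salient frames before the outputs are concatenated.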
  5. Spectrogram extraction: utterance length is normalized to 7.5 s, zero-padding shorter clips and cutting longer ones. Hanning window of 800 samples, sampling rate 16000 Hz.
    Short-time Fourier transform.
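A sketch of this extraction pipeline in numpy, using the stated parameters (7.5 s normalization, 800-sample Hanning window, 16 kHz); the hop length of 400 samples is my assumption, since the notes only give the window size:

```python
import numpy as np

SR = 16000                   # sampling rate stated in the paper
TARGET_LEN = int(7.5 * SR)   # every utterance normalized to 7.5 s
WIN = 800                    # Hanning window length (50 ms at 16 kHz)
HOP = 400                    # hop length: an assumption, not given in the notes

def spectrogram(wave):
    """Pad/cut to 7.5 s, then take a magnitude STFT with a Hanning window."""
    if len(wave) < TARGET_LEN:
        wave = np.pad(wave, (0, TARGET_LEN - len(wave)))  # zero-pad short clips
    else:
        wave = wave[:TARGET_LEN]                          # cut long clips
    window = np.hanning(WIN)
    frames = [wave[s:s + WIN] * window
              for s in range(0, TARGET_LEN - WIN + 1, HOP)]
    # One-sided FFT magnitude per frame -> (num_frames, WIN // 2 + 1)
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

spec = spectrogram(np.random.randn(3 * SR))  # a 3 s clip gets zero-padded
```

Because every clip is padded or cut to the same length, all spectrograms share one shape, which is what lets them be batched into the CNN front end.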
  6. α and β are both set to 1.
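The α and β here weight the two task losses in the multitask objective, i.e. total loss = α·L_emotion + β·L_gender with both weights equal to 1. A minimal sketch (the cross-entropy formulation and the example probabilities are illustrative assumptions):

```python
import numpy as np

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -np.log(probs[label])

def multitask_loss(emotion_probs, emotion_label,
                   gender_probs, gender_label,
                   alpha=1.0, beta=1.0):
    # Total loss = alpha * L_emotion + beta * L_gender; here alpha = beta = 1.
    return (alpha * cross_entropy(emotion_probs, emotion_label)
            + beta * cross_entropy(gender_probs, gender_label))

# Four emotion classes (after merging EXCITED into HAPPY) and two genders.
loss = multitask_loss(np.array([0.7, 0.1, 0.1, 0.1]), 0,
                      np.array([0.6, 0.4]), 1)
```

With equal weights, the gender branch acts purely as a regularizer that shapes the shared representation rather than dominating training.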
    2019 Interspeech speech emotion recognition paper reading

Experiments

IEMOCAP: EXCITED and HAPPY are merged into HAPPY, giving four classes and 5531 samples in total.
Experimental results are compared under both 5-fold cross-validation (2018) and leave-one-session-out evaluation.
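Leave-one-session-out on IEMOCAP means holding out each of the five recorded sessions in turn; a minimal sketch, assuming samples are tagged with their session id (the data layout below is illustrative):

```python
def leave_one_session_out(samples):
    """Yield (held_out, train, test) splits, one per IEMOCAP session.

    `samples` is a list of (session_id, utterance) pairs, session ids 1..5.
    """
    sessions = sorted({s for s, _ in samples})
    for held_out in sessions:
        train = [u for s, u in samples if s != held_out]
        test = [u for s, u in samples if s == held_out]
        yield held_out, train, test

# Toy data: two utterances per session, five sessions.
data = [(s, f"utt{s}_{i}") for s in range(1, 6) for i in range(2)]
folds = list(leave_one_session_out(data))
```

Because IEMOCAP speakers never appear in more than one session, this split keeps evaluation speaker-independent, unlike a random 5-fold split.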

2. Self-attention for Speech Emotion Recognition

Related papers: