Building Chinese Affective Resources in Valence-Arousal Dimensions (Chinese valence-arousal words, CVAW)
This study builds:
- an affective lexicon called Chinese Valence-Arousal Words (CVAW) containing 1,653 words
- an affective corpus called Chinese Valence-Arousal Text (CVAT) containing 2,009 sentences extracted from web texts
introduction
- sentiment analysis: automatically identifies affective information in texts; affective states are represented using either a categorical or a dimensional approach
- categorical approach: discrete classes such as positive/neutral/negative or Ekman’s six basic emotions (Ekman, 1992)
- dimensional approach: continuous numerical values in multiple dimensions, such as the valence-arousal (VA) space; this allows more intelligent and fine-grained processing of text
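The difference between the two representations can be sketched in a few lines. This is only an illustration: the `VAPoint` type, the `to_polarity` helper, the 1-9 scale with 5 as the neutral point, and the `margin` threshold are assumptions made for the example, not definitions taken from the paper.

```python
from dataclasses import dataclass

# Categorical approach: a text gets exactly one discrete label.
POLARITY_LABELS = ("negative", "neutral", "positive")


@dataclass(frozen=True)
class VAPoint:
    """Dimensional approach: a point in valence-arousal space.

    Assumes a 1-9 scale on both axes (hypothetical for this sketch).
    """
    valence: float  # low = negative, high = positive
    arousal: float  # low = calm, high = excited


def to_polarity(point: VAPoint, neutral: float = 5.0, margin: float = 0.5) -> str:
    """Collapse a VA point to a coarse polarity label.

    The neutral midpoint and margin are illustrative thresholds only.
    """
    if point.valence > neutral + margin:
        return "positive"
    if point.valence < neutral - margin:
        return "negative"
    return "neutral"


# Two hypothetical texts with the same polarity but different arousal:
# the categorical label cannot tell them apart, the VA point can.
calm_positive = VAPoint(valence=7.0, arousal=3.0)
excited_positive = VAPoint(valence=7.0, arousal=8.0)
print(to_polarity(calm_positive), to_polarity(excited_positive))
```

Going from a VA point to a label loses information, which is the sense in which the dimensional approach is finer-grained.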
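section 2 existing affective lexicons and corpora
- SentiWordNet is a lexical resource for opinion mining that assigns three sentiment scores to each synset of WordNet.
- Linguistic Inquiry and Word Count (LIWC): calculates the degree to which people use different categories of words across a wide range of texts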
- The Chinese LIWC (C-LIWC) dictionary is a Chinese translation of the LIWC with manual revisions to fit the practical characteristics of Chinese usage (Huang et al., 2012)
- Affective Norms for English Words (ANEW) provides 1,034 English words with ratings in the dimensions of pleasure, arousal and dominance (Bradley and Lang, 1999). *It is the only one of these lexicons that provides real-valued ratings in all three VAD dimensions
- Affective Norms for English Text (ANET) (Bradley and Lang, 2007). In addition, only ANET provides VA ratings for texts, which can be used for text-level affect prediction
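section 3 process of building the Chinese affective resources
- CVAW is built on the C-LIWC dictionary
- annotators: 5
- each annotator rates each word in the valence and arousal dimensions using the Self-Assessment Manikin (SAM) model (Lang, 1980)
- 1: negative, 9: positive; 5 denotes a neutral word with no affective tendency
- compute the error rate among the different annotators:
- using the above model, compute the error rates of valence and arousal on the different texts; the error rate for arousal is higher than that for valence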
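These notes do not reproduce the error formula itself. One plausible reading, sketched below, is to measure how far each annotator's ratings fall from the mean rating of all annotators, using mean absolute error (MAE) and root mean square error (RMSE). That choice of measure, the function name `annotator_errors`, and the sample ratings are all assumptions for illustration, not the paper's stated procedure or data.

```python
import math


def annotator_errors(ratings):
    """Return a (MAE, RMSE) pair per annotator, measured against the item mean.

    ratings[i][j] is the rating given to item i by annotator j on the
    1-9 scale. Using the mean of all annotators as the reference value
    is an assumption of this sketch.
    """
    if not ratings or not ratings[0]:
        raise ValueError("ratings must be a non-empty matrix")
    n_items = len(ratings)
    n_annotators = len(ratings[0])
    means = [sum(row) / n_annotators for row in ratings]

    results = []
    for j in range(n_annotators):
        diffs = [ratings[i][j] - means[i] for i in range(n_items)]
        mae = sum(abs(d) for d in diffs) / n_items
        rmse = math.sqrt(sum(d * d for d in diffs) / n_items)
        results.append((mae, rmse))
    return results


# Hypothetical valence ratings: three words, five annotators each.
valence_ratings = [
    [7, 8, 7, 6, 7],
    [2, 3, 2, 2, 1],
    [5, 5, 6, 5, 4],
]
for idx, (mae, rmse) in enumerate(annotator_errors(valence_ratings), start=1):
    print(f"annotator {idx}: MAE={mae:.3f} RMSE={rmse:.3f}")
```

Running the same function on the valence ratings and on the arousal ratings gives two sets of errors that can be compared directly, which is the comparison the last bullet describes.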
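section 4 analysis results and feasibility evaluation
- in both English and Chinese, the correlations for arousal are lower than those for valence, again indicating that the arousal dimension is harder to predict
- arousal also has a higher error rate in CVAW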
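The notes do not say which correlation measure is meant. Assuming it is the Pearson correlation coefficient between two sets of ratings for the same items, a minimal sketch looks like this. The function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical ratings of the same five items from two sources.
source_a = [7.0, 2.5, 5.0, 8.0, 3.0]
source_b = [6.5, 3.0, 5.5, 7.5, 2.0]
print(f"r = {pearson_r(source_a, source_b):.3f}")
```

Computing `pearson_r` once on valence ratings and once on arousal ratings would give the two values being compared in the first bullet of this section.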
conclusion
。。。