Question title: Sentiment analysis (AFINN) in R
Posted: 2018-05-06 14:28:10
Question:

I am trying to get the sentiment of a dataset of tweets using the AFINN lexicon (get_sentiments("afinn")). A sample of the dataset is shown below:

A tibble: 10 x 2
   Date                TweetText                                                
   <dttm>              <chr>                                                    
 1 2018-02-10 21:58:19 "RT @RealSirTomJones: Still got the moves! That was a lo~
 2 2018-02-10 21:58:19 "Yass Tom \U0001f600 #snakehips still got it #TheVoiceUK"
 3 2018-02-10 21:58:19 Yasss tom he’s some chanter #TheVoiceUK #ItsNotUnusual   
 4 2018-02-10 21:58:20 #TheVoiceUK SIR TOM JONES...HE'S STILL HOT... AMAZING VO~
 5 2018-02-10 21:58:21 I wonder how many hips Tom Jones has been through? #TheV~
 6 2018-02-10 21:58:21 Tom Jones has still got it!!! #TheVoiceUK                
 7 2018-02-10 21:58:21 Good grief Tom Jones is amazing #TheVoiceuk              
 8 2018-02-10 21:58:21 RT @tonysheps: Sir Thomas Jones you’re a bloody legend #~
 9 2018-02-10 21:58:22 @ITV Tom Jones what a legend!!! ❤️ #StillGotIt #TheVoice~
10 2018-02-10 21:58:22 "RT @RealSirTomJones: Still got the moves! That was a lo~

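What I want to do is:
1. Split the tweets into individual words.
2. Score those words using the AFINN lexicon.
3. Sum the scores of all the words in each tweet.
4. Return this sum in a new third column so I can see the score of each tweet.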
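Those four steps can be sketched in base R as follows. This is a minimal illustration, not the tidytext approach used in the answers: toy_lexicon is a hypothetical named vector with made-up scores standing in for AFINN (in practice you would build it from get_sentiments("afinn")), and tokenize_tweet() is a hypothetical, deliberately simple tokenizer.

```r
# Hypothetical stand-in for the AFINN lexicon: a named numeric vector
# (word -> score). The values are illustrative only, not real AFINN scores.
toy_lexicon <- c(amazing = 4, good = 3, legend = 3, grief = -2, hot = 1)

# Step 1 (hypothetical helper): lowercase the tweet, turn every run of
# characters other than letters, digits, apostrophes, @ and # into a
# single space, then split on spaces.
tokenize_tweet <- function(text) {
  text <- gsub("[^a-z0-9'@#]+", " ", tolower(text))
  tokens <- strsplit(text, " ", fixed = TRUE)[[1]]
  tokens[tokens != ""]
}

# Steps 2 and 3: look each token up by name and sum the scores.
# Tokens missing from the lexicon index to NA and are dropped by na.rm.
score_tweet <- function(text, lexicon) {
  sum(lexicon[tokenize_tweet(text)], na.rm = TRUE)
}

tweets <- data.frame(
  TweetText = c("Good grief Tom Jones is amazing #TheVoiceuk",
                "Tom Jones has still got it!!! #TheVoiceUK"),
  stringsAsFactors = FALSE
)

# Step 4: store the per-tweet total in a new column.
tweets$score <- vapply(tweets$TweetText, score_tweet, numeric(1),
                       lexicon = toy_lexicon, USE.NAMES = FALSE)
tweets$score
#> [1] 5 0
```

Lowercasing matters because AFINN entries are lowercase, and stripping punctuation keeps a token like "moves!" from failing to match "moves".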

For a similar lexicon, I found the following code:

# Initiate the scoreTopic
scoreTopic <- 0
# Start a loop over the documents
for (i in 1:length (myCorpus)) {
  # Store separate words in character vector
  terms <- unlist(strsplit(myCorpus[[i]]$content, " "))
  # Determine the number of positive matches
  pos_matches <- sum(terms %in% positive_words)
  # Determine the number of negative matches
  neg_matches <- sum(terms %in% negative_words)
  # Store the difference in the results vector
  scoreTopic [i] <- pos_matches - neg_matches
} # End of the for loop

dsMyTweets$score <- scoreTopic

However, I can't adapt this code to work with the AFINN lexicon.
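One way to adapt the loop above, sketched under two assumptions: the tweets are a plain character vector (tweet_text, hypothetical) rather than a tm corpus, and afinn is a data frame with columns word and value. Recent tidytext versions return that shape from get_sentiments("afinn"), while the 2018 version used in this thread named the score column score; the tiny made-up data frame below stands in for it so the sketch runs offline. Instead of counting positive and negative matches, the loop looks up each word's score with match() and sums them.

```r
# Made-up stand-in for get_sentiments("afinn"): one row per word with its score
afinn <- data.frame(
  word  = c("amazing", "good", "grief", "legend"),
  value = c(4, 3, -2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical input: one tweet per element
tweet_text <- c("Good grief Tom Jones is amazing",
                "Sir Thomas Jones you're a bloody legend",
                "Tom Jones has still got it")

scoreTopic <- numeric(length(tweet_text))
for (i in seq_along(tweet_text)) {
  # Split on spaces, as the original loop does, and lowercase the words
  terms <- tolower(unlist(strsplit(tweet_text[i], " ")))
  # Row of each term in the lexicon; NA when the term is absent
  idx <- match(terms, afinn$word)
  # Sum the scores of the matched terms; unmatched terms are ignored
  scoreTopic[i] <- sum(afinn$value[idx], na.rm = TRUE)
}
scoreTopic
#> [1] 5 3 0
```

The result can then be stored with dsMyTweets$score <- scoreTopic, exactly as in the original code.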

Comments:

  • Reading chapters 2 and 7 of tidytextmining (Text Mining with R) should give you all the information you need.

Tags: r tidyverse sentiment-analysis tidytext lexicon


Answer 1:

This is a good use case for tidy data principles. Let's set up some example data (these are real tweets of mine).

library(tidytext)
library(tidyverse)

tweets <- tribble(
    ~tweetID, ~TweetText,
    1, "Was Julie helping me because I don't know anything about Python package management? Yes, yes, she was.",
    2, "@darinself OMG, this is my favorite.",
    3, "@treycausey @ftrain THIS IS AMAZING.",
    4, "@nest No, no, not in error. Just the turkey!",
    5, "The @nest people should write a blog post about how many smoke alarms went off yesterday. (I know ours did.)")

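Now we have some example data. In the code below, unnest_tokens() tokenizes the text, i.e. breaks it into individual words (the tidytext package lets you use a special tokenizer for tweets), and inner_join() performs the sentiment analysis.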

tweet_sentiment <- tweets %>%
    unnest_tokens(word, TweetText, token = "tweets") %>%
    inner_join(get_sentiments("afinn"))
#> Joining, by = "word"
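It helps to see why a further step is needed: an inner join keeps only tokens that appear in the lexicon, so a tweet with no scored words disappears from tweet_sentiment entirely. The base R sketch below mimics that behaviour with merge() on a small hand-made token table and a made-up lexicon (both hypothetical, not real AFINN data).

```r
# Hypothetical one-token-per-row table, like the output of unnest_tokens()
tokens <- data.frame(
  tweetID = c(1, 1, 2, 2, 3),
  word    = c("yes", "python", "omg", "favorite", "blog"),
  stringsAsFactors = FALSE
)

# Made-up lexicon with the same two-column shape as the 2018 AFINN tibble
lexicon <- data.frame(word = c("yes", "favorite"), score = c(1, 2),
                      stringsAsFactors = FALSE)

# merge() with the default all = FALSE is an inner join on "word":
# rows whose word is missing from the lexicon are dropped
matched <- merge(tokens, lexicon, by = "word")
sort(unique(matched$tweetID))
#> [1] 1 2
```

Tweet 3 has no matched words, so it is gone from the joined table; recovering it with a score of zero is what the left join below takes care of.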

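Now we can find the score for each tweet. Take the original dataset of tweets and left_join() it to the sum() of the scores for each tweet. The handy function replace_na() from tidyr lets you replace the resulting NA values with zero.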

tweets %>%
    left_join(tweet_sentiment %>%
                  group_by(tweetID) %>%
                  summarise(score = sum(score))) %>%
    replace_na(list(score = 0))
#> Joining, by = "tweetID"
#> # A tibble: 5 x 3
#>   tweetID TweetText                                                  score
#>     <dbl> <chr>                                                      <dbl>
#> 1      1. Was Julie helping me because I don't know anything about …    4.
#> 2      2. @darinself OMG, this is my favorite.                          2.
#> 3      3. @treycausey @ftrain THIS IS AMAZING.                          4.
#> 4      4. @nest No, no, not in error. Just the turkey!                 -4.
#> 5      5. The @nest people should write a blog post about how many …    0.

Created on 2018-05-09 by the reprex package (v0.2.0).
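For comparison, the same aggregate-then-fill step can be written in base R. This sketch assumes a data frame tweet_sentiment with tweetID and score columns, built by hand here with made-up scores; aggregate() plays the role of group_by() plus summarise(), merge(all.x = TRUE) the role of left_join(), and the final assignment the role of replace_na().

```r
tweets <- data.frame(
  tweetID   = c(1, 2, 3),
  TweetText = c("first tweet", "second tweet", "third tweet"),
  stringsAsFactors = FALSE
)

# Hypothetical matched tokens: tweet 3 had no lexicon words, so it has no rows
tweet_sentiment <- data.frame(tweetID = c(1, 1, 2), score = c(1, 2, -1))

# Sum the scores per tweet (group_by + summarise)
totals <- aggregate(score ~ tweetID, data = tweet_sentiment, FUN = sum)

# Keep every tweet (left join), then turn the resulting NA into 0
scored <- merge(tweets, totals, by = "tweetID", all.x = TRUE)
scored$score[is.na(scored$score)] <- 0
scored$score
#> [1]  3 -1  0
```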

If you are interested in sentiment analysis and text mining, I invite you to check out the extensive documentation and tutorials we have for tidytext.

Comments:

    Answer 2:

    For future reference:

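    library(tidytext)

    # Fetch the lexicon once rather than on every word lookup.
    # (Recent tidytext versions name the score column `value` instead of `score`.)
    afinn <- get_sentiments("afinn")

    # Score of a single word; numeric(0) if the word is not in AFINN
    Score_word <- function(x) {
      afinn$score[afinn$word == x]
    }

    # Split a tweet on spaces, score each lowercased word and add the scores up
    Score_tweet <- function(sentence) {
      words <- tolower(unlist(strsplit(sentence, " ")))
      scores <- unlist(lapply(words, Score_word))
      sum(scores)  # 0 when no word matched
    }

    # Score the TweetText column one tweet at a time
    dsMyTweets$score <- sapply(dsMyTweets$TweetText, Score_tweet, USE.NAMES = FALSE)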
    

    This does what I originally wanted! :)

    Comments:
