【Question title】: How can we do a sentiment analysis and create a 'sentiment' record next to each line of text?
【Posted】: 2023-03-27 06:52:01
【Question description】:

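I searched for ways to run a sentiment analysis and write the result into a column next to the text column being analyzed. This is what I came up with.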

import nltk
nltk.download('vader_lexicon')
nltk.download('punkt')

# first, we import the relevant modules from the NLTK library
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# next, we initialize VADER so we can use it within our Python script
sid = SentimentIntensityAnalyzer()

# the variable 'message_text' now contains the text we will analyze.
message_text = '''Like you, I am getting very frustrated with this process. I am genuinely trying to be as reasonable as possible. I am not trying to "hold up" the deal at the last minute. I'm afraid that I am being asked to take a fairly large leap of faith after this company (I don't mean the two of you -- I mean Enron) has screwed me and the people who work for me.'''

print(message_text)

# Calling the polarity_scores method on sid and passing in the message_text outputs a dictionary with negative, neutral, positive, and compound scores for the input text
scores = sid.polarity_scores(message_text)

# Here we loop through the keys contained in scores (pos, neu, neg, and compound scores) and print the key-value pairs on the screen
for key in sorted(scores):
    print('{0}: {1}, '.format(key, scores[key]), end='')

This gives me:

compound: -0.3804, neg: 0.093, neu: 0.836, pos: 0.071, 

Now I am trying to feed in my own column of text from a data frame.

The sample code comes from this site.

https://programminghistorian.org/en/lessons/sentiment-analysis

I have a field in my data frame that consists of text, like this.

These brush heads are okay!  Wish they came in a larger diameter, would cover more facial surface area and require less time to do the job!  However, I think they do a better job than just a face cloth in cleansing the pores.  I would recommend this product!
No opening to pee with. weird.  And really tight.  not very comfortable.
I choose it as spare parts always available and I will buy it again for sure!I will recommend it, without doubt!
love this cleanser!!
Best facial wipes invented!!!!!!(:

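These are 5 separate records in my data frame. I am trying to find a way to evaluate each record as 'positive', 'negative', or 'neutral' and put each sentiment in a new field on the same row.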

In this example, I think the 5 records have the following 5 sentiments (in a field next to each record):

neutral
negative
positive
positive
positive

How can I do this?

I came up with an alternative code sample, shown below.

event_dictionary ={scores["compound"] >= 0.05 : 'positive', scores["compound"] <= -0.05 : 'negative', scores["compound"] >= -0.05 and scores["compound"] <= 0.05 : 'neutral'} 
#message_text = str(message_text)
for message in message_text:
    scores = sid.polarity_scores(str(message))
    for key in sorted(scores):
        df['sentiment'] = df['body'].map(event_dictionary) 

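This ran for about 15 minutes before I cancelled it, and I found that it had not actually done anything. I want to add a field named 'sentiment' and populate it with 'positive' if scores["compound"] >= 0.05, with 'negative' if scores["compound"] <= -0.05, and with 'neutral' if scores["compound"] is between -0.05 and 0.05.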

【Question comments】:

Tags: python python-3.x nltk sentiment-analysis


【Solution 1】:

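I'm not sure what this data frame looks like, but you can run the Sentiment Intensity Analyzer on each string to compute a polarity score for each message. According to the GitHub page, the "compound" key can be used to determine a message's sentiment.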

https://github.com/cjhutto/vaderSentiment#about-the-scoring

messages = [
"These brush heads are okay!  Wish they came in a larger diameter, would cover more facial surface area and require less time to do the job!  However, I think they do a better job than just a face cloth in cleansing the pores.  I would recommend this product!",
"No opening to pee with. weird.  And really tight.  not very comfortable.",
"I choose it as spare parts always available and I will buy it again for sure!I will recommend it, without doubt!",
"love this cleanser!!",
"Best facial wipes invented!!!!!!(:"]

for message in messages:
    scores = sid.polarity_scores(message)

    for key in sorted(scores):
        print('{0}: {1} '.format(key, scores[key]), end='')

    if scores["compound"] >= 0.05:
        print("\npositive\n")

    elif scores["compound"] <= -0.05:
        print("\nnegative\n")
    else:
        print("\nneutral\n")

Output:

compound: 0.8713 neg: 0.0 neu: 0.782 pos: 0.218
positive

compound: -0.7021 neg: 0.431 neu: 0.569 pos: 0.0
negative

compound: 0.6362 neg: 0.0 neu: 0.766 pos: 0.234
positive

compound: 0.6988 neg: 0.0 neu: 0.295 pos: 0.705
positive

compound: 0.7482 neg: 0.0 neu: 0.359 pos: 0.641
positive
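
To get the labels into the data frame instead of printing them, something along these lines might work. This is only a sketch: it assumes the text column is called 'body' (as in the question's snippet), and the helper names label_from_compound, add_sentiment_column and label_with_vader are made up for illustration. The scorer is passed in as a function, so the labeling logic can be tried without NLTK installed.

```python
import pandas as pd


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to 'positive', 'negative' or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def add_sentiment_column(df, text_col, score_fn, new_col="sentiment"):
    """Return a copy of df with one sentiment label per row.

    score_fn is any callable that takes a string and returns a dict with a
    'compound' key, e.g. SentimentIntensityAnalyzer().polarity_scores.
    """
    out = df.copy()
    # Score each row's text exactly once and keep only the label.
    out[new_col] = [
        label_from_compound(score_fn(str(text))["compound"])
        for text in out[text_col]
    ]
    return out


def label_with_vader(df, text_col="body"):
    """Hypothetical wrapper; needs nltk.download('vader_lexicon') beforehand."""
    # Imported lazily so the helpers above work without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    sid = SentimentIntensityAnalyzer()  # created once, reused for every row
    return add_sentiment_column(df, text_col, sid.polarity_scores)
```

With the asker's data this would be called as df = label_with_vader(df). Judging from the compound scores printed above, the five sample rows would be labeled positive, negative, positive, positive, positive, so the first row would not come out as 'neutral' the way the question expected.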

【Comments】:

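  • Actually, when I look closely, everything is positive. Something is off here. I tried this: if scores[key] >= 0.05: and it made everything positive. I tried this: if scores["compound"] >= 0.05: and that also made everything positive. I can see the print statements printing some 'negative' and some 'neutral'. I think the logic is very close, but something is definitely wrong here. What gets printed is not the text assigned to the 'sentiment' field in my data frame: if scores[key]
  • df['sentiment'] = 'negative' I'm not sure what you are trying to do, but this assigns the value 'negative' to the entire column.
  • But, again, everything gets labeled 'positive'. I think there is a simple fix, but I'm not sure what it is. This is my first time dabbling in NLP with Python. Thoughts? Comments? Suggestions? I'm open to anything. Thanks.
  • I just made a small change to my original post. Does that clear things up? Does it make sense now?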
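
The points raised in this thread can be reproduced in a few lines. The sketch below is only an illustration: the scores are the ones printed in the question, while the three-row data frame and the variable names lookup and first_items are made up. It shows why the dictionary from the question cannot label anything, why looping over message_text is slow, and what a scalar assignment to a column does.

```python
import pandas as pd

# Scores printed in the question for the sample message.
scores = {"compound": -0.3804, "neg": 0.093, "neu": 0.836, "pos": 0.071}

# 1) The comparisons are evaluated once, when the dict is built, so the keys
#    are just True/False and the duplicate False key is overwritten.
event_dictionary = {
    scores["compound"] >= 0.05: "positive",
    scores["compound"] <= -0.05: "negative",
    -0.05 <= scores["compound"] <= 0.05: "neutral",
}
print(event_dictionary)  # {False: 'neutral', True: 'negative'}

# Review text is never a key of that dict, so a lookup finds nothing.
lookup = event_dictionary.get("love this cleanser!!")
print(lookup)  # None

# 2) Looping over a single string visits characters, not messages.
message_text = "Like you, I am getting very frustrated with this process."
first_items = [ch for ch in message_text][:4]
print(first_items)  # ['L', 'i', 'k', 'e']

# 3) Assigning a scalar to a column fills every row with that value.
df = pd.DataFrame({"body": ["first review", "second review", "third review"]})
df["sentiment"] = "negative"
print(df["sentiment"].tolist())  # ['negative', 'negative', 'negative']
```

Because Series.map with a dict returns NaN for values that are not keys, mapping df['body'] through event_dictionary leaves the new column empty. If a scalar assignment such as df['sentiment'] = 'positive' sits inside a loop, whichever assignment runs last fills the whole column, which may be why every row ended up with the same label.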