How can we do a sentiment analysis and create a 'sentiment' record next to each line of text?

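【Question title】: How can we do a sentiment analysis and create a 'sentiment' record next to each line of text?
【Posted】: 2023-03-27 06:52:01
【Question description】:

I searched for a way to run a sentiment analysis and write the result into a column next to the text column being analyzed. This is what I came up with.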

import nltk
nltk.download('vader_lexicon')
nltk.download('punkt')

# first, we import the relevant modules from the NLTK library
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# next, we initialize VADER so we can use it within our Python script
sid = SentimentIntensityAnalyzer()

# the variable 'message_text' now contains the text we will analyze.
message_text = '''Like you, I am getting very frustrated with this process. I am genuinely trying to be as reasonable as possible. I am not trying to "hold up" the deal at the last minute. I'm afraid that I am being asked to take a fairly large leap of faith after this company (I don't mean the two of you -- I mean Enron) has screwed me and the people who work for me.'''

print(message_text)

# Calling the polarity_scores method on sid and passing in the message_text outputs a dictionary with negative, neutral, positive, and compound scores for the input text
scores = sid.polarity_scores(message_text)

# Here we loop through the keys contained in scores (pos, neu, neg, and compound scores) and print the key-value pairs on the screen
for key in sorted(scores):
    print('{0}: {1}, '.format(key, scores[key]), end='')

This gives me:

compound: -0.3804, neg: 0.093, neu: 0.836, pos: 0.071,

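Now I am trying to feed in my own text column from a dataframe.

The sample code comes from this site.

https://programminghistorian.org/en/lessons/sentiment-analysis

I have a field in a dataframe that consists of text, as shown below.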

These brush heads are okay!  Wish they came in a larger diameter, would cover more facial surface area and require less time to do the job!  However, I think they do a better job than just a face cloth in cleansing the pores.  I would recommend this product!
No opening to pee with. weird.  And really tight.  not very comfortable.
I choose it as spare parts always available and I will buy it again for sure!I will recommend it, without doubt!
love this cleanser!!
Best facial wipes invented!!!!!!(:

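These are 5 separate records in my dataframe. I am trying to find a way to evaluate each record as 'positive', 'negative', or 'neutral' and put each sentiment in a new field on the same row.

In this example, I think the 5 records have the following 5 sentiments (in a field next to each record):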

neutral
negative
positive
positive
positive

How can I do that?

I came up with an alternative code sample, shown below.

event_dictionary ={scores["compound"] >= 0.05 : 'positive', scores["compound"] <= -0.05 : 'negative', scores["compound"] >= -0.05 and scores["compound"] <= 0.05 : 'neutral'} 
#message_text = str(message_text)
for message in message_text:
    scores = sid.polarity_scores(str(message))
    for key in sorted(scores):
        df['sentiment'] = df['body'].map(event_dictionary)

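This ran for about 15 minutes before I cancelled it, and I found that it had actually done nothing. I want to add a field named 'sentiment' and fill it with 'positive' if scores["compound"] >= 0.05, with 'negative' if scores["compound"] <= -0.05, and with 'neutral' if scores["compound"] >= -0.05 and scores["compound"] <= 0.05.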
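A minimal sketch of why the attempt above cannot label rows, using only the scores already printed in the question (the short sample string is just for illustration). The comparisons used as dictionary keys are evaluated once, when the dictionary is built, so the dictionary ends up keyed by True/False instead of by a rule. Iterating over a single string also yields characters, not messages.

```python
# Scores copied from the output shown earlier in the question.
scores = {"compound": -0.3804, "neg": 0.093, "neu": 0.836, "pos": 0.071}

# Each key is a comparison that is evaluated immediately, so the
# dictionary is keyed by booleans, and duplicate keys overwrite each other.
event_dictionary = {
    scores["compound"] >= 0.05: "positive",
    scores["compound"] <= -0.05: "negative",
    -0.05 <= scores["compound"] <= 0.05: "neutral",
}
print(event_dictionary)  # {False: 'neutral', True: 'negative'}

# Looping over a string visits one character at a time.
message_text = "love this cleanser!!"
print(list(message_text)[:4])  # ['l', 'o', 'v', 'e']
```

Because `Series.map` with a dictionary looks up each cell value as a key, a text column mapped through this dictionary finds no matches and comes back as NaN. The nested loops also repeat that whole-column assignment once per character and per score key, which would explain the long run time.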

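【Question discussion】:

Does this answer your question? Is there a way to do a simple sentiment analysis of a single field in a dataframe?
My question is the same as the one I asked on the previous (identical) question: which part, specifically, are you struggling with? The accepted answer is just a loop and three lines of code, and the documentation describes what they do.

Tags: python python-3.x nltk sentiment-analysis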

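【Solution 1】:

I'm not sure what your dataframe looks like, but you can run the Sentiment Intensity Analyzer on each string to compute the polarity scores of each message. According to the GitHub page, the "compound" key can be used to determine the sentiment of a message.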

https://github.com/cjhutto/vaderSentiment#about-the-scoring

messages = [
"These brush heads are okay!  Wish they came in a larger diameter, would cover more facial surface area and require less time to do the job!  However, I think they do a better job than just a face cloth in cleansing the pores.  I would recommend this product!",
"No opening to pee with. weird.  And really tight.  not very comfortable.",
"I choose it as spare parts always available and I will buy it again for sure!I will recommend it, without doubt!",
"love this cleanser!!",
"Best facial wipes invented!!!!!!(:"]

# sid is the SentimentIntensityAnalyzer instance created in the question
for message in messages:
    scores = sid.polarity_scores(message)

    for key in sorted(scores):
        print('{0}: {1} '.format(key, scores[key]), end='')

    if scores["compound"] >= 0.05:
        print("\npositive\n")

    elif scores["compound"] <= -0.05:
        print("\nnegative\n")
    else:
        print("\nneutral\n")

Output:

compound: 0.8713 neg: 0.0 neu: 0.782 pos: 0.218
positive

compound: -0.7021 neg: 0.431 neu: 0.569 pos: 0.0
negative

compound: 0.6362 neg: 0.0 neu: 0.766 pos: 0.234
positive

compound: 0.6988 neg: 0.0 neu: 0.295 pos: 0.705
positive

compound: 0.7482 neg: 0.0 neu: 0.359 pos: 0.641
positive
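The same thresholds can be written into a dataframe column instead of printed. The sketch below is one possible way to do it, assuming a column named `body` as in the question. The helper names `label_from_compound` and `add_sentiment_column` are made up for illustration, and `stand_in_scores` is a hypothetical stand-in scorer that only looks up compound values from the output above, so the example runs without downloading the lexicon.

```python
import pandas as pd

def label_from_compound(compound):
    """Map a compound score to a label using the +/-0.05 cut-offs."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

def add_sentiment_column(df, text_col, score_fn, out_col="sentiment"):
    """Return a copy of df with one label per row in out_col.

    score_fn is any callable returning a dict with a "compound" key.
    """
    result = df.copy()
    result[out_col] = result[text_col].astype(str).apply(
        lambda text: label_from_compound(score_fn(text)["compound"])
    )
    return result

# Hypothetical stand-in scorer: compound values copied from the output
# above; any text it does not know gets 0.0.
known_compound = {
    "No opening to pee with. weird.  And really tight.  not very comfortable.": -0.7021,
    "love this cleanser!!": 0.6988,
}

def stand_in_scores(text):
    return {"compound": known_compound.get(text, 0.0)}

df = pd.DataFrame({"body": list(known_compound) + ["(no text)"]})
df = add_sentiment_column(df, "body", stand_in_scores)
print(df["sentiment"].tolist())  # ['negative', 'positive', 'neutral']
```

With the analyzer from the question, the call would presumably be `add_sentiment_column(df, "body", sid.polarity_scores)`, since `polarity_scores` returns a dictionary that includes the `compound` key.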

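【Discussion】:

Actually, when I look closely, everything is positive. Something is off here. I tried this: if scores[key] >= 0.05: and it makes everything positive. I tried this: if scores["compound"] >= 0.05: and that also makes everything positive. I can see the print statements printing some 'negative' and some 'neutral'. I think the logic is very close, but something is definitely wrong here. What gets printed is not the text that gets assigned to the 'sentiment' field in my dataframe: if scores[key]
df['sentiment'] = 'negative' I'm not sure what you are trying to do, but this assigns the value 'negative' to the entire column.
But, again, everything is being labeled 'positive'. I think there is an easy fix, but I'm not sure what it is. This is my first foray into NLP with Python. Thoughts? Comments? Suggestions? I'm open to anything. Thanks.
I just made a small change to my original post. Does that clear things up? Does it make sense now?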
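The remark in this thread about `df['sentiment'] = 'negative'` can be illustrated with a short sketch. It assumes a hypothetical `compound` column that already holds one score per row (the values are copied from the answer's output). Assigning a plain string to a column broadcasts it to every row, so inside a loop the last assignment wins for the whole column, which is one plausible reason everything ended up 'positive'.

```python
import pandas as pd

# Compound scores copied from the answer's output, one per record.
df = pd.DataFrame({"compound": [0.8713, -0.7021, 0.6362, 0.6988, 0.7482]})

# Scalar assignment fills every row; the later assignment overwrites
# the whole column again.
df["sentiment"] = "negative"
df["sentiment"] = "positive"
print(df["sentiment"].unique().tolist())  # ['positive']

# Row-wise alternative: boolean masks with .loc touch only matching rows.
df["sentiment"] = "neutral"
df.loc[df["compound"] >= 0.05, "sentiment"] = "positive"
df.loc[df["compound"] <= -0.05, "sentiment"] = "negative"
print(df["sentiment"].tolist())
# ['positive', 'negative', 'positive', 'positive', 'positive']
```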