【Question title】: Cannot convert emojis into text sentiment from social media comments
【Posted】: 2021-05-11 11:51:39
【Question description】:

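I collected comments on product advertisements from Facebook and Twitter and am trying to run sentiment analysis on them. Part of the text cleaning involves converting emojis into text sentiment, so that as much of the sentiment in the comments as possible is captured. I have tried emoji.demojize(text) on each row, as well as various other approaches from Stack Overflow, but none of them converts the emojis in the comments into the actual sentiment in words. The code below does not work, and I cannot tell what my mistake is. The code is as follows: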

import io
import json
import re

def handleEmojis(text, keep_emoticons=False):
    global emoji_sentiment_matching
    # Load the emoji-to-sentiment table once and cache it in a global
    if 'emoji_sentiment_matching' not in globals():
        with io.open('emoji.json', 'r', encoding="UTF-8") as infile:
            emoji_sentiment_matching = json.load(infile)
    HASHTAG_PATTERN = re.compile(r'#\w*')
    EMOJIS_PATTERN_PLAIN_TEXT = re.compile(r"(?:X|:|;|=)(?:-)?(?:\)|\(|O|D|P|S){1,}", re.IGNORECASE)
    EMOJIS_PATTERN_SYMBOLS = re.compile(u'[\U00002600-\U000027BF]|[\U0001f300-\U0001f64F]|[\U0001f680-\U0001f6FF]')

    if keep_emoticons:
        # Replace emoji with sentiment
        for emoji in emoji_sentiment_matching:
            if emoji["emoji"] in text:

                ## Adding space if text follows right away / is right before the emoticon
                idx = text.find(emoji["emoji"])
                end = idx + len(emoji["emoji"])  # first index after the emoji
                (space1, space2) = ("", "")
                if idx - 1 >= 0 and text[idx - 1] != " ":
                    space1 = " "
                if end < len(text) and text[end] != " ":
                    space2 = " "

                ## replace emoticon with its sentiment
                text = text.replace(emoji["emoji"], "{}emoji%%{}{}".format(space1, emoji["subgroup"], space2))

        ## TO IMPLEMENT: Sentiment of other emoticons like :), :-), :-/

    else:
        for r in re.findall(EMOJIS_PATTERN_SYMBOLS, text):
            text = text.replace(r, "")
        for r in re.findall(EMOJIS_PATTERN_PLAIN_TEXT, text):
            text = text.replace(r, "")
    return text.strip()


import io
import json

FB_df['demojified']=FB_df['Text'] 
for i in range(len(FB_df)):
  text = FB_df.loc[i,"demojified"]
  handleEmojis(text, keep_emoticons = False)

print(FB_df)

Here is the resulting output (see the "demojified" column): (image: dataframe outputs)

I also tried the code below:

import re
from emot.emo_unicode import UNICODE_EMO, EMOTICONS
from emoji import demojize

def convert_emojis(text):
    for emot in UNICODE_EMO:
        text = re.sub(r'('+emot+')', "_".join(UNICODE_EMO[emot].replace(",","").replace(":","").split()), text)
    return text

# Convert emoticons to words
def convert_emoticons(text):
    for emot in EMOTICONS:
        text = re.sub(u'('+emot+')', "_".join(EMOTICONS[emot].replace(",","").split()), text)
    return text

FB_df['demojified'] = FB_df['Text']

for row in FB_df['demojified']:
    for text in row:
        text = text
        convert_emojis(text)

FB_df.loc[:,'demojified']

Still no joy. I have been stuck on this for a week. Some guidance would be greatly appreciated.

I have also tried:

import re
import emoji
from emot.emo_unicode import UNICODE_EMO, EMOTICONS

def convert_emojis(text):
    for emot in UNICODE_EMO:
        text = re.sub(r'('+emot+')', "_".join(UNICODE_EMO[emot].replace(",","").replace(":","").split()), text)
    return text

# Convert emoticons to words
def convert_emoticons(text):
    for emot in EMOTICONS:
        text = re.sub(u'('+emot+')', "_".join(EMOTICONS[emot].replace(",","").split()), text)
    return text

FB_df['demojified'] = FB_df['Text']
for row in FB_df['demojified']:
    for text in row:
        text = str(text)
        text = emoji.demojize(text)

Still no joy :-(

【Comments】:

    Tags: python dataframe sentiment-analysis


    【Solution 1】:

    Found the problem. I forgot to write the output of each loop iteration back to the demojified column.

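    import emoji

    FB_df['Demojified'] = FB_df['Comments']
    for i in range(len(FB_df.Demojified)):
        text = FB_df.loc[i, "Demojified"]
        text = emoji.demojize(text)          # each emoji becomes its ":name:" form
        text = text.replace(":", " ")        # drop the colon delimiters
        text = ' '.join(text.split())        # collapse repeated whitespace
        FB_df.loc[i, "Demojified"] = text    # write the result back to the column

    FB_df = FB_df[['Title', 'Comments', 'Demojified']]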
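
The per-row cleanup in this answer can also be packaged as a function and applied to the whole column in one step. The following is only a minimal sketch under stated assumptions: `clean_comment` and `name_based_demojize` are hypothetical names that do not appear in the original answer, and the stand-in converter built on the standard library `unicodedata` module is there only so the sketch runs without third-party packages. Its output format is not guaranteed to match `emoji.demojize`.

```python
import unicodedata

def name_based_demojize(text):
    """Stand-in for emoji.demojize (an assumption of this sketch):
    wrap every 'Symbol, other' character in colons using its Unicode name."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if unicodedata.category(ch) == "So" and name:
            out.append(":" + name.lower().replace(" ", "_") + ":")
        else:
            out.append(ch)
    return "".join(out)

def clean_comment(text, demojize=name_based_demojize):
    """Same three steps as the loop body: demojize, drop colons, tidy spaces."""
    text = demojize(text)
    text = text.replace(":", " ")
    return " ".join(text.split())

# A plain list stands in for the DataFrame column here.
comments = ["Love it\U0001F600", "So sad \U0001F622 today"]
cleaned = [clean_comment(c) for c in comments]
print(cleaned)
```

With pandas and the emoji package installed, the equivalent column assignment would be along the lines of `FB_df['Demojified'] = FB_df['Comments'].apply(lambda t: clean_comment(t, emoji.demojize))`. Note that, as in the loop version, replacing every ":" also removes colons that belong to ordinary text.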
    

    【Discussion】:
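
A note on why the loops in the question left the column untouched: calling a function that returns a new string has no effect unless the result is stored somewhere, and iterating over a string (as in `for text in row`) yields single characters rather than whole comments. The sketch below illustrates both points. It uses a plain list as a stand-in for the DataFrame column, and `tag_smile` is a hypothetical stand-in for the real conversion function.

```python
def tag_smile(text):
    """Hypothetical stand-in for a converter such as emoji.demojize."""
    return text.replace("\U0001F600", "grinning_face")

rows = ["nice \U0001F600", "meh"]   # stand-in for a DataFrame column

# Pattern from the question: the return value is thrown away.
for i in range(len(rows)):
    text = rows[i]
    tag_smile(text)
after_discard = list(rows)

# Pattern from the answer: the result is written back.
for i in range(len(rows)):
    rows[i] = tag_smile(rows[i])
after_store = list(rows)

# Iterating over a string yields characters, not comments.
pieces = [piece for piece in "meh"]

print(after_discard)
print(after_store)
print(pieces)
```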
