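【Posted】:2021-05-11 11:51:39
【Question】:
I collected data about product advertisements from Facebook and Twitter comments, and I am trying to run sentiment analysis on those comments. Part of the text cleaning involves converting emojis into text sentiment so that all of the sentiment in the comments is captured. I have tried emoji.demojize(text) on each row, as well as various other approaches from Stack Overflow, but none of them convert the emojis in the comments into the actual sentiment in words. The code below does not work, and I am not sure what my mistake is. The code is as follows: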
    import io
    import json
    import re

    def handleEmojis(text, keep_emoticons = False):
        global emoji_sentiment_matching
        if not 'emoji_sentiment_matching' in globals():
            with io.open('emoji.json', 'r', encoding = "UTF-8") as outfile:
                emoji_sentiment_matching = json.load(outfile)
        HASHTAG_PATTERN = re.compile(r'#\w*')
        EMOJIS_PATTERN_PLAIN_TEXT = re.compile(r"(?:X|:|;|=)(?:-)?(?:\)|\(|O|D|P|S){1,}", re.IGNORECASE)
        EMOJIS_PATTERN_SYMBOLS = re.compile(u'[\U00002600-\U000027BF]|[\U0001f300-\U0001f64F]|[\U0001f680-\U0001f6FF]')
        if keep_emoticons:
            # Replace emoji with sentiment
            for emoji in emoji_sentiment_matching:
                if emoji["emoji"] in text:
                    ## Adding space if text follows right away / is right before the emoticon
                    idx = text.find(emoji["emoji"])
                    (space1,space2) = ("","")
                    if (idx-1) >= 0 and text[idx-1] != " ":
                        space1 = " "
                    if (idx+1) < len(text) and text[idx+1] != " ":
                        space2 = " "
                    ## replace emoticon with its sentiment
                    text = text.replace(emoji["emoji"], "{}emoji%%{}{}".format(space1, emoji["subgroup"], space2))
            ## TO IMPLEMENT: Sentiment of other emoticons like :), :-), :-/
        else:
            for r in re.findall(EMOJIS_PATTERN_SYMBOLS,text):
                text = text.replace(r, "")
            for r in re.findall(EMOJIS_PATTERN_PLAIN_TEXT,text):
                text = text.replace(r, "")
        return text.strip()
    import io
    import json

    FB_df['demojified']=FB_df['Text']
    for i in range(len(FB_df)):
        text = FB_df.loc[i,"demojified"]
        handleEmojis(text, keep_emoticons = False)
    print(FB_df)
This is the resulting output (see the "demojified" column): dataframe outputs
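One likely reason the "demojified" column comes back unchanged: Python strings are immutable, so handleEmojis returns a new string, and the loop above never stores it. The call also passes keep_emoticons = False, which takes the branch that strips emoji instead of replacing them. Here is a minimal sketch of the difference, using a plain list in place of FB_df and a stand-in function strip_symbols (both are assumptions for illustration, not the code from the question):

```python
import re

# Same symbol ranges as EMOJIS_PATTERN_SYMBOLS in the question.
EMOJI_SYMBOLS = re.compile(
    "[\u2600-\u27BF]|[\U0001F300-\U0001F64F]|[\U0001F680-\U0001F6FF]"
)

def strip_symbols(text):
    """Hypothetical stand-in for handleEmojis: returns a NEW string."""
    return EMOJI_SYMBOLS.sub("", text).strip()

comments = ["Love it 😍", "Not for me 😡"]

# Calling the function without storing the result changes nothing.
for text in comments:
    strip_symbols(text)
unchanged = list(comments)

# Storing the returned value is what actually updates the data.
cleaned = [strip_symbols(text) for text in comments]
```

With a DataFrame, the equivalent would probably be to assign the result back to the column, for example FB_df['demojified'] = FB_df['Text'].apply(handleEmojis, keep_emoticons=True), assuming every value in the 'Text' column is a string.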
I have also tried the code below:
    import re
    from emot.emo_unicode import UNICODE_EMO, EMOTICONS
    from emoji import demojize

    def convert_emojis(text):
        for emot in UNICODE_EMO:
            text = re.sub(r'('+emot+')', "_".join(UNICODE_EMO[emot].replace(",","").replace(":","").split()), text)
        return text

    # Convert emoticons to words
    def convert_emoticons(text):
        for emot in EMOTICONS:
            text = re.sub(u'('+emot+')', "_".join(EMOTICONS[emot].replace(",","").split()), text)
        return text

    FB_df['demojified']=FB_df['Text']
    for row in FB_df['demojified']:
        for text in row:
            text=text
            convert_emojis(text)
    FB_df.loc[:,'demojified']
Still no joy. I have been at this for a week. Some guidance would be greatly appreciated.
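Two things in the attempt above probably explain the result. Iterating with "for text in row" walks over the characters of a single comment, not over the comments, and the value returned by convert_emojis is never written back. The sketch below shows the conversion step on its own, using only the standard library: unicodedata.name is used here as a stand-in for the UNICODE_EMO lookup, and the function name emoji_to_words and the sample comments are made up for illustration.

```python
import re
import unicodedata

# Same symbol ranges as EMOJIS_PATTERN_SYMBOLS in the question.
EMOJI_SYMBOLS = re.compile(
    "[\u2600-\u27BF]|[\U0001F300-\U0001F64F]|[\U0001F680-\U0001F6FF]"
)

def emoji_to_words(text):
    """Replace each matched symbol with its Unicode name, e.g. grinning_face."""
    def describe(match):
        name = unicodedata.name(match.group(0), "")
        # Pad with spaces so the word does not fuse with neighbouring text.
        return " " + name.lower().replace(" ", "_") + " " if name else ""
    # Collapse the extra whitespace introduced around replacements.
    return re.sub(r"\s+", " ", EMOJI_SYMBOLS.sub(describe, text)).strip()

comments = ["Great product😀", "so funny 😂😂"]
converted = [emoji_to_words(c) for c in comments]
# converted == ["Great product grinning_face",
#               "so funny face_with_tears_of_joy face_with_tears_of_joy"]
```

This only covers single code point symbols inside the three ranges. Emoji built from several code points, such as flags or skin-tone variants, would need a dedicated library or a wider pattern.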
I have also tried:
    import re
    import emoji
    from emot.emo_unicode import UNICODE_EMO, EMOTICONS
    from emoji import demojize

    def convert_emojis(text):
        for emot in UNICODE_EMO:
            text = re.sub(r'('+emot+')',
                          "_".join(UNICODE_EMO[emot].replace(",","").replace(":","").split()), text)
        return text

    # Convert emoticons to words
    def convert_emoticons(text):
        for emot in EMOTICONS:
            text = re.sub(u'('+emot+')',
                          "_".join(EMOTICONS[emot].replace(",","").split()), text)
        return text

    FB_df['demojified']=FB_df['Text']
    for row in FB_df['demojified']:
        for text in row:
            text=str(text)
            text = emoji.demojize(text)
Still no joy :-(
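In this last attempt, the result of emoji.demojize is assigned only to the loop variable text, so it never reaches FB_df. Text emoticons such as :-( are also not Unicode emoji, so they would need their own mapping, as the "TO IMPLEMENT" comment in the first snippet suggests. Here is a rough sketch of that step, with a small hypothetical dictionary (EMOTICON_WORDS and its sentiment words are assumptions, not taken from the emot library):

```python
import re

# Hypothetical mapping for illustration; extend or replace with a real lexicon.
EMOTICON_WORDS = {
    ":-)": "happy",
    ":)": "happy",
    ":-(": "sad",
    ":(": "sad",
    ";)": "wink",
    ":D": "laughing",
}

# re.escape keeps characters such as ( and ) from being read as regex syntax.
# Longer emoticons come first so a shorter prefix never shadows them.
EMOTICON_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICON_WORDS, key=len, reverse=True))
)

def emoticons_to_words(text):
    """Replace each known emoticon with its word, separated by single spaces."""
    replaced = EMOTICON_PATTERN.sub(
        lambda m: " " + EMOTICON_WORDS[m.group(0)] + " ", text
    )
    # split/join normalises the padding added around each replacement.
    return " ".join(replaced.split())

result = emoticons_to_words("Still no joy:-(")
# result == "Still no joy sad"
```

As with the emoji conversion, the returned string has to be stored (for example by assigning the output of a column-wise apply back to 'demojified') for the DataFrame to change.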
【Discussion】:
Tags: python dataframe sentiment-analysis