Replace specific words by user dictionary and others by 0

【Question title】: Replace specific words by user dictionary and others by 0
【Posted】: 2019-05-10 01:48:15
【Question】:

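I have a review dataset that contains reviews like this:

Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.

(This is from the original dataset. In the dataset I am processing, I have removed all punctuation and lowercased everything.)

What I want to do is replace some words with 1 (according to my dictionary) and all other words with 0. My dictionary is: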

dict = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}

I want my output to look like this:

0010000000000001000000000100000

I used this code:

df['newreviews'] = df['reviews'].map(dict).fillna("0")

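This always returns 0 as the output, which is not what I want. I tried making the 1s and 0s strings, but I still get the same result. Any suggestions on how to fix this?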

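【Comments】:

You aren't splitting the string anywhere, so this mapping cannot work. You also shouldn't use dict as a variable name, because it shadows Python's built-in dict type.
@AChampion How do I split the string to make the map work?
Please post a testable snippet of your df['reviews'].
You probably want something like: df.reviews.str.split().apply(lambda review: ''.join(d.get(word, '0') for word in review)), assuming you have already lowercased the text and removed all punctuation (and renamed dict to d).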
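
To see why the original map call yields only 0: Series.map looks up each cell's entire string as a dictionary key, so a full review never matches. A minimal sketch, using a shortened review and a three-word dictionary made up for illustration (d, reviews, whole and per_word are names assumed here, not from the question):

```python
import pandas as pd

# Illustrative subset of the dictionary; renamed to d to avoid shadowing dict
d = {"best": "1", "amazing": "1", "i": "1"}
reviews = pd.Series(["simply the best i bought this last year"])

# map() uses the whole cell as the key, finds nothing, and fillna turns the NaN into "0"
whole = reviews.map(d).fillna("0")

# Splitting into words first lets each word be looked up on its own
per_word = reviews.str.split().apply(
    lambda words: "".join(d.get(w, "0") for w in words)
)

print(whole.tolist())
print(per_word.tolist())
```

The second form produces one digit per word, which is the shape of output the question asks for.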

Tags: python python-3.x pandas dictionary dataframe

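【Solution 1】:

First, don't use dict as a variable name, because it shadows the Python builtin. Then use a list comprehension with get to replace unmatched words with 0.

Note:

If the data contains text like date.Amazing, with no space after the punctuation, the punctuation has to be replaced with a space rather than removed.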
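
The difference between deleting punctuation and replacing it with a space can be seen on a single fragment. This is only a sketch with the standard re module; text, deleted and spaced are illustrative names:

```python
import re

text = "No problems faced till date.Amazing battery life."

# Deleting punctuation glues the two neighbouring words together
deleted = re.sub(r"[^\w\s]+", "", text).lower().split()

# Replacing punctuation with a space keeps them as separate tokens
spaced = re.sub(r"[^\w\s]+", " ", text).lower().split()

print(deleted)
print(spaced)
```

With deletion, the token becomes dateamazing and can no longer match the dictionary key amazing.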

import pandas as pd

df = pd.DataFrame({'reviews':['Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.']})

d = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}

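# replace punctuation with a space; regex=True is needed since pandas 2.0, where str.replace matches literally by default
df['reviews']  = df['reviews'].str.replace(r'[^\w\s]+', ' ', regex=True).str.lower()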

df['newreviews'] = [''.join(d.get(y, '0')  for y in x.split()) for x in df['reviews']]

Alternative:

df['newreviews'] =  df['reviews'].apply(lambda x: ''.join(d.get(y, '0')  for y in x.split()))

print (df)
                                             reviews  \
0  simply the best  i bought this last year  stil...   

                        newreviews  
0  0011000000000001000000000100000

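【Comments】:

Note: the OP claims to have already lowercased the text and removed punctuation, so you may be doing too much :). You also miss 'Amazing', because there is no whitespace around the punctuation: '... date.Amazing ...'
@AChampion - Thank you, the solution should be to replace the punctuation with a space.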

【Solution 2】:

You can do this:

df.replace(repl, regex=True, inplace=True)

Here df is your dataframe and repl is your dictionary.
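
As a hedged illustration of what this call does (the one-row frame and the two-key repl below are made up for the example): with regex=True each key is treated as a pattern and substituted wherever it matches inside a cell. Words that are not in the dictionary therefore stay as they are rather than becoming 0, and a short key such as i also matches inside longer words.

```python
import pandas as pd

# Hypothetical one-row frame and a two-key dictionary, for illustration only
df = pd.DataFrame({"reviews": ["simply the best"]})
repl = {"best": "1", "i": "1"}

# Each key is applied as a regular expression against the cell contents
out = df.replace(repl, regex=True)
result = out.loc[0, "reviews"]

# 'the' is left untouched, and the 'i' inside 'simply' is replaced as well
print(result)
```

So on its own this does not produce the string of 0s and 1s asked for in the question.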

【Comments】:

【Solution 3】:

You can do this:

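# clean the sentence
# (sent is one review string and mydict is the dictionary from the question; both must be defined first)
import re
sent = re.sub(r'\.','',sent)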

# convert to list
sent = sent.lower().split()

# get values from dict using comprehension
new_sent = ''.join([str(1) if x in mydict else str(0) for x in sent])
print(new_sent)

'001100000000000000000000100000'
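
Removing the periods outright joins date.Amazing into the single token dateamazing, which is why the string above has 30 digits and no 1 for amazing. A minimal sketch of a variant that extracts word tokens instead; encode_review and vocab are hypothetical names, and vocab holds the keys of the question's dictionary:

```python
import re

# Keys of the dictionary from the question (duplicates dropped)
vocab = {
    "amazing", "super", "good", "useful", "nice", "awesome", "quality",
    "resolution", "perfect", "revolutionary", "and", "purchase", "product",
    "impression", "watch", "weight", "stopped", "i", "easy", "read",
    "best", "better", "bad",
}

def encode_review(text, vocab):
    """Return one digit per word: '1' if the word is in vocab, else '0'."""
    # \w+ picks out runs of word characters, so punctuation acts as a separator
    words = re.findall(r"\w+", text.lower())
    return "".join("1" if w in vocab else "0" for w in words)

review = ("Simply the best. I bought this last year. Still using. "
          "No problems faced till date.Amazing battery life. "
          "Works fine in darkness or broad daylight. "
          "Best gift for any book lover.")

encoded = encode_review(review, vocab)
print(encoded)  # 0011000000000001000000000100000
```

This yields 31 digits, one per word, matching the output shown in Solution 1.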

【Comments】: