Failed to apply unicode-escape in pandas: answers

[Question title]: Failed to apply unicode-escape in pandas
[Posted]: 2021-07-25 12:55:02
[Question]:

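I am cleaning a tweet dataset by getting rid of annoying characters that show up as byte escapes (e.g. \xf0\x9f\x99\x82). This is the code without a function: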

b = data_tweet['Tweet']
b.head()

for i in b:
    x = i.encode('utf-8')
    y = x.decode('unicode-escape')
    print(y)

It worked. The characters became: ðŸ™„, ðŸ¥°, etc.

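But when I implement it as a function, so that I can write the result to a CSV file, it fails. The byte characters stay the same (e.g. \xf0\x9f\x99\x82). This is the code: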

def convert(text):
    for i in text:
        x = i.encode('utf-8')
        y = x.decode('unicode-escape')
        
    return text

convert(data_tweet['Tweet'])

Does anyone know why?

[Question comments]:

Tags: python-3.x pandas unicode-escapes

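[Solution 1]:

The problem is that you never actually assign the result back to data_tweet['Tweet']: your function computes y for each element but then returns the original text unchanged. Have the function convert a single string and return it, then use apply() on the Series.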

def convert(text):
    # Encode to bytes, then interpret literal escapes such as \xf0.
    x = text.encode('utf-8')
    y = x.decode('unicode-escape')

    return y

data_tweet['Tweet'] = data_tweet['Tweet'].apply(convert)

Or, as a one-liner:

data_tweet['Tweet'] = data_tweet['Tweet'].apply(lambda text: text.encode('utf-8').decode('unicode-escape'))
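
To make the difference between calling apply() and assigning its result concrete, here is a minimal sketch. The sample DataFrame and its two strings are made up for illustration; they assume that the tweets contain literal backslash escapes (the characters `\`, `x`, `f`, `0`, ...) rather than real bytes, which is what the question's output suggests.

```python
import pandas as pd

# Hypothetical sample data (not from the question): the first string holds
# literal backslash escapes, written here as a raw string.
data_tweet = pd.DataFrame({"Tweet": [r"hello \xf0\x9f\x99\x82", "plain text"]})


def convert(text):
    # Convert one string: encode to bytes, then interpret the escapes.
    return text.encode("utf-8").decode("unicode-escape")


before = data_tweet["Tweet"].copy()

# apply() returns a new Series; without an assignment the column is untouched.
data_tweet["Tweet"].apply(convert)
unchanged = data_tweet["Tweet"].equals(before)

# Writing the result back is what actually changes the DataFrame.
data_tweet["Tweet"] = data_tweet["Tweet"].apply(convert)

print(unchanged)
print(len(before.iloc[0]), len(data_tweet["Tweet"].iloc[0]))
```

The first print shows that the column is still identical after the unassigned call. The second shows the string getting shorter once the result is assigned, because each four-character escape collapses into a single character.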

[Comments]: