pandas df.apply() not working with html.unescape()

【Question title】: pandas df.apply() not working with html.unescape()
【Posted】: 2022-01-02 03:38:56
【Question description】:

I'm trying to decode HTML entities in a pandas DataFrame. I don't know why, but my apply function isn't working.

# requirements
import html
import pandas as pd

# This code works fine.
df = df.apply(lambda x: x + "TESTSTRING")
print(df) # "TESTSTRING" is appended to all values.

# This code also works fine. html.unescape() is working well.
fn = lambda x: html.unescape(x)
text = "Something wrong with <b>E&amp;S</b>"
print(fn(text)) # returns "Something wrong with <b>E&S</b>"

# However, the code below doesn't work. The "&amp;" within the values doesn't get decoded.
df2 = df.apply(fn)
print(df2) # The HTML entities aren't decoded!

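apply and html.unescape() each work fine on their own, and I don't know why they fail when combined. I also tried axis=1.

Thank you very much for your help. Thanks in advance.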

【Comments on the question】:

Tags: python html pandas

【Solution 1】:

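The problem is that html.unescape() appears not to be vectorized, i.e. it only accepts a single string. df.apply() passes each whole column (a Series) to the function rather than the individual values, so the strings inside the column are never decoded. If your df is not very large, applymap, which calls the function on every element, should still be fast enough: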

df2 = df.applymap(lambda x: html.unescape(x))
print(df2)
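
The difference between the two calls can be reproduced on a small frame. The sketch below is only an illustration: the sample data and the helper name `unescape_if_str` are hypothetical, since the question does not show its DataFrame. It uses a column-wise `Series.map` instead of `applymap`, on the assumption that `applymap` may emit a deprecation warning or be unavailable in newer pandas releases (where `DataFrame.map` is the replacement).

```python
import html
import pandas as pd

# Hypothetical sample data; the original question does not show its DataFrame.
df = pd.DataFrame({
    "title": ["E&amp;S", "A &lt; B", "no entities"],
    "note": ["plain text", "&quot;quoted&quot;", None],
})

# DataFrame.apply hands each column to the function as a Series.
# html.unescape starts with an `'&' not in s` check; on a Series the
# `in` operator looks at the index labels, not the values, so the
# Series comes back untouched and no error is raised.
unchanged = df.apply(html.unescape)


def unescape_if_str(value):
    """Decode HTML entities in strings; pass other values (e.g. None/NaN) through."""
    return html.unescape(value) if isinstance(value, str) else value


# Element-wise alternative: map the function over every value of every column.
decoded = df.apply(lambda col: col.map(unescape_if_str))

print(unchanged)
print(decoded)
```

The `isinstance` guard is a design choice rather than part of the original answer: html.unescape() raises a TypeError on non-string values such as NaN, so mixed columns need some form of check.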

【Discussion】:

Thank you very much. I'm not familiar with the concept of "vectorization", but I'll google it and figure it out.