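[Posted]: 2021-11-12 07:05:29
[Question]:
I have a list of substrings that I want to replace with ' '. What is the fastest way to do this? Would Cython work for this? Applied to 1 million rows it is really slow, so I am looking for the fastest execution speed.
Example: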
import pandas as pd

df = pd.DataFrame({"text": [
    "first text to replace",
    "second text to replace",
    "test this string",
    "this is not the first string",
    "short string test",
]})
removal_list = ["text to replace", "this string"]
Some attempts:
def replace_str(df, col, removal_list):
    # One full pass over the column per substring; regex=False matches literally
    for item in removal_list:
        df[col] = df[col].str.replace(item, ' ', regex=False)
    return df

replace_str(df, 'text', removal_list)
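import re

def replace_text(text):
    # Compile each substring as a pattern and apply them one after another
    miscdict_comp = {re.compile(a): ' ' for a in removal_list}
    for pattern, replacement in miscdict_comp.items():
        text = pattern.sub(replacement, text)
    return text

# apply has to be called on the column itself
df['text'] = df['text'].apply(replace_text)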
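
One direction that might be worth timing, and which is not part of the attempts above: both attempts scan the text once per entry in removal_list, so a sketch that joins the escaped substrings into a single alternation pattern and calls str.replace once could cut the number of passes. The names pattern and result are made up for illustration, and whether this is actually faster on 1 million rows is an assumption that would need to be measured.

```python
import re
import pandas as pd

df = pd.DataFrame({"text": [
    "first text to replace",
    "second text to replace",
    "test this string",
    "this is not the first string",
    "short string test",
]})
removal_list = ["text to replace", "this string"]

# Escape each substring so it is matched literally, and put longer
# substrings first so they win over shorter ones sharing a prefix.
pattern = "|".join(
    re.escape(s) for s in sorted(removal_list, key=len, reverse=True)
)

# A single pass over the column instead of one pass per substring.
result = df["text"].str.replace(pattern, " ", regex=True)
print(result.tolist())
```

Note that this is not strictly equivalent to replacing the substrings one after another: a sequential loop can match text that only appears after an earlier replacement, while a single alternation only matches against the original string.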
[Discussion]:
Tags: python pandas string replace substring