[Posted]: 2019-08-08 22:09:42
[Question]:
Find the target word and the word before it in col_a, and append the matching strings to the columns col_b_PY and col_c_LG.
I have tried the code below to achieve this, but I cannot get the expected output. Any help is appreciated.
Here is my approach using regular expressions:
df['col_b_PY'] = df.col_a.str.contains(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,1}PY")
df.col_a.str.extract(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,1}PY", expand=True)
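A minimal sketch of why the attempt above does not fill the columns with text, assuming pandas is installed and using two of the sample rows: `str.contains` only reports whether the pattern matches (a boolean Series), and `str.extract` rejects a pattern that has no capturing group, because the `(?:...)` group is non-capturing.

```python
import pandas as pd

df = pd.DataFrame({"col_a": [
    "Python PY is a general-purpose language LG",
    "Its easier LG to understand PY",
]})

pattern = r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,1}PY"

# str.contains only tests for a match, so the new column holds True/False, not text
df["col_b_PY"] = df["col_a"].str.contains(pattern)
print(df["col_b_PY"].tolist())

# str.extract needs at least one capturing group; (?:...) does not capture
try:
    df["col_a"].str.extract(pattern, expand=True)
    extract_error = None
except ValueError as exc:
    extract_error = str(exc)
print("extract failed with:", extract_error)
```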
The DataFrame looks like this:
col_a
Python PY is a general-purpose language LG
Programming language LG in Python PY
Its easier LG to understand PY
The syntax of the language LG is clean PY
Expected output:
col_a                                        col_b_PY        col_c_LG
Python PY is a general-purpose language LG   Python PY       language LG
Programming language LG in Python PY         Python PY       language LG
Its easier LG to understand PY               understand PY   easier LG
The syntax of the language LG is clean PY    clean PY        language LG
[Discussion]:
Perhaps:
df['col_b_PY'] = df['col_a'].str.extract(r"([a-zA-Z'-]+\s+PY)\b")
df['col_c_LG'] = df['col_a'].str.extract(r"([a-zA-Z'-]+\s+LG)\b")
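A runnable check of the suggestion above on the four sample rows, assuming pandas is installed. Passing `expand=False` is an addition that is not in the comment: with a single capturing group it makes `extract` return a Series instead of a one-column DataFrame, which assigns cleanly to one column.

```python
import pandas as pd

df = pd.DataFrame({"col_a": [
    "Python PY is a general-purpose language LG",
    "Programming language LG in Python PY",
    "Its easier LG to understand PY",
    "The syntax of the language LG is clean PY",
]})

# One capturing group: a run of letters/apostrophes/hyphens, whitespace, then the tag.
# The trailing \b keeps the tag from matching the start of a longer word.
df["col_b_PY"] = df["col_a"].str.extract(r"([a-zA-Z'-]+\s+PY)\b", expand=False)
df["col_c_LG"] = df["col_a"].str.extract(r"([a-zA-Z'-]+\s+LG)\b", expand=False)

print(df[["col_b_PY", "col_c_LG"]])
```

Because the regex engine returns the leftmost match, the captured word is always the whole word directly before the tag (for example `easier LG`, not `asier LG`).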
Thank you very much, @Wiktor Stribizew! I had spent a lot of time trying to figure this out.
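I have added an answer with an explanation. Note that extract needs a capturing group to actually extract a string, and it returns only the first captured match in each row.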
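A small illustration of both points, assuming pandas is installed and using a row in which PY occurs twice: only the text inside the parentheses is returned, and `str.extract` stops after the first match in the row.

```python
import pandas as pd

s = pd.Series(["Python PY is a general purpose PY language LG"])

# Only the captured text comes back: here just the word before the tag
word_only = s.str.extract(r"([a-zA-Z'-]+)\s+PY\b", expand=False)

# Widening the group to include the tag returns "word PY",
# but str.extract still returns only the first match in the row
first_pair = s.str.extract(r"([a-zA-Z'-]+\s+PY)\b", expand=False)

print(word_only.tolist())
print(first_pair.tolist())
```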
If col_a is "Python PY is a general purpose PY language LG", PY appears twice, and I need to capture both "Python PY" and "purpose PY", but our regex pattern captures only the first one. Expected output: "Python PY purpose PY".
OK, that is easy to fix with extractall; see my updated answer.
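A sketch of an `extractall`-based fix, assuming that multiple matches in one row should be joined with a single space (as in the desired output "Python PY purpose PY"). The updated answer itself is not reproduced here, so this may differ from it; `collect` is a hypothetical helper name.

```python
import re

import pandas as pd

df = pd.DataFrame({"col_a": [
    "Python PY is a general purpose PY language LG",
    "Its easier LG to understand PY",
]})


def collect(series: pd.Series, tag: str) -> pd.Series:
    """Hypothetical helper: join every 'word TAG' match in each row with a space."""
    pattern = r"([a-zA-Z'-]+\s+" + re.escape(tag) + r")\b"
    # extractall yields one row per match, indexed by (original row, match number);
    # the unnamed capturing group becomes column 0
    matches = series.str.extractall(pattern)[0]
    joined = matches.groupby(level=0).agg(" ".join)
    # Rows without any match are missing from `joined`; reindex leaves them as NaN
    return joined.reindex(series.index)


df["col_b_PY"] = collect(df["col_a"], "PY")
df["col_c_LG"] = collect(df["col_a"], "LG")
print(df[["col_b_PY", "col_c_LG"]])
```

`re.escape` is used so the tag is matched literally even if it ever contains regex metacharacters.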
Tags: regex python-3.x pandas