str.replace unbalanced parenthesis error in pandas dataframe

【Question title】: str.replace unbalanced parenthesis error in pandas dataframe
【Posted】: 2017-05-19 13:10:03
【Question description】:

I have 2 data frames. This is what the data looks like:

Find

Replace

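I search current_title in the Find data frame for each keyword and, when one is found, replace it with the corresponding keywordLength from the Replace data frame. My code is below.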

import pandas as pd
df_find = pd.read_csv(input_path_find)
df_replace = pd.read_csv(input_path_replace)

#replace
for i in range(df_replace.shape[0]):
    df_find.current_title=df_find.current_title.str.replace(df_replace.keyword.loc[i],df_replace.keywordLength.loc[i],case=False)

However, when I run the code, I get the following error:

error                                     Traceback (most recent call last)
<ipython-input-13-134bbf2a1cb4> in <module>()
      1 for i in range(df_replace.shape[0]):
----> 2     df_find.current_title=df_find.current_title.str.replace(df_replace.keyword.loc[i],df_replace.keywordLength.loc[i],case=False)

c:\python27\lib\site-packages\pandas\core\strings.pyc in replace(self, pat, repl, n, case, flags)
   1504     def replace(self, pat, repl, n=-1, case=True, flags=0):
   1505         result = str_replace(self._data, pat, repl, n=n, case=case,
-> 1506                              flags=flags)
   1507         return self._wrap_result(result)
   1508 

c:\python27\lib\site-packages\pandas\core\strings.pyc in str_replace(arr, pat, repl, n, case, flags)
    326         if not case:
    327             flags |= re.IGNORECASE
--> 328         regex = re.compile(pat, flags=flags)
    329         n = n if n >= 0 else 0
    330 

c:\python27\lib\re.pyc in compile(pattern, flags)
    192 def compile(pattern, flags=0):
    193     "Compile a regular expression pattern, returning a pattern object."
--> 194     return _compile(pattern, flags)
    195 
    196 def purge():

c:\python27\lib\re.pyc in _compile(*key)
    249         p = sre_compile.compile(pattern, flags)
    250     except error, v:
--> 251         raise error, v # invalid expression
    252     if not bypass_cache:
    253         if len(_cache) >= _MAXCACHE:

error: unbalanced parenthesis

Any help?

Edit: the error occurs when the value of str(df_replace.keywordLength.loc[i]) contains any of the special characters (*)+[\
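
The failure can be reproduced without pandas, because the traceback ends in re.compile(pat, flags=flags): the pattern passed to str.replace is compiled as a regular expression. Below is a minimal sketch using only the standard library. The keyword and title strings are hypothetical and do not come from the original data.

```python
import re

# Hypothetical keyword with an unmatched "(" (not from the original data).
keyword = "senior engineer (contract"

# Compiling the raw keyword fails because "(" opens a group that never closes.
try:
    re.compile(keyword, flags=re.IGNORECASE)
    compiled_ok = True
except re.error:
    compiled_ok = False

# re.escape backslash-escapes the metacharacters, so the pattern compiles
# and matches the keyword as literal text.
safe_pattern = re.compile(re.escape(keyword), flags=re.IGNORECASE)
result = safe_pattern.sub("25", "Senior Engineer (Contract, remote")

print(compiled_ok)
print(result)
```

The exact wording of the exception differs between Python versions, so the sketch only checks that compilation fails.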

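【Question comments】:

What does df_replace.keywordLength.loc[i] contain?
It's in the Replace table, the second column. It holds the length of the text in the previous column. Exact numbers.
I can't reproduce the error without the source data. But given that this is .str.replace and the values are numbers, have you tried str(df_replace.keywordLength.loc[i]) as your replacement value? Or .replace, without .str?
OK, the error occurs when the value of str(df_replace.keywordLength.loc[i]) contains any of the special characters (*)+[\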

Tags: python pandas str-replace

【Solution 1】:

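str.replace treats its first argument as a regular expression, as the re.compile(pat, flags=flags) call in the traceback shows, so a keyword containing characters such as ( or [ is not a valid pattern. Escape the pattern string with re.escape before passing it to str.replace. Later pandas releases added a regex parameter to str.replace; on those versions, also pass regex=True so that the escaped pattern is still interpreted as a regular expression.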

import pandas as pd
import re

df_find = pd.read_csv(input_path_find)
df_replace = pd.read_csv(input_path_replace)

# Replace each keyword with its length, matching the keyword as literal text
for i in range(df_replace.shape[0]):
    df_find.current_title = df_find.current_title.str.replace(
        re.escape(df_replace.keyword.loc[i]),  # escape regex metacharacters
        str(df_replace.keywordLength.loc[i]),  # the replacement must be a string
        case=False
    )
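
As an alternative to calling str.replace once per keyword, the escaped keywords can be joined into a single alternation and applied in one pass. This is a different technique from the loop above, not the answer's method. It is a sketch under assumptions: replace_keywords and the sample mapping are hypothetical names and data, and matching relies on str.lower(), which is adequate for ASCII keywords but may miss some Unicode case pairs.

```python
import re

def replace_keywords(title, replacements):
    """Replace every keyword in `title` literally and case-insensitively.

    `replacements` maps keyword -> replacement value; it is a hypothetical
    stand-in for the keyword / keywordLength columns.
    """
    if not replacements:
        return title
    # Lowercased lookup so that a match in any letter case finds its value.
    lookup = {str(k).lower(): str(v) for k, v in replacements.items()}
    # Longest keywords first, so "c++ developer" is tried before "c++".
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    # With a function as the replacement, the returned text is inserted
    # verbatim, so backslashes in it are not treated as escapes.
    return pattern.sub(
        lambda m: lookup.get(m.group(0).lower(), m.group(0)), title
    )

# Hypothetical sample data containing regex metacharacters.
sample = {"c++ developer": 13, "(r&d)": 5, "c++": 3}
print(replace_keywords("Lead C++ Developer (R&D)", sample))
```

With the data frames from the question, the mapping could be built with dict(zip(df_replace.keyword, df_replace.keywordLength)) and applied with df_find.current_title.map(lambda t: replace_keywords(t, mapping)), assuming current_title contains only strings. Unlike the loop, a single pass never rescans text that an earlier replacement inserted.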

【Comments】: