【Posted】: 2021-01-28 09:28:46
【Question】:
I am trying to build a function that uses a regular expression to replace the values in a DataFrame's date column.
# import regex
import re
# create a copy of data
data2 = data
loop = len(data2) - data['Date of Publication'].isna().sum()
for i in range(loop):
    if pd.notna(data2.loc[i]["Date of Publication"]):
        # copy the content of the date into old_value
        old_value = data2.loc[i]["Date of Publication"]
        # regex to match the first 4 digits of the old_value
        new_value = re.findall("\d{4}", str(old_value))
        # replace the old value
        data2.loc[i, 'Date of Publication'] = new_value[0]
It raises the following error:
IndexError Traceback (most recent call last)
<ipython-input-66-be514cf910bf> in <module>()
15
16 # replace the old value
---> 17 data2.loc[i, 'Date of Publication'] = new_value[0]
18
IndexError: list index out of range
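The traceback is consistent with re.findall returning an empty list for at least one value, since indexing an empty list with [0] raises exactly this IndexError. Below is a minimal sketch of that failure mode and a guard against it. The sample strings and the helper name first_year are made up for illustration; the original data is not shown in the question.

```python
import re

# Hypothetical sample values; the asker's real data is not shown.
samples = ["1879 [1878]", "n.d.", "[c. 1850?]"]

def first_year(text):
    """Return the first run of four digits in text, or None if there is none."""
    matches = re.findall(r"\d{4}", str(text))
    # findall returns [] when nothing matches, so matches[0] would raise IndexError
    return matches[0] if matches else None

results = {s: first_year(s) for s in samples}
for value, year in results.items():
    print(repr(value), "->", year)
```

Checking the list before indexing it lets the loop skip (or flag) values that contain no four-digit run instead of crashing on them.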
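【Discussion】:
- Since this isn't a program we can run, this is only a guess, but is there actually anything in new_value? What happens if re.findall("\d{4}", str(old_value)) finds nothing?
- You could try data2.loc[i:, 'Date of Publication'] = new_value[0]. I only added a colon.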
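One way to sidestep the per-row loop entirely is pandas' vectorized Series.str.extract, which yields NaN for rows without a match instead of raising. The sketch below is an alternative approach, not the asker's method, and the sample DataFrame is a made-up stand-in for the real data. Two side notes on the code in the question: data2 = data does not copy the DataFrame (both names refer to the same object), and changing the indexer to i: would assign to every row from label i onward while still evaluating new_value[0] first, so the IndexError would remain.

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame; only the column name is from the question.
data = pd.DataFrame({"Date of Publication": ["1879 [1878]", None, "n.d.", "1868"]})

# .copy() creates an independent DataFrame; plain assignment only adds a second name
data2 = data.copy()

# Convert to str so non-string values are handled, then pull out the first
# four-digit run per row; rows with no such run end up as NaN
data2["Date of Publication"] = (
    data2["Date of Publication"].astype(str).str.extract(r"(\d{4})", expand=False)
)
print(data2)
```

Because the extraction runs on the whole column at once, it does not depend on the loop bound or on the index being the default 0..n-1 integers, both of which the original loop assumes.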
Tags: python regex pandas dataframe data-cleaning