How to replace rows which do not follow a specific schema-pattern? [closed]

【Question title】: How to replace rows which do not follow a specific schema-pattern? [closed]
【Posted】: 2021-03-04 10:35:49
【Question】:

I want to remove all rows that do not follow this pattern:

01-12-2002 12:00:00

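My column has dtype object (type('O')) and I want to convert it to datetime, but unfortunately some rows contain text instead of numbers. My idea was to exclude all rows that do not follow the pattern (as a regex I would write \w+-\w+-\w+\s\w+-\w+-\w+).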

However, the pattern above does not seem to work when applied to the column.

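I would be grateful if you could tell me how to fix the pattern above so that rows which do not match it are excluded (or simply replaced with null values).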
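Testing the pattern from the question directly against the sample value with Python's re module shows why it fails: the time part of the sample uses : as a separator, while the pattern expects -. This is a minimal sketch; the names question_pattern and fixed_pattern are illustrative, and the corrected pattern simply mirrors the sample format shown above.

```python
import re

sample = "01-12-2002 12:00:00"

# Pattern from the question: expects "-" between the time fields as well
question_pattern = r"\w+-\w+-\w+\s\w+-\w+-\w+"
# Illustrative corrected pattern: ":" separates hours, minutes and seconds
fixed_pattern = r"\d+-\d+-\d+\s\d+:\d+:\d+"

# re.fullmatch only succeeds if the entire string matches the pattern
print(re.fullmatch(question_pattern, sample))           # None: "12:00:00" has ":" not "-"
print(re.fullmatch(fixed_pattern, sample) is not None)  # True
```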

【Question comments】:

Tags: python regex pandas

【Answer 1】:

Try .str.match:

import pandas as pd

# sample data
df = pd.DataFrame({'your_column':['01-12-2002 12:00:00', 'This 01-12-2002 12:00:00', 
                                  'Another row', '01-12-2002 12:00:01']})

# different pattern than yours, notice the two `:`
# raw string avoids invalid-escape warnings; na=False treats missing values as non-matching
df.loc[df['your_column'].str.match(r'^\w+-\w+-\w+\s\w+:\w+:\w+$', na=False)]

Output:

           your_column
0  01-12-2002 12:00:00
3  01-12-2002 12:00:01
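The question also asks about replacing non-matching rows with nulls instead of dropping them. A minimal sketch of that variant, assuming pandas 1.1 or later (for Series.str.fullmatch) and assuming the values are day-first (DD-MM-YYYY), which the sample 01-12-2002 does not settle on its own. The stricter \d{2}/\d{4} pattern is an illustrative choice, not part of the answer above.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({'your_column': ['01-12-2002 12:00:00', 'This 01-12-2002 12:00:00',
                                   'Another row', '01-12-2002 12:00:01']})

# True where the whole value has the "dd-dd-dddd dd:dd:dd" shape;
# na=False makes missing values count as non-matching
mask = df['your_column'].str.fullmatch(r'\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}', na=False)

# Keep matching values and replace the rest with NaN (rows are not dropped)
df['your_column'] = df['your_column'].where(mask)

# Convert to datetime; NaN becomes NaT. The day-first format is an assumption
df['your_column'] = pd.to_datetime(df['your_column'], format='%d-%m-%Y %H:%M:%S')
print(df)
```

If the dates are month-first, swap the format to '%m-%d-%Y %H:%M:%S'.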

【Comments】:

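【Answer 2】:

Your mistake is that after the space you used - where you should have used :. Also, you should use \d instead of \w, because \w also matches letters (and underscores).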
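The difference between \w and \d can be checked in a couple of lines; this minimal sketch uses a made-up string of letters to show that a \w-based pattern accepts text that is clearly not a date.

```python
import re

# \w matches letters, digits and underscore, so pure text can slip through
print(bool(re.fullmatch(r"\w+-\w+-\w+", "ab-cd-ef")))  # True
# \d matches digits only, so the same text is rejected
print(bool(re.fullmatch(r"\d+-\d+-\d+", "ab-cd-ef")))  # False
```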

import re

teststr = """
01-12-2002 12:00:00
02-27-2012 11:12:34
this is text
08-03-2004 01:13:37
""".strip()

# re.M is multiline flag that lets ^ match start of line and $ match end of line
pattern = re.compile(r"^\d+-\d+-\d+\s\d+:\d+:\d+$", re.M)

# find all the lines that match and join on newline
filtered = "\n".join(pattern.findall(teststr))
print(filtered)
"""
prints:
01-12-2002 12:00:00
02-27-2012 11:12:34
08-03-2004 01:13:37
"""
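A regex only checks the shape of a value, not whether it is a real date: 02-27-2012 passes the pattern above even though 27 is not a valid month in a day-first format. Since the goal in the question is a datetime column, an alternative sketch is to let pandas.to_datetime validate each value, with errors="coerce" turning anything unparseable into NaT. The day-first format string here is an assumption; use '%m-%d-%Y %H:%M:%S' if the data are month-first.

```python
import pandas as pd

# Same lines as teststr above, as a pandas column
s = pd.Series(["01-12-2002 12:00:00", "02-27-2012 11:12:34",
               "this is text", "08-03-2004 01:13:37"])

# Assumed day-first format; values that do not parse become NaT
parsed = pd.to_datetime(s, format="%d-%m-%Y %H:%M:%S", errors="coerce")
print(parsed)

# Keep only the rows that parsed successfully
print(s[parsed.notna()])
```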

【Comments】: