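【Posted】: 2019-10-10 15:13:27
【Question】:
I want to use a regular expression in Python 3 to extract the date, the time, and certain lines from a block of text. Here is an example: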
text = '''
190219 7:05:30 line1 fail
line1 this is the 1st fail
line2 fail
line2 this is the 2nd fail
line3 success
line3 this is the 1st success process
line3 this process need 3sec
200219 9:10:10 line1 fail
line1 this is the 1st fail
line2 success
line2 this is the 1st success process
line2 this process need 4sec
line3 success
line3 this is the 2st success process
line3 this process need 2sec
'''
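In the example above, I want to get all the lines that follow each "success" line, grouped with the date and time of the block they belong to. The desired output is: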
[('190219','7:05:30','line3 this is the 1st success process', 'line3 this process need 3sec'),
('200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process','line3 this process need 2sec')]
This is what I tried:
>>> import re
>>> newLine = re.sub(r'\t|\n|\r|\s{2,}',' ', text)
>>> newLine
' 190219 7:05:30 line1 fail line1 this is the 1st fail line2 fail line2 this is the 2nd fail line3 success line3 this is the 1st success process line3 this process need 3sec 200219 9:10:10 line1 fail line1 this is the 1st fail line2 success line2 this is the 1st success process line2 this process need 4sec line3 success line3 this is the 2st success process line3 this process need 2sec '
I don't know what the right way to get this result is. I tried this pattern to match the date and time:
(\b\d{6}\b \d{1,}:\d{2}:\d{2})...
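How can I solve this?
【Comments】:
-
Does everything have to be done strictly with regular expressions? Could part of the solution be non-regex?
-
No, it doesn't. I only mentioned what I had tried. I don't know of any other way @kosayoda
Tags: python regex python-3.x string findall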
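Since the comments confirm that a partly non-regex solution is acceptable, one possible approach is to keep the line breaks (rather than flattening the text) and walk through the lines with a small state flag. The sketch below is only an illustration under two assumptions taken from the sample: every block starts with a 6-digit date followed by a time, and a status line has exactly the form "<name> success" or "<name> fail". The names HEADER, STATUS, and collect_success_lines are hypothetical and not from the original post.

```python
import re

text = '''
190219 7:05:30 line1 fail
line1 this is the 1st fail
line2 fail
line2 this is the 2nd fail
line3 success
line3 this is the 1st success process
line3 this process need 3sec
200219 9:10:10 line1 fail
line1 this is the 1st fail
line2 success
line2 this is the 1st success process
line2 this process need 4sec
line3 success
line3 this is the 2st success process
line3 this process need 2sec
'''

# Assumed header shape: 6-digit date, a time, then the rest of the line.
HEADER = re.compile(r'(\d{6})\s+(\d{1,2}:\d{2}:\d{2})\s+(.*)')
# Assumed status shape: a single word followed by "success" or "fail".
STATUS = re.compile(r'\w+\s+(success|fail)')


def collect_success_lines(text):
    """Return one tuple per date block: (date, time, detail lines after each success)."""
    records = []
    current = None      # list being filled for the current date block
    in_success = False  # True while we are below a "<name> success" line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.fullmatch(line)
        if header:
            # A new block starts: remember date and time, then treat the
            # remainder (e.g. "line1 fail") as an ordinary line.
            current = [header.group(1), header.group(2)]
            records.append(current)
            in_success = False
            line = header.group(3)
        if current is None:
            continue  # ignore anything before the first header
        status = STATUS.fullmatch(line)
        if status:
            in_success = status.group(1) == 'success'
        elif in_success:
            current.append(line)
    return [tuple(record) for record in records]


result = collect_success_lines(text)
for record in result:
    print(record)
```

Keeping the original line breaks is what makes this work: after the re.sub call in the question, the boundary between a status line and its detail lines is gone, so there is no reliable way to tell where each "success" group ends. If the real log uses other status words or a different date format, the two patterns would need to be adjusted.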