Get text surrounding a regex match with Python: answers

【Question title】: Get text surrounding regex match with python
【Posted】: 2015-12-18 10:52:33
【Question】:

I have a todo.txt list like this, with one task per line:

(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker

I have regular expressions that capture the +Projects and @Contexts:

import re

## Projects
project_matches = re.findall(r'[+]\D\w+', todo_list)
print(list(set(project_matches)))

## Contexts
context_matches = re.findall(r'[@][A-Z]\w+', todo_list)
print(list(set(context_matches)))

But I would also like a quick, efficient way to capture each task and group the tasks by +Project or @Context.

For example, this is the desired output:

Phone:

(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer

Computer:

(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer

Tasker:

(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker

And so on...

I also have a regex that captures the task when a project or context is found, but I don't know whether it helps: (.*)(?=[+]\D\w+)(.*)

【Question comments】:

Tags: python regex

【Solution 1】:

You can build a couple of dictionaries. A defaultdict makes it easy to start every entry off as an empty list.

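import collections
import re

# Map each +Project / @Context tag to the list of task lines that contain it.
projects = collections.defaultdict(list)
contexts = collections.defaultdict(list)
with open('todo_list.txt') as todo_list:
    for line in todo_list:
        line = line.rstrip('\n')  # file iteration keeps the trailing newline
        for item in re.findall(r'[+]\D\w+', line):
            projects[item].append(line)
        for item in re.findall(r'[@][A-Z]\w+', line):
            contexts[item].append(line)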

If you have already read the whole file into a single string, use splitlines() to iterate over the lines:

import collections
import re

projects = collections.defaultdict(list)
contexts = collections.defaultdict(list)
# todo_list is the whole file as one string; splitlines() drops the newlines.
for line in todo_list.splitlines():
    for item in re.findall(r'[+]\D\w+', line):
        projects[item].append(line)
    for item in re.findall(r'[@][A-Z]\w+', line):
        contexts[item].append(line)
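To get from these dictionaries to the grouped listing the question asks for, something like the following might work. It is only a sketch: the helper name group_tasks is made up for illustration, projects and contexts are merged into one dictionary, and the leading + or @ is stripped so the headings read like the ones in the question.

```python
import collections
import re

# Sample data copied from the question.
todo_list = """\
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker
"""

def group_tasks(text):
    """Hypothetical helper: map each project/context name to its task lines."""
    groups = collections.defaultdict(list)
    for line in text.splitlines():
        tags = re.findall(r'[+]\D\w+|[@][A-Z]\w+', line)
        # A set of bare names, so a name repeated on one line adds the line once.
        for name in {tag[1:] for tag in tags}:
            groups[name].append(line)
    return groups

groups = group_tasks(todo_list)
for name in sorted(groups):
    print(name + ':')
    print()
    for task in sorted(groups[name]):
        print(task)
    print()
```

Because the two kinds of tag share one dictionary here, a project and a context with the same name would end up under the same heading; keep two dictionaries, as in the answer, if that distinction matters.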

【Discussion】:

【Solution 2】:

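You can use ^.*word.*$ to grab every whole line in which a given word appears.

Meaning: from the start of the line (^), match any character (.) any number of times (*), then match the word, then again match any character any number of times (.*) up to the end of the line ($).

To accomplish your task, you could do something like: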

tasks = re.findall(r"(^.*?%s.*?$)" % re.escape(context), todo_list, re.MULTILINE)

context is the word you are looking for (Phone, Computer, Tasker, etc.).

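Edit: re.MULTILINE makes ^ and $ match at the start and end of every line rather than only at the start and end of the whole string. It corresponds to the m modifier on regex101; the g (global) behaviour comes from findall itself, which already returns every match. You can see my example here: https://regex101.com/r/gS2yN9/1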
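As a rough, self-contained illustration of the whole-line pattern, the sketch below runs it against the sample list from the question. The function name tasks_for is invented for this example, and re.escape is added on the assumption that the search word may contain regex metacharacters such as +.

```python
import re

# Sample data copied from the question.
todo_list = """\
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker
"""

def tasks_for(word, text):
    """Hypothetical helper: return every line of text that contains word."""
    # re.escape keeps characters like '+' from being read as regex syntax.
    pattern = r"^.*%s.*$" % re.escape(word)
    # With re.MULTILINE, ^ and $ anchor at each line; '.' never crosses a newline.
    return re.findall(pattern, text, re.MULTILINE)

for task in tasks_for("@Phone", todo_list):
    print(task)
```

Note that this is a plain substring test per line, so searching for Phone alone would also match a hypothetical @Telephone tag; including the @ or + prefix in the word narrows it somewhat.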

【Discussion】: