【Posted】:2014-08-14 23:17:06
【Question】:
I have
import re

with open("file.txt", "r") as f:  # long text file
    TEXT = f.read()
# long identification code with dots (.) and hyphens (-)
regex = r"process \d\d\d\d\d\d\d-\d\d\.\d\d\d\d\.\d+\.\d\d\.\d\d\d\d"
SRC = re.findall(regex, TEXT, flags=re.IGNORECASE|re.MULTILINE)
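How do I get the text between the first character of one occurrence, SRC[i], and the first character of the next occurrence, SRC[i+1], and so on? I couldn't find a straightforward, satisfactory answer...
Edit with more information: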
pattern = r'process \d{7}-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}'
sample_input = "Process 1234567-89.1234.12431242.12.1234 - text title and long text description with no assured pattern Process 2234567-89.1234.12431242.12.1234 : chars and more text Process 3234567-89.1234.12431242.12.1234 - more text process 3234567-89.1234.12431242.12.1234 (...)"
sample_output[0] = "Process 1234567-89.1234.12431242.12.1234 - text title and long text description with no assured pattern "
sample_output[1] = "Process 2234567-89.1234.12431242.12.1234 : chars and more text "
sample_output[2] = "Process 3234567-89.1234.12431242.12.1234 - more text "
sample_output[3] = "process 3234567-89.1234.12431242.12.1234 "
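【Comments】:
-
Please provide some sample input and the expected output.
-
You can shorten your regex to:
\d{7}\-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}
-
What exactly are you asking? Show some of your input; I think a split might be useful.
-
Added sample input and output.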
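One possible way to get those chunks, in the spirit of the split suggested in the comments, is to record where each match starts and slice the text from one start position to the next. This is only a minimal sketch, not a definitive answer: the names `PATTERN` and `split_at_matches` are made up for illustration, and it assumes the whole text fits in memory as a single string.

```python
import re

# Hypothetical name; the pattern itself is the one from the question.
PATTERN = re.compile(r"process \d{7}-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}",
                     re.IGNORECASE)


def split_at_matches(text, pattern=PATTERN):
    """Return the chunks of text that begin at each match of pattern.

    Each chunk runs from the first character of one match up to (but not
    including) the first character of the next match. The last chunk runs
    to the end of the text. Text before the first match is dropped.
    """
    starts = [m.start() for m in pattern.finditer(text)]
    # Every chunk ends where the next one starts; the last ends at len(text).
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


if __name__ == "__main__":
    sample = ("Process 1234567-89.1234.12431242.12.1234 - text title "
              "Process 2234567-89.1234.12431242.12.1234 : chars and more text")
    for chunk in split_at_matches(sample):
        print(repr(chunk))
```

Slicing by match start positions avoids relying on zero-width splits, which `re.split` only supports from Python 3.7 onward, so the same code should also run on Python 2.7. Note that the final chunk includes everything after the last match, so any trailing text ends up in the last element.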
Tags: python regex python-2.7