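[Posted]: 2020-08-21 17:57:27
[Question]:
I have multiple text files and want to extract strings whenever specific patterns match, then append them to a dataframe together with the file name. In my case, the same pattern occurs more than once within each text file.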
sample.txt:
"government high school
Govt high school physics department
Employee Designation School Assistant"
What I am getting:
file | Org | Org2
sample.txt | government high school | Govt high school physics department
sample.txt | government high school | Employee Designation School Assistant
What I am looking for:
file | Org | Org2
sample.txt | government high school | Govt high school physics department
Here is the code I am using:
import os
import re

prs_path = "C://Users//subhr//scope_txt//"
df3 = []
for file in os.listdir(prs_path):
    Name = None
    with open(prs_path + file) as fd:
        # Every line is tested on its own, so each matching line appends a row.
        for line in fd:
            line = line.lower()
            match = re.search(r'(^.*government.*$)', line, re.I)
            Org = ""
            if match:
                Org = match.group()
                df3.append([file, Org])
            Org2 = ""
            Org3 = ""
            Org = ""
            if match is None:
                # Only lines without "government" are checked for school/college.
                match2 = re.search(r'(^.*school.*$)|(^.*college.*$)', line, re.I)
                if match2:
                    Org2 = match2.group()
                    df3.append([file, Org, Org2])
                if match2 is None:
                    match3 = re.search(r'(^.*power.*$)', line, re.I)
                    if match3:
                        Org3 = match3.group()
                        df3.append([file, Org, Org2, Org3])
                    if match3 is None:
                        continue
Where am I going wrong?
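For reference, here is a minimal sketch of one way the desired one-row-per-file result might be produced. It assumes that only the first matching line per pattern should be kept, and that a line goes to the column of the first pattern it matches (the same priority as the if-chain in the question). The names `PATTERNS` and `extract_first_matches` are hypothetical, and the sample lines are hard-coded instead of being read from disk.

```python
import re

# Hypothetical pattern table: column name -> compiled regex.
# The keywords are the ones used in the question.
PATTERNS = {
    "Org": re.compile(r"government", re.I),
    "Org2": re.compile(r"school|college", re.I),
    "Org3": re.compile(r"power", re.I),
}


def extract_first_matches(lines):
    """Return one dict per file: the first line matching each pattern, else ""."""
    found = {column: "" for column in PATTERNS}
    for line in lines:
        text = line.strip()
        for column, pattern in PATTERNS.items():
            if pattern.search(text):
                # The first pattern that matches decides the column;
                # later hits for an already filled column are ignored.
                if not found[column]:
                    found[column] = text
                break
    return found


# Lines from sample.txt in the question.
sample = [
    "government high school\n",
    "Govt high school physics department\n",
    "Employee Designation School Assistant\n",
]
row = {"file": "sample.txt", **extract_first_matches(sample)}
print(row)
```

Building one such `row` per file and passing the list of rows to `pandas.DataFrame` should then give a single line per file, since the rows are collected after the whole file has been scanned and not once per matching line.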
[Comments]:
Tags: python-3.x regex regex-group python-re