【Question title】:Regex to return single line when pattern is matched for a specific text file
【Posted】:2020-08-21 17:57:27
【Question description】:

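I have multiple text files. When a specific pattern matches, I want to extract the matching line and append it, together with the file name, to a data frame. In my case the same pattern occurs more than once within a single text file.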

sample.txt:
"government high school
Govt high school physics department
Employee Designation School Assistant"

What I am getting:
    file         |             Org                      |              Org2 
sample.txt           government high school                   Govt high school physics department
sample.txt           government high school                   Employee Designation School Assistant

What I am looking for:
    file         |             Org                      |              Org2 
sample.txt           government high school                   Govt high school physics department

Here is the code I am using:

import os
import re

prs_path = "C://Users//subhr//scope_txt//"

df3 = []
for file in os.listdir(prs_path):
    Name = None
    with open(prs_path + file) as fd:
        for line in fd:
            line = line.lower()
            # The r prefix belongs outside the quotes (raw string), not inside the pattern
            match = re.search(r'(^.*government.*$)', line, re.I)
            Org = ""
            if match:
                Org = match.group()
                df3.append([file, Org])
            Org2 = ""
            Org3 = ""
            Org = ""
            if match is None:
                # A row is appended for every matching line, not just the first one
                match2 = re.search(r'(^.*school.*$)|(^.*college.*$)', line, re.I)
                if match2:
                    Org2 = match2.group()
                    df3.append([file, Org, Org2])
                if match2 is None:
                    match3 = re.search(r'(^.*power.*$)', line, re.I)
                    if match3:
                        Org3 = match3.group()
                        df3.append([file, Org, Org2, Org3])
                    if match3 is None:
                        continue

Where am I going wrong?

【Question comments】:

    Tags: python-3.x regex regex-group python-re


    【Solution 1】:

    Try this pattern: r"^(.*?):$\n\"(.*?) (.*?)$\n(.*?) (.*? .*?) (.*?)$"

    Your input will be split into 6 groups. You can test it at the link below.

    https://regex101.com/r/UN9cjZ/1
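
As a rough illustration of how that pattern behaves in Python, the sketch below applies it to the sample from the question. It assumes the text starts with the `sample.txt:` header line and that `re.MULTILINE` is set so `^` and `$` match at each line boundary; the answer itself does not state either of these.

```python
import re

# Assumption: the input includes the "sample.txt:" header line from the question.
text = (
    'sample.txt:\n'
    '"government high school\n'
    'Govt high school physics department\n'
    'Employee Designation School Assistant"\n'
)

pattern = re.compile(
    r"^(.*?):$\n\"(.*?) (.*?)$\n(.*?) (.*? .*?) (.*?)$",
    re.MULTILINE,  # assumption: lets ^ and $ match at every line boundary
)

match = pattern.search(text)
groups = match.groups() if match else ()
print(groups)
```

Because the pattern fixes the number of space-separated fields on each line, it only fits input shaped like this sample.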

    【Comments】:

    • Thanks for the reply, but this was only an example. My actual text files contain different lines of varying length :(
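
For files whose lines vary in length, one possible approach is to scan line by line and keep only the first line that matches each pattern, so that each file yields a single row. The sketch below is a minimal illustration and not the accepted fix: the helper name `first_matches` and the `PATTERNS` mapping are hypothetical, and the patterns are taken from the keywords in the question.

```python
import re

# Hypothetical mapping of output columns to the keywords used in the question.
PATTERNS = {
    "Org": re.compile(r"government", re.I),
    "Org2": re.compile(r"school|college", re.I),
    "Org3": re.compile(r"power", re.I),
}


def first_matches(filename, lines):
    """Build one row per file, keeping the first line that matches each pattern."""
    row = {"file": filename, "Org": "", "Org2": "", "Org3": ""}
    for line in lines:
        text = line.strip().strip('"')  # drop the newline and surrounding quotes
        for column, pattern in PATTERNS.items():
            if pattern.search(text):
                if not row[column]:
                    row[column] = text  # later matches for this column are ignored
                break  # each line is assigned to at most one column
    return row


sample = [
    '"government high school\n',
    'Govt high school physics department\n',
    'Employee Designation School Assistant"\n',
]
row = first_matches("sample.txt", sample)
print(row)
```

Patterns are checked in priority order, mirroring the nested `if` structure in the question: a line containing "government" is never considered for `Org2`. A list of such rows, one per file, could then be passed to `pandas.DataFrame`.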