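【Title】: Python regex to get n characters before and after a keyword in a line of text
【Posted】: 2016-01-25 01:12:37
【Question】:

I am trying to parse a file and search each line for keywords taken from a list of strings. For every occurrence I need to return the 'n' characters before and after it. I have it working without regex, but it is not very efficient. Any idea how to do the same thing with regex and findall? lookup is a list of strings. This is what I have without regex: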

with open(file, 'r') as temp:
    for num, line in enumerate(temp, 1):
        for string in lookup:
            if string in line:

                # Split the line in 2 substrings
                tmp1 = line.split(string)[0]
                tmp2 = line.split(string)[1]

                # Truncate only 'n' characters before and after the keyword
                tmp = tmp1[-n:] + string + tmp2[:n]

                # Do something here...

This is my start with regex:

with open(file, 'r') as temp:
    for num, line in enumerate(temp, 1):
        for string in lookup:
            # Case-insensitive regex search
            searchObj = re.findall(string, line, re.M | re.I)

            if searchObj:
                print("search --> : ", searchObj)

                # Loop through searchObj and get n characters

【Comments】:

    Tags: python regex string file findall


    【Answer 1】:

    I got it working. Here is the code in case anyone needs it:

    import re

    with open(file, 'r') as temp:
        for num, line in enumerate(temp, 1):
            for string in lookup:

                # Case-insensitive search; finditer yields one match object per occurrence
                for match in re.finditer(string, line, re.M | re.I):

                    # Start and end indices of the keyword
                    start, end = match.span()

                    # Keep only 'n' characters before and after the keyword;
                    # max() stops a negative start index from wrapping around
                    tmp = line[max(0, start - n):end + n] + '\n'
                    print(tmp)
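Since the question asked about findall specifically, the context can also be captured by the pattern itself. The following is a minimal sketch in Python 3, not the answerer's code: the helper name `context_findall` and the sample sentence are made up for illustration, and it assumes the entries in lookup are literal keywords, which is why they go through `re.escape`.

```python
import re

def context_findall(keyword, line, n):
    """Return each occurrence of keyword with up to n characters on either side."""
    # .{0,n} matches between 0 and n characters (never a newline),
    # so occurrences near the start or end of the line still match.
    pattern = r".{0,%d}%s.{0,%d}" % (n, re.escape(keyword), n)
    return re.findall(pattern, line, re.I)

print(context_findall("fox", "The quick brown fox jumps over the lazy dog", 5))
# ['rown fox jump']
```

One caveat: findall returns non-overlapping matches, so a second occurrence that lies within n characters of the first is swallowed into the first match's context instead of being reported separately. If every occurrence must be reported, slicing with the match indices is the safer route.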
    

    【Comments】:

      【Answer 2】:

      From https://docs.python.org/2/library/re.html:

      start([group])
      end([group])
         Return the indices of the start and end of the substring matched by 
         group; group defaults to zero (meaning the whole matched substring). 
         Return -1 if group exists but did not contribute to the match. For a 
         match object m, and a group g that did contribute to the match, the 
         substring matched by group g (equivalent to m.group(g)) is
      
      
          m.string[m.start(g):m.end(g)]
      
          Note that m.start(group) will equal m.end(group) if group matched a 
          null string. For example, after m = re.search('b(c?)', 'cba'), 
          m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both    
          2, and m.start(2) raises an IndexError exception.
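The example in the quoted passage can be checked directly. A minimal sketch in Python 3, using only the documented `re.search`, `start`, and `end` calls; the variable name `raised` is just for illustration:

```python
import re

m = re.search('b(c?)', 'cba')

# Group 0 is the whole match: the single character 'b' at index 1.
print(m.start(0), m.end(0))   # 1 2

# Group 1 matched the empty string right after 'b', so start == end.
print(m.start(1), m.end(1))   # 2 2

# The pattern has no group 2, so asking for it raises IndexError.
try:
    m.start(2)
    raised = False
except IndexError:
    raised = True
print(raised)                 # True
```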
      

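      With re.finditer you get an iterator of MatchObject instances, and you can then call these methods on each one to get the start and end of the matched substring.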
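This answer gives no code, so here is a minimal sketch of the idea in Python 3. The generator name `keyword_contexts` and the sample string are hypothetical, and the keyword is assumed to be a literal string rather than a pattern, hence the `re.escape` call.

```python
import re

def keyword_contexts(keyword, line, n):
    """Yield (start, end, snippet) for every case-insensitive occurrence of keyword."""
    for m in re.finditer(re.escape(keyword), line, re.I):
        lo = max(0, m.start() - n)   # clamp so a negative index does not wrap around
        hi = m.end() + n             # slicing past the end of a string is safe
        yield m.start(), m.end(), line[lo:hi]

for start, end, snippet in keyword_contexts("ab", "xxABxxabxx", 1):
    print(start, end, repr(snippet))
# 2 4 'xABx'
# 6 8 'xabx'
```

Clamping the lower bound matters: for a keyword found at index 0 with n = 5, an unclamped slice would start at -5, which Python counts from the end of the line, and the result would be an empty or wrong snippet.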

      【Comments】:
