Find text and return a list with regex (regular expressions): answers

【Question title】: Find text and return list with Regex, Regular Expression
【Posted】: 2020-12-09 09:48:40
【Question】:

I am trying to build a list from a text file (.txt) using Python's regular expression module, re. Part of the text looks like this:

146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622\n197.109.77.178 - - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554

May I know how to use a regex to turn the text into a list in this format:

{
"host_name": "146.204.224.152", 
"name": "feest6811", 
"time": "21/Jun/2019:15:45:24 -0700", 
"method": "POST /incentivize HTTP/1.1"
},
..
..
..

I am trying to use the following pattern, because I have seen examples of patterns like it:

pattern="(?P<host_name>.*)(\ -\ )(?P<name>\w*)"

for item in re.finditer(pattern,'Text_data',re.VERBOSE):
    print(item.groupdict())

Any suggestions for a regex that handles this text would be appreciated.

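【Question comments】:

It would be better to write a parser for this than to use a regex, since this looks like a web log with a proper structure.
When you say "list format", can you give an example? Do you want only the keys of the dictionary example, only the values, or both?
@gmdev Sorry for the wrong wording. What I meant is that I want dictionaries returned from the string.

Tags: python regex python-re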

【Solution 1】:

Use

(?m)^(?P<host_name>[\d.]+) - (?P<name>\w+) \[(?P<time>[^][]+)] "(?P<method>[^"]+)"

See proof.

Explanation

--------------------------------------------------------------------------------
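  (?m)                     set flags for the pattern: multi-line mode
                           ('^' matches at the start of every line)
--------------------------------------------------------------------------------
  ^                        the beginning of a line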
--------------------------------------------------------------------------------
  (?P<host_name>           group and capture to \k<host_name>:
--------------------------------------------------------------------------------
    [\d.]+                   any character of: digits (0-9), '.' (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<host_name>
--------------------------------------------------------------------------------
   -                       ' - '
--------------------------------------------------------------------------------
  (?P<name>                group and capture to \k<name>:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<name>
--------------------------------------------------------------------------------
                           ' '
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  (?P<time>                group and capture to \k<time>:
--------------------------------------------------------------------------------
    [^][]+                   any character except: ']', '[' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<time>
--------------------------------------------------------------------------------
  ] "                      '] "'
--------------------------------------------------------------------------------
  (?P<method>              group and capture to \k<method>:
--------------------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<method>
--------------------------------------------------------------------------------
  "                        '"'

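To show how the pattern above could produce the list of dictionaries asked for in the question, here is a minimal sketch. The names `LOG_PATTERN`, `parse_log`, and `sample` are made up for illustration, and `sample` is an assumed two-line input in which both lines use a single ` - ` between the host and the user name. A line written with `- -` in that position would not match this pattern and would be skipped.

```python
import re

# Pattern from the answer. (?m) makes ^ match at the start of every line,
# so each log line in a multi-line string is tried separately.
LOG_PATTERN = re.compile(
    r'(?m)^(?P<host_name>[\d.]+) - (?P<name>\w+) '
    r'\[(?P<time>[^][]+)] "(?P<method>[^"]+)"'
)


def parse_log(text):
    """Return one dict per log line that matches LOG_PATTERN (hypothetical helper)."""
    return [match.groupdict() for match in LOG_PATTERN.finditer(text)]


# Assumed sample input: real newlines, one " - " separator per line.
sample = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] '
    '"POST /incentivize HTTP/1.1" 302 4622\n'
    '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] '
    '"DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554\n'
)

records = parse_log(sample)
for record in records:
    print(record)
```

To parse a file, the same helper can be given the file's contents as a string, for example the result of reading the whole file with `read()`. Note that the pattern is passed the actual text, not a string literal such as `'Text_data'`, and that `re.VERBOSE` is not needed here because the pattern contains literal spaces that verbose mode would ignore.
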
【Comments】: