【Question title】: Find text and return list with Regex, Regular Expression
【Posted】: 2020-12-09 09:48:40
【Question】:

I am trying to build a list from a text file (.txt) using Python's regular expression module, re. Part of the text looks like this:

146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622\n197.109.77.178 - - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554

How can I write a regex that turns this text into a list of dictionaries in the following form?

{
"host_name": "146.204.224.152", 
"name": "feest6811", 
"time": "21/Jun/2019:15:45:24 -0700", 
"method": "POST /incentivize HTTP/1.1"
},
..
..
..

I am trying the following pattern, which I based on an example I had seen:

pattern="(?P<host_name>.*)(\ -\ )(?P<name>\w*)"

for item in re.finditer(pattern,'Text_data',re.VERBOSE):
    print(item.groupdict())

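Any suggestions for a regex that handles this text would be appreciated.

【Comments】:

  • It would be better to write a parser for this than to use a regex, since this looks like a web log with a proper structure.
  • When you say "list format", can you give an example? Do you want only the keys or only the values of the example dictionary, or both?
  • @gmdev Sorry for the wrong wording. What I meant is that I want dictionaries returned from the string.

Tags: python regex python-re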


【Solution 1】:

Use

(?m)^(?P<host_name>[\d.]+) - (?P<name>\w+) \[(?P<time>[^][]+)] "(?P<method>[^"]+)"


Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of a line (multiline mode is
                           enabled by the leading (?m))
--------------------------------------------------------------------------------
  (?P<host_name>           group and capture to \k<host_name>:
--------------------------------------------------------------------------------
    [\d.]+                   any character of: digits (0-9), '.' (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<host_name>
--------------------------------------------------------------------------------
   -                       ' - '
--------------------------------------------------------------------------------
  (?P<name>                group and capture to \k<name>:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<name>
--------------------------------------------------------------------------------
                           ' '
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  (?P<time>                group and capture to \k<time>:
--------------------------------------------------------------------------------
    [^][]+                   any character except: ']', '[' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<time>
--------------------------------------------------------------------------------
  ] "                      '] "'
--------------------------------------------------------------------------------
  (?P<method>              group and capture to \k<method>:
--------------------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \k<method>
--------------------------------------------------------------------------------
  "                        '"'

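As a rough sketch of how the pattern above could be applied in Python, the snippet below compiles it with re.MULTILINE (equivalent to the inline (?m)) and collects groupdict() for every match. The names LOG_PATTERN, parse_log and sample are made up for illustration, and the sample string is the two log lines from the question pasted inline rather than read from the .txt file.

```python
import re

# The pattern from the answer; re.MULTILINE replaces the inline (?m) flag
# so that ^ matches at the start of every line, not only the first.
LOG_PATTERN = re.compile(
    r'^(?P<host_name>[\d.]+) - (?P<name>\w+) '
    r'\[(?P<time>[^][]+)] "(?P<method>[^"]+)"',
    re.MULTILINE,
)


def parse_log(text):
    """Return one dict per log line that matches LOG_PATTERN (hypothetical helper)."""
    return [match.groupdict() for match in LOG_PATTERN.finditer(text)]


# The two sample lines exactly as shown in the question.
sample = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] '
    '"POST /incentivize HTTP/1.1" 302 4622\n'
    '197.109.77.178 - - kertzmann3129 [21/Jun/2019:15:45:25 -0700] '
    '"DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554'
)

records = parse_log(sample)
for record in records:
    print(record)
```

Note that the second sample line, as displayed in the question, has "- -" before the user name, so this pattern does not match it and only the first line produces a dictionary. Lines that deviate from the expected shape are skipped silently, which may or may not be what you want for a real log file.
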
【Comments】:
