Python: parsing a string with a regular expression to build a dictionary

【Question title】: Python: parse a string with regex to build a dictionary
【Posted】: 2015-06-09 12:13:38
【Question】:

I need to extract the fields of the following string in Python to build a dictionary:

2014:02:02-12:24:17 NAMETEST ulogd[4834]: id="xxxx" severity="xxxx" sys="xxxx" sub="xxxx" name="xxxx aaaa" action="xxxx" fwrule="xxxx" outitf="xxxx" srcmac="xxxx" srcip="xxxx" dstip="xxxx" proto="x" length="xxxx" tos="xxxx" prec="xxxx" ttl="xx" srcport="xxxx" dstport="xxxx" tcpflags="xxxx"

I can't use split(' ') on spaces, because a field such as name="xxxx aaaa" can itself contain a space.
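To make the problem concrete, here is a minimal sketch of what split(' ') does to a quoted value that contains a space. The `sample` string is a shortened, made-up line in the same shape as the log line above, not the full input.

```python
# Shortened sample in the same shape as the log line from the question.
sample = 'id="xxxx" name="xxxx aaaa" action="xxxx"'

# Splitting on spaces cuts the quoted name value into two pieces.
parts = sample.split(' ')
print(parts)
# ['id="xxxx"', 'name="xxxx', 'aaaa"', 'action="xxxx"']
```

The name field ends up spread over two list items, so the pieces can no longer be split on = to get clean key/value pairs.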

As a first step, I used the following regular expression, which extracts only the values:

re.findall('"([^"]*)"', line)

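But now I need the result as a dictionary, e.g. line['id'] = 1111.

So what regular expression should I use? Any ideas?

【Comments】:

Tags: python regex parsing

【Solution 1】:

You can use re.findall() to find the key-value pairs: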

>>> import re
>>> # s is assumed to hold the log line from the question
>>> groups = re.findall(r'(\w+)="(.*?)"', s)
>>> line = dict(groups)
>>>
>>> from pprint import pprint
>>> pprint(line)
{'action': 'xxxx',
 'dstip': 'xxxx',
 'dstport': 'xxxx',
 'fwrule': 'xxxx',
 'id': 'xxxx',
 'length': 'xxxx',
 'name': 'xxxx aaaa',
 'outitf': 'xxxx',
 'prec': 'xxxx',
 'proto': 'x',
 'severity': 'xxxx',
 'srcip': 'xxxx',
 'srcmac': 'xxxx',
 'srcport': 'xxxx',
 'sub': 'xxxx',
 'sys': 'xxxx',
 'tcpflags': 'xxxx',
 'tos': 'xxxx',
 'ttl': 'xx'}

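(\w+)="(.*?)" matches one or more word characters (the \w+ part: letters, digits, and underscores), followed by =", then any characters (.*?, non-greedy), then a closing ". The parentheses define capturing groups; because the pattern contains two groups, re.findall() returns a list of (key, value) tuples, which dict() turns into a dictionary.

【Comments】: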
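As a self-contained sketch of the same approach, the pattern can be compiled once and wrapped in a small function. The name `parse_log_line` and the shortened `sample` line are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern from the answer: a key made of word characters, then ="...".
PAIR_RE = re.compile(r'(\w+)="(.*?)"')


def parse_log_line(text):
    """Return a dict of the key="value" pairs found in text (hypothetical helper)."""
    # findall() yields (key, value) tuples because the pattern has two groups.
    return dict(PAIR_RE.findall(text))


# Shortened sample in the same shape as the log line from the question.
sample = ('2014:02:02-12:24:17 NAMETEST ulogd[4834]: id="xxxx" '
          'name="xxxx aaaa" proto="x" ttl="xx"')

fields = parse_log_line(sample)
print(fields['name'])   # xxxx aaaa
print(fields['proto'])  # x
```

The timestamp, host name, and process prefix are skipped because they contain no key="value" pair. All values come back as strings, so a numeric field would still need an explicit int() conversion. If a key appears twice, the last occurrence wins, and a value containing an escaped double quote would be cut short by this pattern.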