【Question title】: Split a line into a dictionary with multiple layers of key-value pairs
【Posted】: 2016-01-04 18:03:30
【Question】:

I have a file that contains lines in this format.

Example 1:
nextline = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4 };"

Example 2:
nextline = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"

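I first split the line on ':', which gives me a list with 2 entries. I want to split this line into a dictionary of keys and values, but the Score key has multiple sub-keys, each with its own value.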

Hole 1
Par 4
Index 2
Distance 459
Score 
    Player1 4
    Player2 6
    Player3 4

So I am using something like this...

split_line_by_semicolon = nextline.split(":")
dictionary_of_line = dict((k.strip(), v.strip()) for k,v in (item.split('=')
    for item in split_line_by_semicolon.split(';')))
for keys,values in dictionary_of_line.items():
    print("{0} {1}".format(keys,values))

But I get an error on the Score element of the line:

ValueError: too many values to unpack (expected 2)

I can adjust the split on '=' like this, so that it stops after the first '=':

dictionary_of_line = dict((k.strip(), v.strip()) for k,v in (item.split('=',1)
    for item in split_line_by_semicolon.split(';')))
for keys,values in dictionary_of_line.items():
    print("{0} {1}".format(keys,values))

But then I lose the sub-values inside the curly braces. Does anyone know how I can build this multi-layer dictionary?
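
To make the two failure modes concrete, here is a minimal sketch that runs both attempts on just the Score portion of the line. The variable names `segment`, `pieces` and `flat` are made up for this illustration and do not appear in the original code.

```python
# Only the Score portion of Example 2, trailing ';' removed for brevity.
segment = "Score = { Player1 = 4; Player2 = 6; Player3 = 4 }"

# Splitting on ';' first cuts straight through the braces.
pieces = [p.strip() for p in segment.split(";")]
print(pieces)
# ['Score = { Player1 = 4', 'Player2 = 6', 'Player3 = 4 }']

# The first piece holds two '=' signs, so a plain split yields three parts
# and the two-name unpacking `for k, v in ...` raises ValueError.
print(pieces[0].split("="))
# ['Score ', ' { Player1 ', ' 4']

# With maxsplit=1 the unpacking succeeds, but the result is flat and the
# braces end up glued to the values.
flat = dict((k.strip(), v.strip()) for k, v in (p.split("=", 1) for p in pieces))
print(flat)
# {'Score': '{ Player1 = 4', 'Player2': '6', 'Player3': '4 }'}
```

In other words, the nesting is already destroyed by the split on ';', before '=' is ever considered, so the braces have to be handled before or during that split.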

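【Comments on the question】:

  • split_line_by_semicolon.split(';') does not look right to me. split_line_by_semicolon is a list, and lists have no split method. Are you sure this is exactly the code you are running?
  • Correct. I parse some other things out of the line before getting to this point, so it is actually split_line_by_semicolon[3]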

Tags: python dictionary split key-value


【Solution 1】:

A simpler approach (though I don't know whether it is acceptable in your case) would be:

import re

nextline = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"

# compiles the regular expression to get the info you want
my_regex = re.compile(r'\w+ \= \w+')

# builds the structure of the dict you expect to get 
final_dict = {'Hole':0, 'Par':0, 'Index':0, 'Distance':0, 'Score':{}}

# uses the compiled regular expression to filter out the info you want from the string
filtered_items = my_regex.findall(nextline)

for item in filtered_items:
    # for each filtered item (string in the form key = value)
    # splits out the 'key' and handles it to fill your final dictionary
    key = item.split(' = ')[0]
    if key.startswith('Player'):
        final_dict['Score'][key] = int(item.split(' = ')[1])
    else:
        final_dict[key] = int(item.split(' = ')[1])
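
The same idea can be wrapped in a function so it can be applied to every line of the file. This is only a sketch: `parse_flat`, `PAIR_RE`, `nested_key` and `nested_prefix` are hypothetical names, and it assumes, as the code above does, that every value is an integer and that every nested key starts with "Player".

```python
import re

# Matches "key = value" pairs; "Score = {" is skipped because '{' is not \w.
PAIR_RE = re.compile(r"(\w+) = (\w+)")

def parse_flat(line, nested_key="Score", nested_prefix="Player"):
    """Collect key/value pairs, routing keys with the prefix under nested_key."""
    result = {nested_key: {}}
    for key, value in PAIR_RE.findall(line):
        # Keys such as Player1 go into the inner dict, the rest to the top level.
        target = result[nested_key] if key.startswith(nested_prefix) else result
        target[key] = int(value)  # assumes integer values only
    return result

line = ("DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; "
        "Distance = 459; Score = { Player1 = 4; Player2 = 6 };")
parsed = parse_flat(line)
print(parsed)
```

The nesting here is decided by the key name rather than by the braces, so this only works while the inner keys follow a known naming pattern.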

【Comments】:

  • Thanks. I like this. It fits very well.
【Solution 2】:
import json
import pprint
import re

lines = ("DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4 };",
         "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };")

def lines_to_dict(nextline):
    d_hole, d_score = {}, {}
    # cut everything before "Hole"
    nextline = nextline[nextline.index("Hole"):]
    # convert to dict format: "key =" becomes "key:", ";" becomes ","
    string_ = re.sub(r'\s+=',':',nextline)
    string_ = re.sub(r';',',',string_)
    # json likes double quotes, so quote every word (keys and values)
    string_ = re.sub(r'(\b\w+)',r'"\1"',string_)
    # drop the trailing comma left over from the final ';'
    string_ = re.sub(r',$',r'',string_)
    # make dict for Hole (everything before the Score entry)
    mo = re.search(r'(\"Hole.+?),\W+Score.*',string_)
    if mo:
        d_hole = json.loads("{" + mo.groups()[0] + "}")
    # make dict for Score
    mo = re.search(r'(\"Score.*)',string_)
    if mo:
        d_score = json.loads("{" + mo.groups()[0] + "}")
    # combine dicts
    d_hole.update(d_score)
    return d_hole

for d in lines:
    pprint.pprint(lines_to_dict(d))

{'Distance': '459',
 'Hole': '1',
 'Index': '2',
 'Par': '4',
 'Score': {'Player1': '4'}}

{'Distance': '459',
 'Hole': '1',
 'Index': '2',
 'Par': '4',
 'Score': {'Player1': '4', 'Player2': '6', 'Player3': '4'}}
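
Note that the values above come back as strings, because every word, digits included, is wrapped in double quotes. If integers are wanted, one possible variation is to quote only the identifiers and leave the numbers bare so that `json.loads` converts them. This is a sketch, not the answer's original code: `line_to_dict` is a hypothetical name, and it assumes keys start with a letter or underscore and that the line ends with the `};` shown in the examples.

```python
import json
import re

def line_to_dict(line):
    """Rewrite the 'key = value;' record as JSON text and let json parse it."""
    body = line[line.index("Hole"):]          # cut everything before "Hole"
    body = body.rstrip().rstrip(";")          # drop the final ';'
    body = re.sub(r"\s*=\s*", ": ", body)     # "key = value" -> "key: value"
    body = body.replace(";", ",")             # item separator
    # Quote identifiers only; numbers stay bare and are parsed as ints.
    body = re.sub(r"\b([A-Za-z_]\w*)\b", r'"\1"', body)
    return json.loads("{" + body + "}")

line = ("DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; "
        "Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };")
result = line_to_dict(line)
print(result["Score"]["Player2"] + result["Par"])
```

Because the braces are passed through to the JSON text unchanged, the nested dictionary falls out of a single `json.loads` call without a separate search for the Score part.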

【Comments】:

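    【Solution 3】:

    I would use a regular expression in the same way as maccinza (I like his answer), but with one small difference: data containing inner dictionaries can be processed recursively: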

    #example strings:
    nextline1 = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4 };"
    nextline2 = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"
    
    import re
    lineRegexp = re.compile(r'.+\'WeeklyMedal:(.+)\'?') #this regexp returns WeeklyMedal record.
    weeklyMedalRegexp = re.compile(r'(\w+) = (\{.+\}|\w+)') #this regexp parses WeeklyMedal
    
    #helper recursive function to process WeeklyMedal record. returns dictionary
    parseWeeklyMedal = lambda r, info: { k: (int(v) if v.isdigit() else parseWeeklyMedal(r, v)) for (k, v) in r.findall(info)}
    parsedLines = []
    for line in [nextline1, nextline2]:
        info = lineRegexp.search(line)
        if info:
            #process WeeklyMedal record
            parsedLines.append(parseWeeklyMedal(weeklyMedalRegexp, info.group(1)))
            #or do something with parsed dictionary in place
    
    # do something here with entire result, print for example
    print(parsedLines)
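
A regex-free alternative, sketched here under the assumption that records are `key = value` items separated by ';' with `{ ... }` marking a nested record, is to split only on separators that sit outside any braces and then recurse. The helper names `split_top_level` and `parse_record` are invented for this example. Unlike the greedy `\{.+\}` pattern, which would pull a second brace group into the first, this keeps sibling groups apart.

```python
def split_top_level(text, sep=";"):
    """Split text on sep, ignoring separators that are inside { ... }."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    # Drop empty fragments such as the one after the final ';'.
    return [p.strip() for p in parts if p.strip()]

def parse_record(text):
    """Turn 'a = 1; b = { c = 2 }' into {'a': 1, 'b': {'c': 2}}."""
    result = {}
    for item in split_top_level(text):
        key, value = (s.strip() for s in item.split("=", 1))
        if value.startswith("{") and value.endswith("}"):
            result[key] = parse_record(value[1:-1])  # recurse into the braces
        else:
            result[key] = int(value) if value.isdigit() else value
    return result

line = ("DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; "
        "Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };")
record = line.split("WeeklyMedal:", 1)[1]  # keep only the part after the prefix
parsed = parse_record(record)
print(parsed)
```

An item without an '=' sign makes the unpacking in `parse_record` raise ValueError, which may or may not be the desired behaviour for malformed lines.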
    

    【Comments】:
