Python: Parse JSON string with multiple nested dicts in single line

【Question title】: Python: Parse JSON string with multiple nested dicts in single line
【Posted】: 2021-01-11 00:16:57
【Question】:

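I have many JSON files to parse, each between 1 and 2 MB in size. Normally, loading the data from a JSON file as a dictionary with json.load(json_file) is no problem. In this case, however, the JSON is a string of multiple nested dictionaries, all on a single line.

The dictionaries are not separated by "," as they would be in a list. Each file is just one very long string of nested dictionaries. For example, in the snippet below there are two nested dictionaries, each with a single key at the outer level ("GGGGHH" for the first dictionary and "GGGHGH" for the second).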

{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["229.0934"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["293.1353"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "a4": {"spectrum_89": ["202.1087"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["309.1312"], "spectrum_107": ["309.1314"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["213.0985"], "spectrum_107": ["213.0985"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}

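I have seen examples of parsing multiple JSON objects, but only when they are in an array.

Can anyone help? I have no control over the format of the JSON files, so I cannot regenerate the data in a simpler format. Apologies if this has been answered before; I could not find any answer that applies to this particular case.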

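【Comments】:

It is completely irrelevant whether JSON is split across multiple lines or presented on a single line, as long as it is well-formed. The rest is just prettifying.
Could you please add an example of the desired output, along with your code (if you have written any)?
This looks like invalid JSON to me. I ran it through https://jsonlint.com/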

Tags: python json string dictionary

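【Solution 1】:

Your string is not valid JSON, but it looks like it is just a series of valid JSON dictionaries with no commas between them.

Simply add commas between the dictionaries: replace every occurrence of "}{" with "}, {", wrap the result in "[" and "]" so that it becomes valid JSON for a list of dictionaries, and then you can json.loads it!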

s = '{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["229.0934"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["293.1353"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "a4": {"spectrum_89": ["202.1087"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["309.1312"], "spectrum_107": ["309.1314"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["213.0985"], "spectrum_107": ["213.0985"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}'
import json

# Insert the missing commas and wrap everything in a JSON array before parsing.
json.loads("[" + s.replace("}{", "}, {") + "]")

Output:

[{'GGGGHH': {'b2': {'spectrum_89': ['115.0502']},
   'b3': {'spectrum_89': ['172.0716'], 'spectrum_107': ['172.0717']},
   'b4': {'spectrum_89': ['229.0934']},
   'b5': {'spectrum_89': ['366.1527'], 'spectrum_107': ['366.1537']},
   'y1': {'spectrum_89': ['156.0769'], 'spectrum_107': ['156.0769']},
   'y2': {'spectrum_89': ['293.1353']},
   'y3': {'spectrum_89': ['372.1407'], 'spectrum_107': ['350.1563']},
   'a4': {'spectrum_89': ['202.1087']},
   'ImH': {'spectrum_89': ['110.0715'], 'spectrum_107': ['110.0715']}}},
 {'GGGHGH': {'b2': {'spectrum_89': ['115.0502']},
   'b3': {'spectrum_89': ['172.0716'], 'spectrum_107': ['172.0717']},
   'b4': {'spectrum_89': ['309.1312'], 'spectrum_107': ['309.1314']},
   'b5': {'spectrum_89': ['366.1527'], 'spectrum_107': ['366.1537']},
   'y1': {'spectrum_89': ['156.0769'], 'spectrum_107': ['156.0769']},
   'y2': {'spectrum_89': ['213.0985'], 'spectrum_107': ['213.0985']},
   'y3': {'spectrum_89': ['372.1407'], 'spectrum_107': ['350.1563']},
   'ImH': {'spectrum_89': ['110.0715'], 'spectrum_107': ['110.0715']}}}]
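
Since each element of that list is a dictionary with a single outer key, it can be handy to merge them into one dictionary for lookups. The following is only a sketch, assuming the outer keys are unique across the file; the shortened sample string and the names `records` and `merged` are illustrative and not part of the original answer.

```python
import json

# Shortened stand-in for the real file contents (same shape, fewer entries).
s = '{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}}}{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}}}'
records = json.loads("[" + s.replace("}{", "}, {") + "]")

merged = {}
for record in records:
    for key, value in record.items():
        # Assumption: outer keys never repeat; fail loudly if they do.
        if key in merged:
            raise ValueError(f"duplicate top-level key: {key}")
        merged[key] = value

print(merged["GGGHGH"]["b2"]["spectrum_89"])
```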

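For the more general case (for example, if whitespace can appear between two dictionaries), use a regular-expression substitution instead.

import re

json.loads("[" + re.sub(r"\}\s*\{", "}, {", s) + "]")

Here the regular expression "\}\s*\{" matches a }, followed by zero or more whitespace characters, followed by a {.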
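
One caveat with both replacements: they would also rewrite a "}{" that happens to sit inside a string value. If that is a concern, a different technique from the one above is to let the standard library's json.JSONDecoder.raw_decode consume one object at a time, without touching the text. This is a minimal sketch and not the answerer's method; the helper name `iter_concatenated_json` and the sample string are made up for illustration.

```python
import json


def iter_concatenated_json(text):
    """Yield each top-level JSON value found back-to-back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Illustrative input: the second object contains "}{" inside a string value.
sample = '{"a": {"x": ["1"]}} {"b": {"note": "}{ inside a string"}}'
parsed = list(iter_concatenated_json(sample))
print(parsed)
```

Because the decoder itself finds where each object ends, no assumptions about the separator between objects are needed.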

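【Comments】:

As an alternative, you can replace }{ with }\n{ and then just use ndjson. I will add an example.
@buran Interesting, TIL that such a thing exists. Is there any advantage to using ndjson over parsing it as a comma-separated list?
I would say it is a matter of personal preference. That is why I said "as an alternative".
Sorry for the late reply. This works well, thank you!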

【Solution 2】:

This looks a lot like malformed ndjson. You can replace }{ with }\n{ and then use ndjson:

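import ndjson  # third-party package (pip install ndjson)

with open('spam.json') as f:
    source = f.read()

# Put each top-level object on its own line so the text becomes valid ndjson.
source = source.replace('}{', '}\n{')
data = ndjson.loads(source)

print(data)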
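
If installing a third-party package is not an option, roughly the same result can be had with the standard library alone by parsing each line separately. This is a sketch under the same assumption as the replacement above, namely that "}{" only ever occurs between top-level objects; the helper name `loads_lines` and the shortened sample string are illustrative.

```python
import json


def loads_lines(text):
    """Parse newline-delimited JSON: one JSON value per non-empty line."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Shortened stand-in for the contents of the file read in the example above.
source = '{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}}}{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}}}'
data = loads_lines(source.replace("}{", "}\n{"))
print(data)
```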
