Format certain JSON objects on one line: answers

【Question title】: Format certain JSON objects on one line
【Posted】: 2016-10-13 18:50:10
【Question description】:

Consider the following code:

>>> import json
>>> data = {
...     'x': [1, {'$special': 'a'}, 2],
...     'y': {'$special': 'b'},
...     'z': {'p': True, 'q': False}
... }
>>> print(json.dumps(data, indent=2))
{
  "y": {
    "$special": "b"
  },
  "z": {
    "q": false,
    "p": true
  },
  "x": [
    1,
    {
      "$special": "a"
    },
    2
  ]
}

I would like to format the JSON so that objects whose only property is '$special' are rendered on a single line, like this:

{
  "y": {"$special": "b"},
  "z": {
    "q": false,
    "p": true
  },
  "x": [
    1,
    {"$special": "a"},
    2
  ]
}

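I have tried implementing a custom JSONEncoder and passing it to json.dumps as the cls argument, but both of the relevant JSONEncoder methods have a problem:

The JSONEncoder default method is called for each part of data, but its return value is not a raw JSON string, so there does not appear to be any way to adjust its formatting.
The JSONEncoder encode method does return a raw JSON string, but it is called only once, for data as a whole.

Is there any way to get JSONEncoder to do what I want?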

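【Question discussion】:

Why do you need this in the first place? The json module isn't really set up to give you that degree of control over the output format.
Also, when "$special" is present, is it guaranteed to be the only key?
@MartijnPieters I want to display JSON data in a developer-facing UI. JSON objects of the form {'$special': 'some key'} occur very frequently in this data, so I'm just exploring ways to compress them visually. You can assume '$special' is the only key present, although I think that's orthogonal to what I'm really asking: how to locally modify JSON formatting. The answer may well be simply: "You can't with the json module."
I tried to do something very similar myself, but had no luck with JSONEncoder. I ended up giving up the fight and using the standard pretty-printing.
I was really hoping to find something like yapf, but for formatting JSON, ideally as a Python library. I haven't found one yet, though.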

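Tags: python json python-3.x formatting

【Solution 1】:

The json module is not really designed to give you that much control over the output; indentation is mostly meant to aid readability while debugging.

Instead of making json produce such output, you could transform its output using the standard library tokenize module: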

import tokenize
from io import BytesIO


def inline_special(json_data):
    def adjust(t, ld,):
        """Adjust token line number by offset"""
        (sl, sc), (el, ec) = t.start, t.end
        return t._replace(start=(sl + ld, sc), end=(el + ld, ec))

    def transform():
        with BytesIO(json_data.encode('utf8')) as b:
            held = []  # to defer newline tokens
            lastend = None  # to track the end pos of the prev token
            loffset = 0     # line offset to adjust tokens by
            tokens = tokenize.tokenize(b.readline)
            for tok in tokens:
                if tok.type == tokenize.NL:
                    # hold newlines until we know there's no special key coming
                    held.append(adjust(tok, loffset))
                elif (tok.type == tokenize.STRING and
                        tok.string == '"$special"'):
                    # special string, collate tokens until the next rbrace
                    # held newlines are discarded, adjust the line offset
                    loffset -= len(held)
                    held = []
                    text = [tok.string]
                    while tok.exact_type != tokenize.RBRACE:
                        tok = next(tokens)
                        if tok.type != tokenize.NL:
                            text.append(tok.string)
                            if tok.string in ':,':
                                text.append(' ')
                        else:
                            loffset -= 1  # following lines all shift
                    line, col = lastend
                    text = ''.join(text)
                    endcol = col + len(text)
                    yield tokenize.TokenInfo(
                        tokenize.STRING, text, (line, col), (line, endcol),
                        '')
                    # adjust any remaining tokens on this line
                    while tok.type != tokenize.NL:
                        tok = next(tokens)
                        yield tok._replace(
                            start=(line, endcol),
                            end=(line, endcol + len(tok.string)))
                        endcol += len(tok.string)
                else:
                    # uninteresting token, yield any held newlines
                    if held:
                        yield from held
                        held = []
                    # adjust and remember last position
                    tok = adjust(tok, loffset)
                    lastend = tok.end
                    yield tok

    return tokenize.untokenize(transform()).decode('utf8')

This successfully reformats your sample:

import json

data = {
    'x': [1, {'$special': 'a'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False}
}

>>> print(inline_special(json.dumps(data, indent=2)))
{
  "x": [
    1,
    {"$special": "a"},
    2
  ],
  "y": {"$special": "b"},
  "z": {
    "p": true,
    "q": false
  }
}
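
To see why the function above keys on NL and STRING tokens, here is a small sketch of what the tokenize module produces when indented JSON is fed to it as if it were Python source. The sample data and the pairs variable are only for illustration and are not part of the answer's code.

```python
import io
import json
import tokenize

text = json.dumps({'y': {'$special': 'b'}, 'ok': True}, indent=2)

# Tokenize the JSON text as if it were Python source code.
tokens = list(tokenize.tokenize(io.BytesIO(text.encode('utf8')).readline))

# (token name, token text) pairs; exact_type tells braces, colons, etc. apart.
pairs = [(tokenize.tok_name[tok.exact_type], tok.string) for tok in tokens]

for name, string in pairs:
    print(f'{name:10} {string!r}')
```

Line breaks inside the braces show up as NL tokens, the key appears as a STRING token whose text includes the double quotes, and JSON literals such as true come through as NAME tokens. That is what lets the transform drop the NL tokens around a "$special" key and re-emit everything else with adjusted positions.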

【Discussion】:

【Solution 2】:

I found the following regex-based solution to be the simplest, albeit ... regex-based.

import json
import re
data = {
    'x': [1, {'$special': 'a'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False}
}
text = json.dumps(data, indent=2)
pattern = re.compile(r"""
{
\s*
"\$special"
\s*
:
\s*
"
((?:[^"\\]|\\.)*)  # Captures the whole string body: plain characters or backslash escapes
"
\s*
}
""", re.VERBOSE)
print(pattern.sub(r'{"$special": "\1"}', text))

The output follows.

{
  "x": [
    1,
    {"$special": "a"},
    2
  ],
  "y": {"$special": "b"},
  "z": {
    "q": false,
    "p": true
  }
}
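
As a rough sketch of how this approach might be packaged for reuse, the pattern can be wrapped in a function and checked with a round trip through json.loads. The names SPECIAL_OBJECT and collapse_special are hypothetical, and the sketch assumes the '$special' value is always a JSON string. Here the capture group covers the whole string literal, quotes and escapes included, so values longer than one character survive intact.

```python
import json
import re

# Hypothetical helper names; not part of the json module.
SPECIAL_OBJECT = re.compile(r'''
    \{ \s*
    "\$special" \s* : \s*
    ( " (?: [^"\\] | \\. )* " )   # the whole string literal, escapes included
    \s* \}
''', re.VERBOSE)


def collapse_special(text):
    """Put every {"$special": "<string>"} object in text on one line."""
    return SPECIAL_OBJECT.sub(r'{"$special": \1}', text)


data = {
    'x': [1, {'$special': 'a longer "quoted" value'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False},
}
pretty = json.dumps(data, indent=2)
compact = collapse_special(pretty)
print(compact)

# The reformatted text should still decode to the same data.
assert json.loads(compact) == data
```

The round-trip assertion is a cheap safeguard: if the pattern ever mangles a value, the decoded result no longer matches the input.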

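【Discussion】:

【Solution 3】:

You can do it, but you'd basically have to copy/modify a lot of the code from json.encoder, because the encoding functions aren't really designed to be partially overridden.

Basically, copy the whole of _make_iterencode from json.encoder and change it so that your special dictionaries are printed without the newline indents. Then monkeypatch the json package to use your modified version, run the json dump, and undo the monkeypatch afterwards (if you want).

The _make_iterencode function is pretty long, so I've only posted the parts that need to be changed.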

import json
import json.encoder

def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
    ...
    def _iterencode_dict(dct, _current_indent_level):
        ...
        if _indent is not None:
            _current_indent_level += 1
            if '$special' in dct:
                newline_indent = ''
                item_separator = _item_separator
            else:
                newline_indent = '\n' + (' ' * (_indent * _current_indent_level))
                item_separator = _item_separator + newline_indent
            yield newline_indent
        ...
        if newline_indent is not None:
            _current_indent_level -= 1
            if '$special' not in dct:
                yield '\n' + (' ' * (_indent * _current_indent_level))

def main():
    data = {
        'x': [1, {'$special': 'a'}, 2],
        'y': {'$special': 'b'},
        'z': {'p': True, 'q': False},
    }

    orig_make_iterencoder = json.encoder._make_iterencode
    json.encoder._make_iterencode = _make_iterencode
    print(json.dumps(data, indent=2))
    json.encoder._make_iterencode = orig_make_iterencoder
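
If copying private code out of json.encoder feels too fragile, a different technique is to skip the encoder's indentation logic altogether and write a small recursive formatter that delegates only scalars and the special objects to json.dumps. This is a swapped-in alternative, not the approach described above. The function name dumps_inline_special and its parameters are hypothetical, and the sketch assumes all dictionary keys are strings.

```python
import json


def dumps_inline_special(value, indent=2, level=0, key='$special'):
    """Pretty-print value, keeping objects whose only key is `key` on one line.

    Hypothetical helper; assumes every dict key is a string.
    """
    pad = ' ' * (indent * (level + 1))   # indentation for child items
    close_pad = ' ' * (indent * level)   # indentation for the closing bracket
    if isinstance(value, dict):
        if not value:
            return '{}'
        if list(value) == [key]:
            # Single-key special object: render it compactly on one line.
            return json.dumps(value)
        items = [
            pad + json.dumps(k) + ': '
            + dumps_inline_special(v, indent, level + 1, key)
            for k, v in value.items()
        ]
        return '{\n' + ',\n'.join(items) + '\n' + close_pad + '}'
    if isinstance(value, (list, tuple)):
        if not value:
            return '[]'
        items = [
            pad + dumps_inline_special(v, indent, level + 1, key)
            for v in value
        ]
        return '[\n' + ',\n'.join(items) + '\n' + close_pad + ']'
    # Scalars (strings, numbers, booleans, None) go straight to json.dumps.
    return json.dumps(value)


data = {
    'x': [1, {'$special': 'a'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False},
}
result = dumps_inline_special(data)
print(result)
```

The trade-off is that options such as sort_keys, default or custom separators are not honoured unless you add them yourself, whereas the monkeypatched encoder keeps all of json.dumps's behaviour.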

【Discussion】: