【Question title】: Generate random JSON structure permutations for a data set
【Posted】: 2017-10-03 14:41:25
【Question】:

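I want to generate many different permutations of a JSON structure as representations of the same data set, preferably without having to hard-code the implementation. For example, given the following JSON: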

{"name": "smith", "occupation": "agent", "enemy": "humanity", "nemesis": "neo"}

it should produce many different permutations, for example:

  • Renaming: {"name":"smith"} -> {"last_name":"smith"}
  • Changing the order: {"name":"...","occupation":"..."} -> {"occupation":"...", "name":"..."}
  • Changing the arrangement: {"name":"...","occupation":"..."} -> {"smith":{"occupation":"..."}}
  • Changing the template: {"name":"...","occupation":"..."} -> {"status": 200, "data":{"name":"...","occupation":"..."}}

My current implementation is as follows:

I'm using itertools.permutations and OrderedDict() to determine the possible key and corresponding value combinations, as well as the order in which they are returned.

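import itertools
import json
from collections import OrderedDict

# SchemaLike is the asker's own class (not shown); permutate() is assumed
# to yield sequences of (key, value) pairs
key_permutations = SchemaLike(...).permutate()

all_simulacrums = []
for key_permutation in key_permutations:
    simulacrum = OrderedDict(key_permutation)
    all_simulacrums.append(simulacrum)

for simulacrum in all_simulacrums:
    # Every ordering of this simulacrum's key/value pairs
    for p in itertools.permutations(simulacrum.items()):
        test_data = json.dumps(OrderedDict(p))
        print(test_data)
        # Reordering the keys must not change the decoded content
        assert json.loads(test_data) == simulacrum, 'Oops! {} != {}'.format(test_data, simulacrum)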

My problem arises when I try to implement the arrangement and template permutations. I'm not sure how best to implement this; any suggestions?

【Question discussion】:

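  • Python dicts are unordered collections (JSON objects are unordered too, but I guess that's what you want to test). Use collections.OrderedDict instead of plain dicts.
  • dicts are unordered in Python, and JSON objects are implemented similarly to dicts.
  • No, I want to be able to dynamically generate many different JSON structure permutations as representations of the same data set, preferably without hard-coding the implementation.
  • Thanks for your answer.
  • For the name changes, how do you specify the valid options?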

Tags: python json permutation


【Answer 1】:

For the ordering, just use an ordered dict:

>>> import itertools, json
>>> from collections import OrderedDict
>>> data = OrderedDict(foo='bar', bacon='eggs', bar='foo', eggs='bacon')
>>> for p in itertools.permutations(data.items()):
...     test_data = json.dumps(OrderedDict(p))
...     print(test_data)
...     assert json.loads(test_data) == data, 'Oops! {} != {}'.format(test_data, data)

{"foo": "bar", "bacon": "eggs", "bar": "foo", "eggs": "bacon"}
{"foo": "bar", "bacon": "eggs", "eggs": "bacon", "bar": "foo"}
{"foo": "bar", "bar": "foo", "bacon": "eggs", "eggs": "bacon"}
{"foo": "bar", "bar": "foo", "eggs": "bacon", "bacon": "eggs"}
{"foo": "bar", "eggs": "bacon", "bacon": "eggs", "bar": "foo"}
{"foo": "bar", "eggs": "bacon", "bar": "foo", "bacon": "eggs"}
{"bacon": "eggs", "foo": "bar", "bar": "foo", "eggs": "bacon"}
{"bacon": "eggs", "foo": "bar", "eggs": "bacon", "bar": "foo"}
{"bacon": "eggs", "bar": "foo", "foo": "bar", "eggs": "bacon"}
{"bacon": "eggs", "bar": "foo", "eggs": "bacon", "foo": "bar"}
{"bacon": "eggs", "eggs": "bacon", "foo": "bar", "bar": "foo"}
{"bacon": "eggs", "eggs": "bacon", "bar": "foo", "foo": "bar"}
{"bar": "foo", "foo": "bar", "bacon": "eggs", "eggs": "bacon"}
{"bar": "foo", "foo": "bar", "eggs": "bacon", "bacon": "eggs"}
{"bar": "foo", "bacon": "eggs", "foo": "bar", "eggs": "bacon"}
{"bar": "foo", "bacon": "eggs", "eggs": "bacon", "foo": "bar"}
{"bar": "foo", "eggs": "bacon", "foo": "bar", "bacon": "eggs"}
{"bar": "foo", "eggs": "bacon", "bacon": "eggs", "foo": "bar"}
{"eggs": "bacon", "foo": "bar", "bacon": "eggs", "bar": "foo"}
{"eggs": "bacon", "foo": "bar", "bar": "foo", "bacon": "eggs"}
{"eggs": "bacon", "bacon": "eggs", "foo": "bar", "bar": "foo"}
{"eggs": "bacon", "bacon": "eggs", "bar": "foo", "foo": "bar"}
{"eggs": "bacon", "bar": "foo", "foo": "bar", "bacon": "eggs"}
{"eggs": "bacon", "bar": "foo", "bacon": "eggs", "foo": "bar"}
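The assert in the session above compares the plain dict returned by json.loads with an OrderedDict, and that comparison ignores key order, so it only confirms that the content is unchanged. If the order itself is what you want to test, a minimal sketch (not part of the original answer) is to decode with json.loads's object_pairs_hook=OrderedDict, because comparing two OrderedDicts is order-sensitive:

```python
import itertools
import json
from collections import OrderedDict

data = OrderedDict([('foo', 'bar'), ('bacon', 'eggs'), ('bar', 'foo'), ('eggs', 'bacon')])

checked = 0
same_order_as_data = 0
for p in itertools.permutations(data.items()):
    expected = OrderedDict(p)
    test_data = json.dumps(expected)
    # json.loads returns a plain dict, and dict == OrderedDict ignores order,
    # so this only checks that the content survived the round trip
    assert json.loads(test_data) == data
    # object_pairs_hook keeps the serialized key order; comparing two
    # OrderedDicts is order-sensitive, so this also checks the order
    decoded = json.loads(test_data, object_pairs_hook=OrderedDict)
    assert decoded == expected
    if decoded == data:
        same_order_as_data += 1
    checked += 1

print(checked, same_order_as_data)  # 24 orderings checked, only 1 matches data's own order
```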

The same principle applies to key/value permutations:

>>> for p in itertools.permutations(data.keys()):
...     test_data = json.dumps(OrderedDict(zip(p, data.values())))
...     print(test_data)
...
{"foo": "bar", "bacon": "eggs", "bar": "foo", "eggs": "bacon"}
{"foo": "bar", "bacon": "eggs", "eggs": "foo", "bar": "bacon"}
{"foo": "bar", "bar": "eggs", "bacon": "foo", "eggs": "bacon"}
{"foo": "bar", "bar": "eggs", "eggs": "foo", "bacon": "bacon"}
{"foo": "bar", "eggs": "eggs", "bacon": "foo", "bar": "bacon"}
{"foo": "bar", "eggs": "eggs", "bar": "foo", "bacon": "bacon"}
{"bacon": "bar", "foo": "eggs", "bar": "foo", "eggs": "bacon"}
{"bacon": "bar", "foo": "eggs", "eggs": "foo", "bar": "bacon"}
{"bacon": "bar", "bar": "eggs", "foo": "foo", "eggs": "bacon"}
{"bacon": "bar", "bar": "eggs", "eggs": "foo", "foo": "bacon"}
{"bacon": "bar", "eggs": "eggs", "foo": "foo", "bar": "bacon"}
{"bacon": "bar", "eggs": "eggs", "bar": "foo", "foo": "bacon"}
{"bar": "bar", "foo": "eggs", "bacon": "foo", "eggs": "bacon"}
{"bar": "bar", "foo": "eggs", "eggs": "foo", "bacon": "bacon"}
{"bar": "bar", "bacon": "eggs", "foo": "foo", "eggs": "bacon"}
{"bar": "bar", "bacon": "eggs", "eggs": "foo", "foo": "bacon"}
{"bar": "bar", "eggs": "eggs", "foo": "foo", "bacon": "bacon"}
{"bar": "bar", "eggs": "eggs", "bacon": "foo", "foo": "bacon"}
{"eggs": "bar", "foo": "eggs", "bacon": "foo", "bar": "bacon"}
{"eggs": "bar", "foo": "eggs", "bar": "foo", "bacon": "bacon"}
{"eggs": "bar", "bacon": "eggs", "foo": "foo", "bar": "bacon"}
{"eggs": "bar", "bacon": "eggs", "bar": "foo", "foo": "bacon"}
{"eggs": "bar", "bar": "eggs", "foo": "foo", "bacon": "bacon"}
{"eggs": "bar", "bar": "eggs", "bacon": "foo", "foo": "bacon"}

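And so on... If you don't need every combination, you can just use a predefined set of keys/values. You can also use a for loop with random.choice to flip a coin and skip some combinations, or use random.shuffle at the risk of getting repeated combinations.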
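A minimal sketch of the predefined-set idea combined with the coin flip and the shuffle. It assumes the acceptable alternative field names are listed by hand in a hypothetical ALIASES table (the alias names are made up for illustration). itertools.product enumerates every renaming, random.choice((True, False)) is the coin flip that skips some of them, and random.shuffle randomizes the key order of each remaining variant:

```python
import itertools
import json
import random
from collections import OrderedDict

data = OrderedDict([('name', 'smith'), ('occupation', 'agent'),
                    ('enemy', 'humanity'), ('nemesis', 'neo')])

# Hypothetical table of acceptable alternative names for each field
ALIASES = {
    'name': ['name', 'last_name', 'surname'],
    'occupation': ['occupation', 'job'],
    'enemy': ['enemy'],
    'nemesis': ['nemesis', 'arch_enemy'],
}


def renamings(record, aliases):
    """Yield every combination of alias choices, keeping the record's key order."""
    keys = list(record)
    for names in itertools.product(*(aliases.get(k, [k]) for k in keys)):
        yield OrderedDict(zip(names, record.values()))


def sample_variants(record, aliases, rng=random):
    """Coin-flip to skip some renamings, then shuffle the key order of the rest."""
    for variant in renamings(record, aliases):
        if rng.choice((True, False)):  # skip roughly half of the combinations
            continue
        items = list(variant.items())
        rng.shuffle(items)  # random order; the same order may come up twice
        yield OrderedDict(items)


rng = random.Random(42)  # seeded so the run is repeatable
for variant in sample_variants(data, ALIASES, rng):
    print(json.dumps(variant))
```

Passing a seeded random.Random makes a run repeatable; with the default (the random module itself) each run yields a different sample.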

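For the template part, I guess you'd have to create a list of different templates (or a list of lists if you want nested structures) and then iterate over it to create your data. To give better advice, we'd need a stricter specification of what you want.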
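A minimal sketch of the template list, assuming each template is a plain function that takes the record and returns a new structure. The function names (identity, envelope, nest_under) are hypothetical; the shapes mirror the question's examples. Rather than a list of lists, nested structures here come from composing templates:

```python
import json
from collections import OrderedDict

data = OrderedDict([('name', 'smith'), ('occupation', 'agent'),
                    ('enemy', 'humanity'), ('nemesis', 'neo')])


def identity(record):
    return record


def envelope(record):
    # {"status": 200, "data": {...}}
    return OrderedDict([('status', 200), ('data', record)])


def nest_under(key):
    # Promote record[key] to a top-level key: {"smith": {"occupation": ...}}
    def template(record):
        rest = OrderedDict((k, v) for k, v in record.items() if k != key)
        return OrderedDict([(record[key], rest)])
    return template


# Composing templates gives nested structures as well
TEMPLATES = [
    identity,
    envelope,
    nest_under('name'),
    lambda record: envelope(nest_under('name')(record)),
]

for template in TEMPLATES:
    print(json.dumps(template(data)))
```

Each template can be combined with the renaming and ordering permutations by applying it to every generated variant.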

Note that there are several libraries for generating test data in Python:

>>> from faker import Faker
>>> faker = Faker()
>>> faker.credit_card_full().strip().split('\n')
['VISA 13 digit', 'Jerry Gutierrez', '4885274641760 04/24', 'CVC: 583']

Faker has several schemas, and it's easy to create your own custom fake-data providers.

【Discussion】:

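  • Thanks a lot for your answer. So how would you use itertools.permutations (or some other method) to change the field names, templates and arrangements specified above, respectively, based on a dict?
【Answer 2】: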

Since shuffling the dict order has already been answered, I'll skip it.

I'll add to this answer as I think of new things.

from random import randint
from collections import OrderedDict

# Randomly swaps each key-value pair: on a coin flip, the value becomes the
# key and the key becomes the value (values must therefore be hashable)
def random_dict_items(input_dict):
    new_dict = OrderedDict()
    for key, value in input_dict.items():
        if randint(0, 1) == 0:
            new_dict[key] = value  # keep the pair as-is
        else:
            new_dict[value] = key  # invert the pair
    return new_dict

【Discussion】:
