【Question title】: Search for combinations in JSON nested object
【Posted】: 2021-08-15 08:38:13
【Question】:

I have a large JSON object. Here is part of it:

data = [
{  
   'make': 'dacia',
   'model': 'x',
   'version': 'A',
   'typ': 'sedan',
   'infos': [
            {'id': 1, 'name': 'steering wheel problems'}, 
            {'id': 32, 'name': 'ABS errors'}
   ]
},
{  
   'make': 'nissan',
   'model': 'z',
   'version': 'B',
   'typ': 'coupe',
   'infos': [
         {'id': 3, 'name': 'throttle problems'}, 
         {'id': 56, 'name': 'broken handbreak'}, 
         {'id': 11, 'name': 'missing seatbelts'}
   ]
}
]

I built a list of every possible combination of infos that can occur in my JSON (one car may have only a single info, while another may have many):

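import itertools

# Collect the distinct info names; 'infos' may be a list of dicts or a single dict
inf = list(set(i.get('name') for d in data for i in (d['infos'] if isinstance(d['infos'], list) else [d['infos']])))
# Every non-empty combination of those names
inf_comb = [combo for n in range(1, len(inf) + 1) for combo in itertools.combinations(inf, n)]
infos_combo = [list(elem) for elem in inf_comb]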

Now I need to iterate over the whole JSON data and count how many times each infos_combo set occurs, so I wrote this code:

tab = []
s = 0
for x in infos_combo:
   s = sum([1 for k in data if (([i['name'] for i in (k['infos'] if isinstance(k['infos'], list) else [k['infos']])] == x))])
   if s != 0:
     tab.append({'infos': x, 'sum': s})
print(tab)

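The problem I am facing is that tab returns only some of the elements I expect: more combinations occur in my JSON object and should be counted, but I cannot get them. How can I solve this?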

【Comments】:

Tags: python json combinations itertools


【Solution 1】:

OK, first you need to extract all the actual "infos" from the JSON data, like this:

infos = [
    [i["name"] for i in (d["infos"] if isinstance(d["infos"], list) else [d["infos"]])]
    for d in data
]

This gives you something like the following, which we will use later:

[['steering wheel problems', 'ABS errors'], ['throttle problems', 'broken handbreak', 'missing seatbelts']]
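
If `infos` can sometimes be a single dict rather than a list (as the question's `isinstance` check suggests), one way to keep the extraction readable is a small helper. This is only a sketch: the function name `info_names` and the single-dict sample record are assumptions made for illustration, not part of the original data.

```python
def info_names(infos):
    """Return the 'name' values, whether infos is one dict or a list of dicts."""
    items = infos if isinstance(infos, list) else [infos]
    return [item["name"] for item in items]


sample = [
    {"infos": [{"id": 1, "name": "steering wheel problems"},
               {"id": 32, "name": "ABS errors"}]},
    # Hypothetical record where 'infos' is a single dict instead of a list
    {"infos": {"id": 3, "name": "throttle problems"}},
]

names_per_car = [info_names(d["infos"]) for d in sample]
print(names_per_car)
```

Every entry of `names_per_car` is then a list of strings, so later steps do not need to special-case the single-dict form.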

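Now, to get all the combinations, we first need to process it by flattening the infos list and removing duplicates:

# dict.fromkeys drops duplicates while keeping first-seen order
unique_infos = list(dict.fromkeys(x for l in infos for x in l))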

Get all the combinations:

infos_combo = itertools.chain.from_iterable(
    itertools.combinations(unique_infos, r) for r in range(len(unique_infos) + 1)
)

This produces:

()
('steering wheel problems',)
('ABS errors',)
('throttle problems',)
('broken handbreak',)
('missing seatbelts',)
('steering wheel problems', 'ABS errors')
('steering wheel problems', 'throttle problems')
('steering wheel problems', 'broken handbreak')
...
# output truncated (too long)
...
('steering wheel problems', 'throttle problems', 'broken handbreak', 'missing seatbelts')
('ABS errors', 'throttle problems', 'broken handbreak', 'missing seatbelts')
('steering wheel problems', 'ABS errors', 'throttle problems', 'broken handbreak', 'missing seatbelts')
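
Note that the number of combinations doubles with every additional unique info, which matters for a large JSON object. The sketch below simply counts the combinations for a few sizes; `combination_count` is a name introduced here for illustration.

```python
import math


def combination_count(n):
    """Number of combinations of n items over all lengths 0..n (equals 2**n)."""
    return sum(math.comb(n, r) for r in range(n + 1))


for n in (5, 10, 20):
    print(n, combination_count(n))
```

For the 5 unique infos in the sample data this gives 32 combinations (including the empty one), but with 20 unique infos it is already over a million, so enumerating everything may not be practical on real data.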

After that, we need to count how often each combination occurs in the original infos list:

occurences = {}
for combo in infos_combo:
    occurences[combo] = infos.count(list(combo))

print(occurences)
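
One caveat: comparing lists is order-sensitive, so `['a', 'b']` and `['b', 'a']` count as different combinations. If the order of infos within a car should not matter, a possible alternative is to count each car's infos as a `frozenset` with `collections.Counter`. This is a sketch under that assumption, not the original answer's method; the second record below is hypothetical and only added to show the effect.

```python
from collections import Counter

infos = [
    ["steering wheel problems", "ABS errors"],
    # Hypothetical car with the same infos in a different order
    ["ABS errors", "steering wheel problems"],
    ["throttle problems", "broken handbreak", "missing seatbelts"],
]

# frozenset ignores order and is hashable, so it can be a Counter key
exact_counts = Counter(frozenset(names) for names in infos)

for combo, count in exact_counts.items():
    print(sorted(combo), count)
```

This only records combinations that actually occur in the data, so it also avoids generating every possible combination up front.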

Full code:

import itertools

data = [
    {
        "make": "dacia",
        "model": "x",
        "version": "A",
        "typ": "sedan",
        "infos": [
            {"id": 1, "name": "steering wheel problems"},
            {"id": 32, "name": "ABS errors"},
        ],
    },
    {
        "make": "nissan",
        "model": "z",
        "version": "B",
        "typ": "coupe",
        "infos": [
            {"id": 3, "name": "throttle problems"},
            {"id": 56, "name": "broken handbreak"},
            {"id": 11, "name": "missing seatbelts"},
        ],
    },
]

infos = [
    [i["name"] for i in (d["infos"] if isinstance(d["infos"], list) else [d["infos"]])]
    for d in data
]

# dict.fromkeys drops duplicates while keeping first-seen order
unique_infos = list(dict.fromkeys(x for l in infos for x in l))

infos_combo = itertools.chain.from_iterable(
    itertools.combinations(unique_infos, r) for r in range(len(unique_infos) + 1)
)

occurences = {}
for combo in infos_combo:
    occurences[combo] = infos.count(list(combo))

print(occurences)
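
The code above counts a combination only when it is exactly a car's full list of infos. If the goal is instead to count how many cars contain a combination as part of their infos (one possible reading of the question), a sketch like the following could be used. The third car is hypothetical, and `subset_counts` is a name introduced here.

```python
import itertools
from collections import Counter

infos = [
    ["steering wheel problems", "ABS errors"],
    ["throttle problems", "broken handbreak", "missing seatbelts"],
    ["ABS errors", "throttle problems"],  # hypothetical third car
]

subset_counts = Counter()
for names in infos:
    # Sort the unique names so equal sets always produce identical tuples
    unique = sorted(set(names))
    for r in range(1, len(unique) + 1):
        # Each car contributes at most 1 to every combination it contains
        subset_counts.update(itertools.combinations(unique, r))

print(subset_counts[("ABS errors",)])
print(subset_counts[("ABS errors", "steering wheel problems")])
```

Keys are tuples in sorted order, so look-ups should sort the names the same way. The work per car grows as 2 to the power of its number of infos, which is usually far smaller than enumerating combinations over all unique infos in the data set.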

【Comments】:
