【Question title】: Eliminate nesting by creating new objects from json
【Posted】: 2018-12-24 09:40:21
【Question description】:

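I have a standard nested JSON file that looks like the one below. It is nested several levels deep, and I have to eliminate all of the nesting by creating new objects.

The nested JSON file: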

{
"persons": [{
    "id": "f4d322fa8f552",
    "address": {
        "building": "710",
        "coord": "[123, 465]",
        "street": "Avenue Road",
        "zipcode": "12345"
    },
    "cuisine": "Chinese",
    "grades": [{
        "date": "2013-03-03T00:00:00.000Z",
        "grade": "B",
        "score": {
          "x": 3,
          "y": 2
        }
    }, {
        "date": "2012-11-23T00:00:00.000Z",
        "grade": "C",
        "score": {
          "x": 1,
          "y": 22
        }
    }],
    "name": "Shash"
}]
}

New objects that need to be created:

persons 
[
{
"id": "f4d322fa8f552",
"cuisine": "Chinese",
"name": "Shash"
}
]

persons_address
[
{
"id": "f4d322fa8f552",
"building": "710",
"coord": "[123, 465]",
"street": "Avenue Road",
"zipcode": "12345"
}
]

persons_grade
[
{
"id": "f4d322fa8f552",
"__index": "0",
"date": "2013-03-03T00:00:00.000Z",
"grade": "B"
},
{
"id": "f4d322fa8f552",
"__index": "1",
"date": "2012-11-23T00:00:00.000Z",
"grade": "C"
}
]

persons_grade_score
[
{

"id": "f4d322fa8f552",
"__index": "0",
"x": "3",
"y": "2"

},
{

"id": "f4d322fa8f552",
"__index": "1",
"x": "1",
"y": "22"

}
]

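My approach: I used a normalize function to turn all of the lists into dictionaries, and added another function that adds the id to every nested dictionary.

Now I can't work out how to traverse each level and create the new objects. Is there a way to do this?

The whole idea is that, once the new objects have been created, we can load them into a database.

【Comments】:

  • Why do the two grades in your result, presumably for the same person, have different ids?
  • @Scoot Hunter No, they have the same id. Sorry, I've just corrected it.
  • Is your idea to move content from a non-relational database into multiple tables of a relational database? If so, you should start by defining the indexes properly so that you know what you are aiming for. For example, persons_grade_score should have a foreign key to persons_grade, not to person.

Tags: python json dictionary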


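【Solution 1】:

Concept

Here is a generic solution that does what you need. The concept is to recursively walk through all of the values of the top-level "persons" dictionary. What happens next depends on the type of each value it finds.

Every non-dict/non-list value it finds in a dictionary is placed in the flat object for that level.

If it finds a dict or a list instead, it recursively does the same thing again, finding more non-dict/non-list values, or further lists or dicts.

Using collections.defaultdict also lets us easily populate a dictionary with an unknown number of lists, one per key, so that we end up with the 4 top-level objects you want.

Code example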
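To illustrate the last point, here is a minimal sketch of how collections.defaultdict(list) gathers rows under keys that are not known in advance. The key names and abbreviated rows are borrowed from the question purely for illustration.

```python
from collections import defaultdict

# A missing key starts out as an empty list, so rows can be appended
# without first checking whether the key already exists.
collected = defaultdict(list)

collected["persons_address"].append({"id": "f4d322fa8f552", "building": "710"})
collected["persons_grades"].append({"id": "f4d322fa8f552", "__index": 0, "grade": "B"})
collected["persons_grades"].append({"id": "f4d322fa8f552", "__index": 1, "grade": "C"})

print(sorted(collected))                 # ['persons_address', 'persons_grades']
print(len(collected["persons_grades"]))  # 2
```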

from collections import defaultdict

class DictFlattener(object):
    def __init__(self, object_id_key, object_name):
        """Constructor.

        :param object_id_key: String key that identifies each base object
        :param object_name: String name given to the base object in data.

        """
        self._object_id_key = object_id_key
        self._object_name = object_name

        # Store each of the top-level results lists.
        self._collected_results = None

    def parse(self, data):
        """Parse the given nested dictionary data into separate lists.

        Each nested dictionary is transformed into its own list of objects,
        associated with the original object via the object id.

        :param data: Dictionary of data to parse.

        :returns: Single dictionary containing the resulting lists of
            objects, where each key is the object name combined with the
            list name via an underscore.

        """

        self._collected_results = defaultdict(list)

        for value_to_parse in data[self._object_name]:
            object_id = value_to_parse[self._object_id_key]
            parsed_object = {}

            for key, value in value_to_parse.items():
                sub_object_name = self._object_name + "_" + key
                parsed_value = self._parse_value(
                    value,
                    object_id,
                    sub_object_name,
                )
                # Compare against None so falsy values such as 0 or "" are kept.
                if parsed_value is not None:
                    parsed_object[key] = parsed_value

            self._collected_results[self._object_name].append(parsed_object)

        return self._collected_results

    def _parse_value(self, value_to_parse, object_id, current_object_name,
                     index=None):
        """Parse some value of an unknown type.

        If it's a list or a dict, keep parsing, otherwise return it as-is.

        :param value_to_parse: Value to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position of the enclosing list item, if any.

        :returns: None if value_to_parse is a dict or a list, otherwise returns
            value_to_parse.

        """
        if isinstance(value_to_parse, dict):
            self._parse_dict(
                value_to_parse,
                object_id,
                current_object_name,
                index=index,
            )
        elif isinstance(value_to_parse, list):
            self._parse_list(
                value_to_parse,
                object_id,
                current_object_name,
            )
        else:
            return value_to_parse

    def _parse_dict(self, dict_to_parse, object_id, current_object_name,
                    index=None):
        """Parse some value of a dict type and store it in self._collected_results.

        :param dict_to_parse: Dict to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position of the enclosing list item, if any.

        """
        parsed_dict = {
            self._object_id_key: object_id,
        }
        if index is not None:
            parsed_dict["__index"] = index

        for key, value in dict_to_parse.items():
            sub_object_name = current_object_name + "_" + key
            parsed_value = self._parse_value(
                value,
                object_id,
                sub_object_name,
                index=index,
            )
            # Compare against None so falsy values such as 0 or "" are kept.
            if parsed_value is not None:
                parsed_dict[key] = parsed_value

        self._collected_results[current_object_name].append(parsed_dict)

    def _parse_list(self, list_to_parse, object_id, current_object_name):
        """Parse some value of a list type and store it in self._collected_results.

        :param list_to_parse: List to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.

        """
        for index, sub_dict in enumerate(list_to_parse):
            self._parse_value(
                sub_dict,
                object_id,
                current_object_name,
                index=index,
            )

Then to use it:

parser = DictFlattener("id", "persons")
results = parser.parse(test_data)

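Notes

  1. There are some inconsistencies between your sample data and the expected output, such as whether the scores are strings or integers. You will need to account for these when you compare the actual and expected values.
  2. There is always more refactoring that could be done, or it could be made more functional rather than being a class. But hopefully seeing this helps you understand how to do it.
  3. As @jbernardo said, if you are going to insert these into a relational database, they shouldn't all just have "id" as the key; it should be "person_id".

【Comments】: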
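Note 3 and the foreign-key comment under the question can be sketched roughly as follows. This is only an illustration under stated assumptions: the thread does not name a target database, so an in-memory SQLite database from the standard library is used here, and the column names person_id, grade_index and grade_date are hypothetical choices rather than anything given in the question.

```python
import sqlite3

# Hypothetical rows shaped like the flattened output, with "id" renamed
# to "person_id" as suggested in note 3.
grades = [
    {"person_id": "f4d322fa8f552", "__index": 0,
     "date": "2013-03-03T00:00:00.000Z", "grade": "B"},
    {"person_id": "f4d322fa8f552", "__index": 1,
     "date": "2012-11-23T00:00:00.000Z", "grade": "C"},
]
scores = [
    {"person_id": "f4d322fa8f552", "__index": 0, "x": 3, "y": 2},
    {"person_id": "f4d322fa8f552", "__index": 1, "x": 1, "y": 22},
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# A grade is identified by the person plus its position in the list.
conn.execute("""
    CREATE TABLE persons_grade (
        person_id   TEXT    NOT NULL,
        grade_index INTEGER NOT NULL,
        grade_date  TEXT,
        grade       TEXT,
        PRIMARY KEY (person_id, grade_index)
    )
""")
# A score points at its grade, not directly at the person.
conn.execute("""
    CREATE TABLE persons_grade_score (
        person_id   TEXT    NOT NULL,
        grade_index INTEGER NOT NULL,
        x INTEGER,
        y INTEGER,
        FOREIGN KEY (person_id, grade_index)
            REFERENCES persons_grade (person_id, grade_index)
    )
""")

conn.executemany(
    "INSERT INTO persons_grade VALUES (?, ?, ?, ?)",
    [(g["person_id"], g["__index"], g["date"], g["grade"]) for g in grades],
)
conn.executemany(
    "INSERT INTO persons_grade_score VALUES (?, ?, ?, ?)",
    [(s["person_id"], s["__index"], s["x"], s["y"]) for s in scores],
)

# Join each score back to its grade through the composite key.
rows = conn.execute("""
    SELECT g.grade, s.x, s.y
    FROM persons_grade AS g
    JOIN persons_grade_score AS s
      ON s.person_id = g.person_id AND s.grade_index = g.grade_index
    ORDER BY g.grade_index
""").fetchall()
print(rows)  # [('B', 3, 2), ('C', 1, 22)]
```

Keying the score table on the pair (person_id, grade_index) is one way to keep each score attached to its own grade once the nesting is gone.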

    【Solution 2】:

    After parsing the JSON file as described in "Parsing values from a JSON file?", here is some rough code to help you:

    # `data` is the dict parsed from the JSON file
    top_level = []
    all_second_level = []
    all_third_level = []

    for person in data['persons']:
        # scalar fields stay on the top-level entity
        top_level.append({key: val for key, val in person.items()
                          if not isinstance(val, (dict, list))})

        for key, val in person.items():
            if isinstance(val, dict):
                # copy the nested dict and attach the parent id
                second_level = dict(val)
                second_level['id'] = person['id']
                all_second_level.append(second_level)
            elif isinstance(val, list):
                second_level = []
                for index, item in enumerate(val):
                    # keep the parent id and list position on each entity
                    second_level_entity = {'id': person['id'], '__index': index}
                    for key1, val1 in item.items():
                        if not isinstance(val1, dict):
                            second_level_entity[key1] = val1
                        else:
                            # nested dicts become third-level entities
                            all_third_level.append(
                                dict(val1, id=person['id'], __index=index))
                    second_level.append(second_level_entity)
                all_second_level.append(second_level)
    

    【Comments】:

      【Solution 3】:
      # create 4 empty lists
      persons = []
      persons_address = []
      persons_grade = []
      persons_grade_score = []
      
      
      # go through all your data and put the correct information in each list
      for data in yourdict['persons']:
          persons.append({
              'id': data['id'],
              'cuisine': data['cuisine'],
              'name': data['name'],
          })
      
          _address = data['address'].copy()
          _address['id'] = data['id']
          persons_address.append(_address)
      
          persons_grade.extend({
              'id': data['id'],
              '__index': n,
              'date': g['date'],
              'grade': g['grade'],
          } for n, g in enumerate(data['grades']))
      
          # x and y live in the nested 'score' dict of each grade
          persons_grade_score.extend({
              'id': data['id'],
              '__index': n,
              'x': g['score']['x'],
              'y': g['score']['y'],
          } for n, g in enumerate(data['grades']))
      

      【Comments】:

      • Is there a more generic approach, so that the same method can be reused for different JSON structures?
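On the reuse question, one possible direction is a compact, function-based variant of the recursive idea from Solution 1. The sketch below is not code from any of the answers: flatten and flatten_all are hypothetical names, and it assumes every top-level object carries the id key and that list items are dicts (scalar list items are silently skipped).

```python
from collections import defaultdict


def flatten(value, name, object_id, tables, id_key="id", index=None):
    """Recursively split nested dicts/lists into flat rows grouped by name."""
    if isinstance(value, list):
        # Each list element becomes its own row, tagged with its position.
        for position, item in enumerate(value):
            flatten(item, name, object_id, tables, id_key, index=position)
    elif isinstance(value, dict):
        row = {id_key: object_id}
        if index is not None:
            row["__index"] = index
        for key, child in value.items():
            if isinstance(child, (dict, list)):
                # Nested containers go to their own table, named by key path.
                flatten(child, name + "_" + key, object_id, tables, id_key, index)
            else:
                row[key] = child
        tables[name].append(row)
    # Scalars that appear directly inside a list are ignored in this sketch.


def flatten_all(data, root_name, id_key="id"):
    """Flatten every object under data[root_name] into a dict of row lists."""
    tables = defaultdict(list)
    for obj in data[root_name]:
        flatten(obj, root_name, obj[id_key], tables, id_key)
    return dict(tables)


# The sample document from the question.
sample = {
    "persons": [{
        "id": "f4d322fa8f552",
        "address": {"building": "710", "coord": "[123, 465]",
                    "street": "Avenue Road", "zipcode": "12345"},
        "cuisine": "Chinese",
        "grades": [
            {"date": "2013-03-03T00:00:00.000Z", "grade": "B",
             "score": {"x": 3, "y": 2}},
            {"date": "2012-11-23T00:00:00.000Z", "grade": "C",
             "score": {"x": 1, "y": 22}},
        ],
        "name": "Shash",
    }]
}

result = flatten_all(sample, "persons")
for table_name in sorted(result):
    print(table_name, result[table_name])
```

Table names are derived from the key path, so this produces persons_grades and persons_grades_score rather than the persons_grade spelling used in the question; a different JSON structure only needs a different root name and id key.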