【Question title】: Remove duplicates from list of dictionaries created using groupby itertools in Python
【Posted】: 2020-05-03 21:49:21
【Question】:

I want to remove some duplicates from my merged dictionaries.

My data:

mongo_data = [{
 'url': 'https://goodreads.com/',
 'variables': [{'key': 'Harry Potter', 'value': '10.0'},
               {'key': 'Discovery of Witches', 'value': '8.5'},],
 'vendor': 'Fantasy' 
 },{
 'url': 'https://goodreads.com/',
 'variables': [{'key': 'Hunger Games', 'value': '10.0'},
               {'key': 'Maze Runner', 'value': '5.5'},],
 'vendor': 'Dystopia' 
 },{
 'url': 'https://kindle.com/',
 'variables': [{'key': 'Divergent', 'value': '9.0'},
               {'key': 'Lord of the Rings', 'value': '9.0'},],
 'vendor': 'Fantasy' 
 },{
 'url': 'https://kindle.com/',
 'variables': [{'key': 'The Handmaids Tale', 'value': '10.0'},
               {'key': 'Divergent', 'value': '9.0'},],
 'vendor': 'Fantasy' 
 }]

My code:

from itertools import groupby

searches = []
for url, group in groupby(mongo_data, key=lambda chunk: chunk['url']):
    search = {"url": url, "results": []}
    for vendor, group2 in groupby(group, key=lambda chunk2: chunk2['vendor']):
        result = {
            "genre": vendor,
            # Flatten every "variables" entry of all records in this url/vendor group
            "data": [{'key': item['key'], 'value': item['value']}
                     for result2 in group2
                     for item in result2["variables"]],
        }
        search["results"].append(result)
    searches.append(search)

My result:

[
  {
    "url": "https://goodreads.com/",
    "results": [
      {
        "genre": "Fantasy",
        "data": [
          {
            "key": "Harry Potter",
            "value": "10.0"
          },
          {
            "key": "Discovery of Witches",
            "value": "8.5"
          }
        ]
      },
      {
        "genre": "Dystopia",
        "data": [
          {
            "key": "Hunger Games",
            "value": "10.0"
          },
          {
            "key": "Maze Runner",
            "value": "5.5"
          }
        ]
      }
    ]
  },
  {
    "url": "https://kindle.com/",
    "results": [
      {
        "genre": "Fantasy",
        "data": [
          {
            "key": "Divergent",
            "value": "9.0"
          },
          {
            "key": "Lord of the Rings",
            "value": "9.0"
          },
          {
            "key": "The Handmaids Tale",
            "value": "10.0"
          },
          {
            "key": "Divergent",
            "value": "9.0"
          }
        ]
      }
    ]
  }
]
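Side note on the grouping step: itertools.groupby only merges consecutive items with equal keys. The code above works because mongo_data already lists each url (and each vendor within it) contiguously. If the query results were not ordered that way, the same url would show up in several separate groups. A minimal sketch with a few made-up records (rows is hypothetical, not from the question) shows the effect and the usual fix of sorting on the grouping key first:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records in which the same url appears in two separate runs.
rows = [
    {"url": "https://kindle.com/", "vendor": "Fantasy"},
    {"url": "https://goodreads.com/", "vendor": "Dystopia"},
    {"url": "https://kindle.com/", "vendor": "Fantasy"},
]

# groupby starts a new group whenever the key changes, so kindle.com appears twice.
unsorted_keys = [k for k, _ in groupby(rows, key=itemgetter("url"))]
print(unsorted_keys)
# ['https://kindle.com/', 'https://goodreads.com/', 'https://kindle.com/']

# Sorting on (url, vendor) first makes equal keys adjacent, so each url forms one group.
rows_sorted = sorted(rows, key=itemgetter("url", "vendor"))
sorted_keys = [k for k, _ in groupby(rows_sorted, key=itemgetter("url"))]
print(sorted_keys)
# ['https://goodreads.com/', 'https://kindle.com/']
```

Sorting reorders the groups alphabetically by url, so if the original url order matters, the data needs to arrive already grouped instead.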

I don't want any duplicates in my structure, but I don't know how to remove them. My expected result is shown below.

Expected result:

[
  {
    "url": "https://goodreads.com/",
    "results": [
      {
        "genre": "Fantasy",
        "data": [
          {
            "key": "Harry Potter",
            "value": "10.0"
          },
          {
            "key": "Discovery of Witches",
            "value": "8.5"
          }
        ]
      },
      {
        "genre": "Dystopia",
        "data": [
          {
            "key": "Hunger Games",
            "value": "10.0"
          },
          {
            "key": "Maze Runner",
            "value": "5.5"
          }
        ]
      }
    ]
  },
  {
    "url": "https://kindle.com/",
    "results": [
      {
        "genre": "Fantasy",
        "data": [
          {
            "key": "Divergent",
            "value": "9.0"
          },
          {
            "key": "Lord of the Rings",
            "value": "9.0"
          },
          {
            "key": "The Handmaids Tale",
            "value": "10.0"
          }
        ]
      }
    ]
  }
]

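Divergent is repeated in the last list of dictionaries. When I merge my dictionaries, the entries from both https://kindle.com/ --> Fantasy records end up in one list, duplicates included. Is there a way for me to remove the duplicate dictionaries?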

I want the https://kindle.com/ part to look like this:

{
"url": "https://kindle.com/",
"results": [
  {
    "genre": "Fantasy",
    "data": [
      {
        "key": "Divergent",
        "value": "9.0"
      },
      {
        "key": "Lord of the Rings",
        "value": "9.0"
      },
      {
        "key": "The Handmaids Tale",
        "value": "10.0"
      }
    ]
  }
]
}
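For context on why the duplicates cannot simply be dropped with set(): dictionaries are mutable and therefore unhashable, so they cannot be set members. A small sketch using the combined kindle.com / Fantasy entries from above shows the error and a common workaround of turning each dict into a hashable tuple of its items; collections.Counter then shows which entry is repeated:

```python
from collections import Counter

# The combined kindle.com / Fantasy entries from the question.
data = [
    {"key": "Divergent", "value": "9.0"},
    {"key": "Lord of the Rings", "value": "9.0"},
    {"key": "The Handmaids Tale", "value": "10.0"},
    {"key": "Divergent", "value": "9.0"},
]

# A dict is unhashable, so building a set of dicts raises TypeError.
error_message = ""
try:
    set(data)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unhashable type: 'dict'

# A tuple of the dict's (key, value) pairs is hashable, so it can be counted.
counts = Counter(tuple(sorted(d.items())) for d in data)
duplicates = [dict(pairs) for pairs, n in counts.items() if n > 1]
print(duplicates)  # [{'key': 'Divergent', 'value': '9.0'}]
```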

【Discussion】:

    Tags: python python-3.x dictionary arraylist duplicates


    【Solution 1】:

    You can try converting these dicts into a set of tuples first and then converting them back into a list of dicts:

    mongo_data = [{
     'url': 'https://goodreads.com/',
     'variables': [{'key': 'Harry Potter', 'value': '10.0'},
                   {'key': 'Discovery of Witches', 'value': '8.5'},],
     'vendor': 'Fantasy' 
     },{
     'url': 'https://goodreads.com/',
     'variables': [{'key': 'Hunger Games', 'value': '10.0'},
                   {'key': 'Maze Runner', 'value': '5.5'},],
     'vendor': 'Dystopia' 
     },{
     'url': 'https://kindle.com/',
     'variables': [{'key': 'Divergent', 'value': '9.0'},
                   {'key': 'Lord of the Rings', 'value': '9.0'},],
     'vendor': 'Fantasy' 
     },{
     'url': 'https://kindle.com/',
     'variables': [{'key': 'The Handmaids Tale', 'value': '10.0'},
                   {'key': 'Divergent', 'value': '9.0'},],
     'vendor': 'Fantasy' 
     }]
    from itertools import groupby
    from pprint import pprint
    searches = []
    for url, group in groupby(mongo_data, key=lambda chunk: chunk['url']):
        search = {"url": url, "results": []}
        for vendor, group2 in groupby(group, key=lambda chunk2: chunk2['vendor']):
            result = {
                "genre": vendor,
                # (key, value) tuples are hashable, so the set drops duplicate pairs
                "data": set((item['key'], item['value'])
                            for result2 in group2
                            for item in result2["variables"]),
            }
            # Convert the unique tuples back into a list of dicts
            result['data'] = [{"key": tup[0], "value": tup[1]} for tup in result['data']]
            search["results"].append(result)
        searches.append(search)
    pprint(searches)
    

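    Output (the order of the items inside each data list may differ between runs, because a set has no defined order):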

    [{'results': [{'data': [{'key': 'Harry Potter', 'value': '10.0'},
                            {'key': 'Discovery of Witches', 'value': '8.5'}],
                   'genre': 'Fantasy'},
                  {'data': [{'key': 'Maze Runner', 'value': '5.5'},
                            {'key': 'Hunger Games', 'value': '10.0'}],
                   'genre': 'Dystopia'}],
      'url': 'https://goodreads.com/'},
     {'results': [{'data': [{'key': 'The Handmaids Tale', 'value': '10.0'},
                            {'key': 'Lord of the Rings', 'value': '9.0'},
                            {'key': 'Divergent', 'value': '9.0'}],
                   'genre': 'Fantasy'}],
      'url': 'https://kindle.com/'}]
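Because a set has no defined order, the entries above come out in a different order from the expected result, and that order can change from one run to the next. If the original order matters, one alternative (not part of the answer above) is to deduplicate with dict.fromkeys, which keeps the first occurrence of each key and, since Python 3.7, preserves insertion order. A sketch that keeps the same nested groupby structure and the question's data:

```python
from itertools import groupby
from operator import itemgetter
from pprint import pprint

mongo_data = [
    {"url": "https://goodreads.com/", "vendor": "Fantasy",
     "variables": [{"key": "Harry Potter", "value": "10.0"},
                   {"key": "Discovery of Witches", "value": "8.5"}]},
    {"url": "https://goodreads.com/", "vendor": "Dystopia",
     "variables": [{"key": "Hunger Games", "value": "10.0"},
                   {"key": "Maze Runner", "value": "5.5"}]},
    {"url": "https://kindle.com/", "vendor": "Fantasy",
     "variables": [{"key": "Divergent", "value": "9.0"},
                   {"key": "Lord of the Rings", "value": "9.0"}]},
    {"url": "https://kindle.com/", "vendor": "Fantasy",
     "variables": [{"key": "The Handmaids Tale", "value": "10.0"},
                   {"key": "Divergent", "value": "9.0"}]},
]

searches = []
for url, url_group in groupby(mongo_data, key=itemgetter("url")):
    search = {"url": url, "results": []}
    for vendor, vendor_group in groupby(url_group, key=itemgetter("vendor")):
        # dict.fromkeys keeps only the first occurrence of each (key, value)
        # pair and preserves insertion order (guaranteed since Python 3.7).
        unique_pairs = dict.fromkeys(
            (item["key"], item["value"])
            for chunk in vendor_group
            for item in chunk["variables"]
        )
        search["results"].append({
            "genre": vendor,
            "data": [{"key": k, "value": v} for k, v in unique_pairs],
        })
    searches.append(search)

pprint(searches)
```

The kindle.com Fantasy list then comes out as Divergent, Lord of the Rings, The Handmaids Tale, matching the expected result. This still compares whole (key, value) pairs, so two entries with the same key but different values would both be kept.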
    
