【Question title】: Python. Manipulation with a list of dictionaries
【Posted】: 2012-12-28 06:53:07
【Question】:

Friends, I have a list of dictionaries:

my_list = [
{'oranges':'big','apples':'green'},
{'oranges':'big','apples':'green','bananas':'fresh'},
{'oranges':'big','apples':'red'},
{'oranges':'big','apples':'green','bananas':'rotten'}
]

I want to create a new list with the partial duplicates removed.

In my example, this dictionary must be removed:

{'oranges':'big','apples':'green'}

because it duplicates the longer dictionaries:

{'oranges':'big','apples':'green','bananas':'fresh'}
{'oranges':'big','apples':'green','bananas':'rotten'}

So the desired result is:

[
{'oranges':'big','apples':'green','bananas':'fresh'},
{'oranges':'big','apples':'red'},
{'oranges':'big','apples':'green','bananas':'rotten'}
]

How can I do this? Thanks a million!

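【Question discussion】:

  • You mean that if a shorter dictionary is a subset of a longer one, it should be filtered out, right?
  • The first step is to decide what counts as a partial duplicate. Is it just a key-value pair appearing more than once?
  • @Shawn. Yes sir. Exactly!
  • @BurhanKhalid Within a dictionary, key-value pairs are unique. Between two dictionaries, yes, there are duplicates.
  • Can we assume the keys are static? Meaning, do you know all the possible keys?

Tags: python list dictionary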


【Solution 1】:

The first thing that came to mind [well, the second, after some edits..] is something like this:

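def get_superdicts(dictlist):
    superdicts = []
    # Visit the longest dicts first so that any superset is seen before its subsets
    for d in sorted(dictlist, key=len, reverse=True):
        fd = set(d.items())
        # Keep fd only if no already-kept item set contains it
        if not any(fd <= k for k in superdicts):
            superdicts.append(fd)
    # Build a real list (a bare map() is lazy in Python 3)
    new_dlist = [dict(s) for s in superdicts]
    return new_dlist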

which gives (the order of the results may differ):

>>> a = [{'apples': 'green', 'oranges': 'big'}, {'apples': 'green', 'oranges': 'big', 'bananas': 'fresh'}, {'apples': 'red', 'oranges': 'big'}, {'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}]
>>> 
>>> get_superdicts(a)
[{'apples': 'red', 'oranges': 'big'}, 
 {'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}, 
 {'bananas': 'fresh', 'oranges': 'big', 'apples': 'green'}]

[Originally I used frozenset here, thinking I could do some clever set operations, but evidently I didn't come up with anything.]

【Discussion】:

  • You can replace fd.issubset(k) with fd <= k
  • @Blender: good point, edited. I still feel there should be some set-based trick.
【Solution 2】:

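Try the following implementation.

Note that in my implementation I presort the list and take only combinations of 2, to reduce the number of iterations. This ensures that key is always smaller than or equal to hay in size.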

>>> my_list =[
{'oranges':'big','apples':'green'},
{'oranges':'big','apples':'green','bananas':'fresh'},
{'oranges':'big','apples':'red'},
{'oranges':'big','apples':'green','bananas':'rotten'}
]

# Create a function remove_dup, name it anything you want
def remove_dup(lst):
    # Import combinations from itertools, mainly to avoid multiple nested loops
    from itertools import combinations
    # Create a generator function dup_gen, name it anything you want
    def dup_gen(lst):
        # Read the pairs; remember key is never longer than hay
        for key, hay in combinations(lst, 2):
            # If key is contained in hay then set(key) - set(hay) is the empty set
            if not set(key) - set(hay):
                # key is contained in hay, so yield it as a duplicate
                yield key
    # Convert each dict to a tuple of item pairs and sort the tuples by length.
    # Going through set() first collapses identical elements, so that
    # remove_dup([{1:2}, {1:2}]) no longer returns [] (thanks to DSM for
    # pointing out this boundary case)
    lst = sorted(set(tuple(e.items()) for e in lst), key = len)
    # Recreate the dictionaries from the set difference of the original
    # elements and the elements generated by dup_gen, which are the
    # duplicates that need to be removed
    return [dict(e) for e in set(lst) - set(dup_gen(lst))]

remove_dup(my_list)
[{'apples': 'green', 'oranges': 'big', 'bananas': 'fresh'}, {'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}, {'apples': 'red', 'oranges': 'big'}]

remove_dup([{1:2}, {1:2}])
[{1: 2}]

remove_dup([{1:2}])
[{1: 2}]

remove_dup([])
[]

remove_dup([{1:2}, {1:3}])
[{1: 2}, {1: 3}]

A faster implementation

from itertools import combinations

def remove_dup(lst):
    # Convert each dict to a tuple of item pairs and sort the tuples by length.
    # Going through set() first collapses identical elements (thanks to DSM
    # for pointing out the boundary case remove_dup([{1:2}, {1:2}]) == [])
    lst = sorted(set(tuple(e.items()) for e in lst), key = len)
    # Generate all the duplicates
    dups = (key for key, hay in combinations(lst, 2) if not set(key).difference(hay))
    # Recreate the dictionaries from the set difference of
    # the original elements and the duplicate elements
    return [dict(e) for e in set(lst).difference(dups)]

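【Discussion】:

  • @MostafaR: {'a': 'b', 'a': 'b'} is really {'a': 'b'}, and according to set theory a set is a subset of itself
  • @MostafaR: {'a': 'b', 'a': 'b'} == {'a': 'b'}.
  • @TorroBuden: If the answer helped, please consider upvoting and accepting it
  • I think this code sometimes removes too many dictionaries: e.g. remove_dup([{1:2}]) == [{1: 2}] but remove_dup([{1:2}, {1:2}]) == []. [It's a corner case, though, and easy to fix. I only mention it because I was comparing everyone's answers looking for bugs and the assertion failed.]
  • @Abhijit How is the last one faster? You still have a similar O(N^3) implementation.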
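The corner case raised in the comments above (identical dictionaries cancelling each other out) can be sidestepped by collapsing exact duplicates before any subset comparison. The following is only a minimal sketch of that idea, assuming all values are hashable; `dedupe_dicts` is a hypothetical helper name, not something from the answers.

```python
def dedupe_dicts(dicts):
    """Drop exact duplicates, keeping the first occurrence of each dict."""
    seen = set()
    unique = []
    for d in dicts:
        # frozenset ignores key order; it requires hashable values
        key = frozenset(d.items())
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique

print(dedupe_dicts([{1: 2}, {1: 2}]))
# [{1: 2}]
```

Running a subset filter on the deduplicated list means a dictionary is only ever compared against genuinely different dictionaries.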
【Solution 3】:

Here is one implementation you can use:

>>> my_list = [
{'oranges':'big','apples':'green'},
{'oranges':'big','apples':'green','bananas':'fresh'},
{'oranges':'big','apples':'red'},
{'oranges':'big','apples':'green','bananas':'rotten'}
]

>>> def is_subset(d1, d2):
        return all(item in d2.items() for item in d1.items())
        # or
        # return set(d1.items()).issubset(set(d2.items()))

>>> [d for d in my_list if not any(is_subset(d, d1) for d1 in my_list if d1 != d)]
[{'apples': 'green', 'oranges': 'big', 'bananas': 'fresh'}, 
 {'apples': 'red', 'oranges': 'big'}, 
 {'apples': 'green', 'oranges': 'big', 'bananas': 'rotten'}]

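For each dictionary d in my_list:

any(is_subset(d, d1) for d1 in my_list if d1 != d)

checks whether it is a subset of any other dict in my_list. If it returns True, there is at least one other dictionary of which d is a subset. We therefore negate the result with not, so that d is excluded from the list.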
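In Python 3 the same subset check can be written directly on the dictionary views, since `dict.items()` returns a set-like object that supports `<=`. This is a sketch under that assumption; `drop_partial_duplicates` is a hypothetical name used only for illustration.

```python
def is_subset(d1, d2):
    """True if every (key, value) pair of d1 also appears in d2."""
    return d1.items() <= d2.items()

def drop_partial_duplicates(dicts):
    # Keep d only if no *different* dict in the list contains all of its items
    return [d for d in dicts
            if not any(is_subset(d, other) for other in dicts if other != d)]

my_list = [
    {'oranges': 'big', 'apples': 'green'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'fresh'},
    {'oranges': 'big', 'apples': 'red'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'rotten'},
]

result = drop_partial_duplicates(my_list)
```

Here `result` holds the last three dictionaries of `my_list`, in their original order.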

【Discussion】:

【Solution 4】:

Short answer

def is_subset(d1, d2):
    # Check if d1 is a subset of d2
    return all(item in d2.items() for item in d1.items())

# list() is needed in Python 3, where filter() returns a lazy iterator
list(filter(lambda x: len(list(filter(lambda y: is_subset(x, y), my_list))) == 1, my_list))
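The one-liner above relies on every dictionary being a subset of itself: a count of exactly 1 means no other dictionary contains it. The sketch below spells that counting idea out with `sum`; `keep_maximal` is a hypothetical name, and note that with this approach two identical dictionaries would each count twice and both be dropped.

```python
def is_subset(d1, d2):
    """True if every (key, value) pair of d1 also appears in d2."""
    return all(item in d2.items() for item in d1.items())

def keep_maximal(dicts):
    # sum() counts how many dicts in the list contain x (x itself included)
    return [x for x in dicts
            if sum(is_subset(x, y) for y in dicts) == 1]

my_list = [
    {'oranges': 'big', 'apples': 'green'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'fresh'},
    {'oranges': 'big', 'apples': 'red'},
    {'oranges': 'big', 'apples': 'green', 'bananas': 'rotten'},
]

result = keep_maximal(my_list)
```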
    

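【Discussion】:

  • That's really clever, how on earth did you come up with it?
  • Your answer isn't much different from Rohit's, it's just obscured by multiple filters
【Solution 5】: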

I think this one has better time complexity:

def is_subset(a, b):
    # True if every item of a also appears in b
    return not set(a) - set(b)

def remove_extra(my_list):
    # Work on lists of items so the dictionaries can be sorted
    my_list = [list(d.items()) for d in my_list]
    my_list.sort()

    result = []
    # Each entry is compared only with its neighbour in sorted order
    for i in range(len(my_list) - 1):
        if not is_subset(my_list[i], my_list[i + 1]):
            result.append(dict(my_list[i]))
    result.append(dict(my_list[-1]))

    return result

print(remove_extra([
        {'oranges':'big','apples':'green'},
        {'oranges':'big','apples':'green','bananas':'fresh'},
        {'oranges':'big','apples':'red'},
        {'oranges':'big','apples':'green','bananas':'rotten'}
    ]))
    

【Discussion】:
