【Question title】: Find intersection of two nested lists?
【Posted】: 2010-10-13 04:01:39
【Question】:

I know how to get the intersection of two flat lists:

b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
b3 = [val for val in b1 if val in b2]

or

def intersect(a, b):
    return list(set(a) & set(b))
 
print(intersect(b1, b2))

But my problems start when I have to find the intersection for nested lists:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

In the end I would like to receive:

c3 = [[13,32],[7,13,28],[1,6]]

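Can you guys give me a hand with this?

A straightforward way to find the difference and intersection between iterables

Use this method if repetition matters: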

from collections import Counter

def intersection(a, b):
    """
    Find the intersection of two iterables

    >>> intersection((1,2,3), (2,3,4))
    (2, 3)

    >>> intersection((1,2,3,3), (2,3,3,4))
    (2, 3, 3)

    >>> intersection((1,2,3,3), (2,3,4,4))
    (2, 3)

    """
    return tuple(n for n, count in (Counter(a) & Counter(b)).items() for _ in range(count))

def difference(a, b):
    """
    Find the symmetric difference of two iterables

    >>> difference((1,2,3), (2,3,4))
    (1, 4)

    >>> difference((1,2,3,3), (2,3,4))
    (1, 3, 4)

    >>> difference((1,2,3,3), (2,3,4,4))
    (1, 3, 4, 4)
    """
    diff = lambda x, y: tuple(n for n, count in (Counter(x) - Counter(y)).items() for _ in range(count))
    return diff(a, b) + diff(b, a)

【Discussion】:

【Solution 3】:

The & operator takes the intersection of two sets.

{1, 2, 3} & {2, 3, 4}
Out[1]: {2, 3}

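【Discussion】:

Nice, but this topic is about lists!
The result of intersecting two lists is a set, so this answer is perfectly valid.
Lists can contain duplicate values, but sets cannot.

【Solution 4】:

The list of intersections can be built easily with `reduce`.

All you need is the initializer, the third argument of the reduce function.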

reduce(
   lambda result, _list: result.append(
       list(set(_list)&set(c1)) 
     ) or result, 
   c2, 
   [])

The code above works in both Python 2 and Python 3, but in Python 3 reduce is no longer a builtin, so you need to import it with from functools import reduce.
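As an illustration for Python 3 readers, the same idea can be sketched as a self-contained snippet. The names `collect` and `lookup` are hypothetical and not part of the original answer; the lambda is swapped for a named function, and each per-sublist result is sorted only so that the output is deterministic.

```python
from functools import reduce  # reduce is not a builtin in Python 3

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

lookup = set(c1)  # build the set once instead of once per sublist


def collect(result, sublist):
    # Append this sublist's intersection, then hand the accumulator back to reduce.
    result.append(sorted(set(sublist) & lookup))
    return result


# The third argument ([]) is the initializer: the starting value of the accumulator.
c3 = reduce(collect, c2, [])
print(c3)  # [[13, 32], [7, 13, 28], [1, 6]]
```

Returning the accumulator explicitly avoids the `result.append(...) or result` trick, which relies on `append` returning `None`.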

【Discussion】:

【Solution 5】:

The Pythonic way to get the intersection of two lists is:

[x for x in list1 if x in list2]

【Discussion】:

This question is about nested lists. Your answer does not answer the question.

【Solution 6】:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [list(set(i) & set(c1)) for i in c2]
c3
[[32, 13], [28, 13, 7], [1, 6]]

For me this is a very elegant and quick way to do it :)

【Discussion】:

【Solution 7】:

Given:

> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]

> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

I find the following code works well, and it is arguably more concise with set operations:

> c3 = [list(set(f)&set(c1)) for f in c2]

This gives:

> [[32, 13], [28, 13, 7], [1, 6]]

If order is needed:

> c3 = [sorted(list(set(f)&set(c1))) for f in c2]

we get:

> [[13, 32], [7, 13, 28], [1, 6]]

By the way, for a more Pythonic style, this one is fine too:

> c3 = [ [i for i in set(f) if i in c1] for f in c2]
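One point worth spelling out about "order": `sorted()` orders the result by value, which is not the same as keeping the order in which items appear in each sublist. The sketch below uses a hypothetical, deliberately unsorted `c2` (not the data from the question) to show the difference; the names `by_value` and `by_position` are illustrative.

```python
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[32, 13, 17], [28, 7, 13]]  # hypothetical input, intentionally unsorted

c1_set = set(c1)  # one conversion, reused for every sublist

# Ordered by value: whatever order the sublist had is discarded.
by_value = [sorted(set(f) & c1_set) for f in c2]

# Ordered by position: items come out in the order they appear in each sublist.
by_position = [[i for i in f if i in c1_set] for f in c2]

print(by_value)     # [[13, 32], [7, 13, 28]]
print(by_position)  # [[32, 13], [28, 7, 13]]
```

With the sorted sublists from the question both variants give the same answer, so the distinction only matters for unsorted input.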

【Discussion】:

【Solution 8】:

# Problem:  Given c1 and c2:
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
# how do you get c3 to be [[13, 32], [7, 13, 28], [1, 6]] ?

Here's one way to set c3 that doesn't involve sets:

c3 = []
for sublist in c2:
    c3.append([val for val in c1 if val in sublist])

But if you prefer to use just one line, you can do this:

c3 = [[val for val in c1 if val in sublist]  for sublist in c2]

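It's a list comprehension inside a list comprehension, which is a little unusual, but I think you shouldn't have too much trouble following it.

【Discussion】:

【Solution 9】:

To define an intersection that correctly takes into account the cardinality of the elements, use Counter: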

from collections import Counter

>>> c1 = [1, 2, 2, 3, 4, 4, 4]
>>> c2 = [1, 2, 4, 4, 4, 4, 5]
>>> list((Counter(c1) & Counter(c2)).elements())
[1, 2, 4, 4, 4]
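To connect this back to the nested case in the question, the same Counter intersection can be applied per sublist. This is a minimal sketch: the nested `c2` below is hypothetical data chosen to contain duplicates, and the results are sorted only to keep the output deterministic.

```python
from collections import Counter

c1 = [1, 2, 2, 3, 4, 4, 4]
c2 = [[1, 2, 2, 2], [4, 4, 5], [6]]  # hypothetical nested input with repeats

c1_counts = Counter(c1)  # count c1 once, reuse it for every sublist

# Counter & Counter keeps each element min(count_left, count_right) times.
c3 = [sorted((Counter(sub) & c1_counts).elements()) for sub in c2]
print(c3)  # [[1, 2, 2], [4, 4], []]
```

The value 2 appears three times in the first sublist but only twice in `c1`, so it is kept twice.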

【Discussion】:

【Solution 10】:

I was also looking for a way to do this, and eventually ended up with the following:

def compareLists(a,b):
    removed = [x for x in a if x not in b]
    added = [x for x in b if x not in a]
    overlap = [x for x in a if x in b]
    return [removed,added,overlap]

【Discussion】:

If not using set.intersection, then these simple one-liners are what I would do as well.

【Solution 11】:

The functional approach:

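from functools import reduce  # required in Python 3; also available in Python 2.6+

input_list = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]

result = reduce(set.intersection, map(set, input_list))

It can be applied to the more general case of one or more lists.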

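【Discussion】:

To allow an empty input list: set(*input_list[:1]).intersection(*input_list[1:]). Iterator version (it = iter(input_list)): reduce(set.intersection, it, set(next(it, []))). Neither version requires converting all input lists to sets. The latter is more memory efficient.
Use from functools import reduce to use it in Python 3. Or better yet, use an explicit for loop.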
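The explicit for loop suggested in the last comment might look like the sketch below. The function name `intersect_all` is hypothetical; it follows the iterator version from the comment above, so an empty input yields an empty set and only the first list is converted to a set.

```python
def intersect_all(lists):
    """Return the set of values common to every list in `lists`."""
    it = iter(lists)
    common = set(next(it, []))  # empty input gives an empty set
    for lst in it:
        common.intersection_update(lst)  # accepts any iterable, no set() needed
        if not common:  # nothing left in common, so stop early
            break
    return common


input_list = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]
print(sorted(intersect_all(input_list)))  # [3, 4, 5]
print(intersect_all([]))                  # set()
```

The early `break` is an addition that the reduce version cannot express directly.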

【Solution 12】:

If you want:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [[13, 32], [7, 13, 28], [1,6]]

Then here is your solution for Python 2:

c3 = [filter(lambda x: x in c1, sublist) for sublist in c2]

In Python 3, filter returns an iterator instead of a list, so you need to wrap the filter calls with list():

c3 = [list(filter(lambda x: x in c1, sublist)) for sublist in c2]

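Explanation:

The filter part takes each sublist's items and checks whether each one is in the source list c1. The list comprehension is executed for each sublist in c2.

【Discussion】:

You can use filter(set(c1).__contains__, sublist) for efficiency. By the way, an advantage of this solution in Python 2 is that filter() preserves string and tuple types.
I like this method, but I'm getting blank '' entries in my resulting list.
I added Python 3 compatibility here, since I am using this as a dupe target for a Python 3 question.
This reads better IMO with nested comprehensions: c3 = [[x for x in sublist if x in c1] for sublist in c2]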
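The `set(c1).__contains__` suggestion from the first comment can be sketched for Python 3 as follows. The variable names `c1_contains` and `vowels` are illustrative. Because filter() in Python 3 always returns an iterator, a string input has to be rebuilt with `"".join(...)` rather than being preserved automatically.

```python
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

# Bound method of a set: a membership test that is O(1) on average.
c1_contains = set(c1).__contains__

# filter() keeps the order of each sublist; list() materialises the iterator.
c3 = [list(filter(c1_contains, sublist)) for sublist in c2]
print(c3)  # [[13, 32], [7, 13, 28], [1, 6]]

# Filtering a string in Python 3: join the surviving characters back together.
vowels = "".join(filter(set("aeiou").__contains__, "intersection"))
print(vowels)  # ieeio
```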

【Solution 13】:

We can use set methods for this:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

result = []
for li in c2:
    res = set(li) & set(c1)
    result.append(list(res))

print(result)

【Discussion】:

【Solution 14】:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]

c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

c3 = [list(set(c2[i]).intersection(set(c1))) for i in range(len(c2))]

c3
->[[32, 13], [28, 13, 7], [1, 6]]

【Discussion】:

【Solution 15】:

You don't need to define intersection. It's already a first-class part of set.

>>> b1 = [1,2,3,4,5,9,11,15]
>>> b2 = [4,5,6,7,8]
>>> set(b1).intersection(b2)
set([4, 5])

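【Discussion】:

Will this be slower than the lambda because of the conversion to set?
@S.Lott, is there anything wrong with set(b1) & set(b2)? IMO it's cleaner to use the operator.
Plus, using set will lead to code that's orders of magnitude faster. Here's a sample benchmark®: gist.github.com/andersonvom/4d7e551b4c0418de3160
This only works if you don't need the result to be ordered.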
So... this answer doesn't actually answer the question, right? As written, it does not handle nested lists.

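【Solution 16】:

I don't know if I am late in answering your question. After reading it, I came up with a function intersect() that works on both flat lists and nested lists. I used recursion to define this function, which makes it very intuitive. I hope it is what you are looking for: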

def intersect(a, b):
    result=[]
    for i in b:
        if isinstance(i,list):
            result.append(intersect(a,i))
        else:
            if i in a:
                result.append(i)
    return result

Example:

>>> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
>>> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
>>> print(intersect(c1, c2))
[[13, 32], [7, 13, 28], [1, 6]]

>>> b1 = [1,2,3,4,5,9,11,15]
>>> b2 = [4,5,6,7,8]
>>> print(intersect(b1, b2))
[4, 5]
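The recursive idea above tests `i in a` against a list, which scans `a` on every check. A possible variant, sketched under the assumption that the items of `a` are hashable, builds a set once and recurses over `b`. The names `intersect_nested`, `lookup` and `walk` are hypothetical and not from the original answer.

```python
def intersect_nested(a, b):
    """Keep the items of (possibly nested) list b that occur in flat list a."""
    lookup = set(a)  # assumption: every item of `a` is hashable

    def walk(items):
        kept = []
        for item in items:
            if isinstance(item, list):
                kept.append(walk(item))  # recurse and keep the nesting
            elif item in lookup:
                kept.append(item)
        return kept

    return walk(b)


c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
print(intersect_nested(c1, c2))                   # [[13, 32], [7, 13, 28], [1, 6]]
print(intersect_nested(c1, [1, [6, [7, 8]], 9]))  # [1, [6, [7]]]
```

The second call shows that nesting deeper than one level is preserved as well.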

【Discussion】:

【Solution 17】:

Given that intersect is defined, a basic list comprehension is enough:

>>> c3 = [intersect(c1, i) for i in c2]
>>> c3
[[32, 13], [28, 13, 7], [1, 6]]

An improvement, thanks to S. Lott's remark and TM.'s related remark:

>>> c3 = [list(set(c1).intersection(i)) for i in c2]
>>> c3
[[32, 13], [28, 13, 7], [1, 6]]

【Discussion】:

【Solution 18】:

For people just looking to find the intersection of two lists, the asker provided two methods:

b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
b3 = [val for val in b1 if val in b2]

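and

def intersect(a, b):
    return list(set(a) & set(b))

print(intersect(b1, b2))

But there is a hybrid method that is more efficient, because you only have to do one conversion between list and set, as opposed to three: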

b1 = [1,2,3,4,5]
b2 = [3,4,5,6]
s2 = set(b2)
b3 = [val for val in b1 if val in s2]

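This will run in O(n), whereas the asker's original method involving the list comprehension will run in O(n^2).

【Discussion】:

As "if val in s2" takes O(N) to run, the proposed code snippet's complexity is also O(n^2).
According to wiki.python.org/moin/TimeComplexity#set, the average case of "val in s2" is O(1), so over n operations the expected time is O(n) (whether the worst-case time is O(n) or O(n^2) depends on whether this average case represents an amortised time, but this isn't very important in practice).
The runtime is O(N) not because it is amortised but because set membership is O(1) on average (for instance, when using a hash table). That is a big difference, because amortised time is guaranteed.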
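The complexity argument can be checked empirically with a rough timing sketch. The input size `n` and the helper names `with_list` and `with_set` are arbitrary choices made for illustration; actual timings depend on the machine, so no expected numbers are shown.

```python
import timeit

n = 2000
b1 = list(range(n))
b2 = list(range(n // 2, n + n // 2))
s2 = set(b2)  # the single list-to-set conversion of the hybrid method


def with_list():
    # Each `in` scans b2 linearly, so the comprehension is O(n^2) overall.
    return [val for val in b1 if val in b2]


def with_set():
    # Each `in` is a hash lookup, O(1) on average, so this is O(n) on average.
    return [val for val in b1 if val in s2]


# Both approaches must agree on the result, including its order.
assert with_list() == with_set()

print("list lookup:", timeit.timeit(with_list, number=5))
print("set lookup: ", timeit.timeit(with_set, number=5))
```

Increasing `n` should make the gap between the two timings grow, which is what the O(n^2) versus O(n) analysis predicts.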

【Solution 19】:

Pure list comprehension version

>>> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
>>> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
>>> c1set = frozenset(c1)

Flattened variant:

>>> [n for lst in c2 for n in lst if n in c1set]
[13, 32, 7, 13, 28, 1, 6]

Nested variant:

>>> [[n for n in lst if n in c1set] for lst in c2]
[[13, 32], [7, 13, 28], [1, 6]]

【Discussion】:

【Solution 20】:

You should flatten using this code (taken from http://kogs-www.informatik.uni-hamburg.de/~meine/python_tricks). The code is untested, but I'm pretty sure it works:


def flatten(x):
    """flatten(sequence) -> list

    Returns a single, flat list which contains all elements retrieved
    from the sequence and all recursively contained sub-sequences
    (iterables).

    Examples:
    >>> [1, 2, [3,4], (5,6)]
    [1, 2, [3, 4], (5, 6)]
    >>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, MyVector(8,9,10)])
    [1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]"""

    result = []
    for el in x:
        # Alternative check: isinstance(el, (list, tuple))
        # Note: basestring exists only in Python 2; on Python 3 use str instead.
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

After you have flattened the lists, you perform the intersection in the usual way:


c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

def intersect(a, b):
     return list(set(a) & set(b))

print(intersect(flatten(c1), flatten(c2)))

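【Discussion】:

That is a nice bit of flattening code, Geo, but it doesn't answer the question. The asker specifically expects the result in the form [[13,32],[7,13,28],[1,6]].

【Solution 21】:

Do you consider [1,2] to intersect with [1, [2]]? That is, is it only the numbers you care about, or the list structure as well?

If only the numbers, investigate how to "flatten" the lists, then use the set() method.

【Discussion】:

I would like to keep the list structure unchanged.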
