【Question title】: Python unique values in a list
【Posted】: 2013-11-19 02:27:25
【Question】:

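I'm new to Python and I find set() a bit confusing. Can someone offer some help with finding and creating a new list of unique numbers (in other words, eliminating duplicates)?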

import re

def go():
    # Output file is opened for writing but is not used in this snippet.
    file = open("C:/Cryptography/Pollard/Pollard/newfile.txt", "w")
    filename = "C:/Cryptography/Pollard/Pollard/primeFactors.txt"
    with open(filename, 'r') as f:
        lines = f.read()

    # Each match is one factorization string whose factors are separated by 'x'.
    found = re.findall(r'[\d]+[^\d.\d+()+\s]+[^\s]+[\d+\w+\d]+[\d+\^+\d]+[\d+\w+\d]+', lines)
    a = found
    for i in range(5):
        # re.findall already returns strings here, so no str() conversion is needed.
        print(a[i].split('x'))

Now

print(a[i].split('x'))

...gives the following output:

['2', '3', '1451', '40591', '258983', '11409589', '8337580729',
'1932261797039146667']

['2897', '514081', '585530047', '108785617538783538760452408483163']

['2', '3', '5', '19', '28087', '4947999059',
'2182718359336613102811898933144207']

['3', '5', '53', '293', '31159', '201911', '7511070764480753',
'22798192180727861167']

['2', '164493637239099960712719840940483950285726027116731']

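How can I output a list containing only the non-repeating numbers? I read on a forum that "set()" could do this, but I tried it with no success. Any help is much appreciated!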

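【Comments】:

  • I'm not sure I understand. None of the lists you show has internal duplicate values. Are you worried about duplicates in some of the other values that (by coincidence) don't occur in the first five you've shown? Or do you need to eliminate duplicates between the lists, so that 2 appears only in the first list and not in the third or fifth?
  • Sorry, it's late at night here. What I meant was "no duplicate values if I join all the lists together".

Tags: python list python-3.x unique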


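【Solution 1】:

A set is a collection (like a list or tuple), but it does not allow duplicates and has very fast membership testing. You can use a list comprehension to filter out values in one list that have already appeared in a previous list: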

data = [['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
        ['2897', '514081', '585530047', '108785617538783538760452408483163'],
        ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
        ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
        ['2', '164493637239099960712719840940483950285726027116731']]

seen = set() # set of seen values, which starts out empty

for lst in data:
    deduped = [x for x in lst if x not in seen] # filter out previously seen values
    seen.update(deduped)                        # add the new values to the set

    print(deduped)                              # do whatever with deduped list

Output:

['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['5', '19', '28087', '4947999059', '2182718359336613102811898933144207']
['53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']
['164493637239099960712719840940483950285726027116731']

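Note that this version does not filter out values that are duplicated within a single list (unless they also duplicate a value from a previous list). You can work around this by replacing the list comprehension with an explicit loop that checks each individual value against the seen set (and adds it if it is new) before appending it to the output list. Or, if the order of the items in your sublists is not important, you can turn them into sets of their own: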

seen = set()
for lst in data:
    lst_as_set = set(lst)               # this step eliminates internal duplicates
    deduped_set = lst_as_set - seen     # set subtraction!
    seen.update(deduped_set)

    # now do stuff with deduped_set, which is iterable, but in an arbitrary order
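
The explicit-loop variant described above, which also drops duplicates inside a single sublist while keeping the original order, might look like the following. This is only a sketch: the function name dedupe_across_lists and the sample data are made up for illustration.

```python
def dedupe_across_lists(data):
    """Return new sublists with every previously seen value removed."""
    seen = set()                      # values encountered so far, across all sublists
    result = []
    for lst in data:
        deduped = []
        for x in lst:
            if x not in seen:         # new value: remember it and keep it
                seen.add(x)
                deduped.append(x)
        result.append(deduped)
    return result

# Hypothetical sample with duplicates both inside and across sublists
sample = [['2', '3', '3', '5'], ['2', '5', '7', '7']]
print(dedupe_across_lists(sample))  # [['2', '3', '5'], ['7']]
```

Because each value is added to seen as soon as it is kept, a repeat later in the same sublist is rejected just like a repeat from an earlier sublist.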

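Finally, if the internal sublists are a red herring entirely and you simply want to filter a flattened list down to its unique values, that sounds like a job for the unique_everseen recipe from the itertools documentation: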

from itertools import filterfalse  # Python 3 name; Python 2 called it ifilterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
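
To apply an order-preserving dedup to nested lists like the ones in the question, the sublists have to be flattened first. A minimal sketch, assuming Python 3.7+ (where dict keeps insertion order): it swaps in dict.fromkeys for the recipe so the block stays self-contained, and the shortened data is illustrative only.

```python
from itertools import chain

# Shortened, illustrative version of the question's nested lists
data = [['2', '3', '1451'], ['2897', '2', '5'], ['3', '5', '53']]

flat = chain.from_iterable(data)              # lazily yields every item of every sublist
unique_in_order = list(dict.fromkeys(flat))   # dict keys are unique and keep first-seen order
print(unique_in_order)  # ['2', '3', '1451', '2897', '5', '53']
```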

【Comments】:

    【Solution 2】:

    set should work in this case.

    You can try the following:

    # Concat all your lists into a single list
    >>> a = ['2', '3', '1451', '40591', '258983', '11409589', '8337580729','1932261797039146667'] +['2897', '514081', '585530047', '108785617538783538760452408483163'] +['2', '3', '5', '19', '28087', '4947999059','2182718359336613102811898933144207'] + ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']+ ['2', '164493637239099960712719840940483950285726027116731']
    >>> len(a)
    29
    >>> set(a)
    set(['514081', '258983', '40591', '201911', '11409589', '585530047', '3', '2', '5',
    '108785617538783538760452408483163', '22798192180727861167',
    '164493637239099960712719840940483950285726027116731', '8337580729', '4947999059',
    '19', '2897', '7511070764480753', '53', '28087', '2182718359336613102811898933144207',
    '1451', '31159', '1932261797039146667', '293'])
    
    >>> len(set(a))
    24
    >>> 
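
The session above shows Python 2 output. Since the question is tagged python-3.x, here is a rough equivalent as a script, assuming the sublists are already collected in a variable (named lists here purely for illustration, with shortened data). A set has no defined order, so the result is sorted numerically before printing.

```python
# Shortened, illustrative sublists
lists = [['2', '3', '1451'], ['2897', '514081'], ['2', '3', '5', '19']]

a = [x for lst in lists for x in lst]   # same items as lists[0] + lists[1] + lists[2]
unique = set(a)                         # duplicates collapse, order is arbitrary

print(len(a), len(unique))              # 9 7
print(sorted(unique, key=int))          # ['2', '3', '5', '19', '1451', '2897', '514081']
```

Sorting with key=int compares the strings by numeric value, which also works for the very large factors in the question because Python integers have arbitrary precision.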
    

    【Comments】:

    • For future reference: concat_list = list1 + list2 + list3 + ... + listn

    【Solution 3】:

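    If you want the unique values of the flattened list, you can use reduce() to flatten the list, then use the frozenset() constructor to get the resulting unique values: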

    >>> data = [
       ['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
       ['2897', '514081', '585530047', '108785617538783538760452408483163'],
       ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
       ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
       ['2', '164493637239099960712719840940483950285726027116731']]
    
    >>> from functools import reduce  # reduce is not a builtin in Python 3
    >>> print(list(frozenset(reduce((lambda a, b: a+b), data))))
    ['514081', '258983', '40591', '201911', '11409589', '585530047', '3',
    '2', '5', '108785617538783538760452408483163', '22798192180727861167',
    '164493637239099960712719840940483950285726027116731', '8337580729', 
    '4947999059', '19', '2897', '7511070764480753', '53', '28087', 
    '2182718359336613102811898933144207', '1451', '31159',
    '1932261797039146667', '293']
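
As a possible alternative to flattening with reduce(), set.union accepts any number of iterables, so the sublists can be merged without building an intermediate list. A small sketch with shortened, illustrative data:

```python
# Shortened, illustrative version of the nested lists
data = [['2', '3', '1451'], ['2897', '2', '5'], ['3', '5', '53']]

unique = set().union(*data)      # union of an empty set with every sublist
print(sorted(unique, key=int))   # ['2', '3', '5', '53', '1451', '2897']
```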
    

    【Comments】:
