How to compare and extract certain items from a list in Python? Answers

【Question title】: How to compare and extract certain items from a list in Python?
【Posted】: 2018-05-14 10:46:22
【Question】:

I have a list containing file information.

tables = ["20180512, name=file01, size=100",
          "20180512, name=file02, size=90",
          "20180513, name=file01, size=70",
          "20180513, name=file02, size=70",
          "20180513, name=file03, size=80",
          "20180514, name=file01, size=100",
          "20180514, name=file02, size=90"]

I want to build a dictionary that holds the largest item for each day. For the list above, the dictionary would be

dic_table = {20180512: "file01",
             20180513: "file03",
             20180514: "file01"}

I think I could do this with several loops and extra data structures, but I would like to know whether there is a Pythonic way to do it efficiently.

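【Comments】:

Tell us what you have already tried.
It would be great if you shared your initial code, so we know where to start :)
Where does this data come from? It feels like this should be solved upstream rather than by parsing strings yourself.
@timgeb It is actually a mock list. The real data comes from an hdf5 file, which I read using walk_nodes.
@Netwave I was going to, but the actual code depends on other logic and is more complicated than this, so I was afraid it would dilute the point of my question. Thanks for the suggestion, though!

Tags: python list dictionary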

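【Solution 1】:

The pandas library is well suited to this problem.

First, clean up the data by removing size= and name= as well as the unnecessary whitespace, so that it can be loaded into a DataFrame easily: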

import re
import pandas as pd
tables = [re.sub(r'(\w+=|\s+)', '', i).split(',') for i in tables]

# [['20180512', 'file01', '100'],
# ['20180512', 'file02', '90'],
# ['20180513', 'file01', '70'],
# ['20180513', 'file02', '70'],
# ['20180513', 'file03', '80'],
# ['20180514', 'file01', '100'],
# ['20180514', 'file02', '90']]
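
To see what the substitution does to a single entry, here is a minimal sketch on one line of the input. The names `line` and `fields` are illustrative only and are not part of the original answer.

```python
import re

# One entry in the same layout as the question's list.
line = "20180512, name=file01, size=100"

# r'(\w+=|\s+)' matches either a "key=" prefix or a run of whitespace;
# replacing both with '' leaves only the comma-separated values.
fields = re.sub(r'(\w+=|\s+)', '', line).split(',')

print(fields)  # ['20180512', 'file01', '100']
```

Note that every field is still a string at this point, which is why the size has to be cast to an integer before it can be compared numerically.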

Then convert it to a DataFrame:

df = pd.DataFrame(tables, columns=['Date', 'Name', 'Size'])

#        Date     Name  Size
# 0  20180512   file01   100
# 1  20180512   file02    90
# 2  20180513   file01    70
# 3  20180513   file02    70
# 4  20180513   file03    80
# 5  20180514   file01   100
# 6  20180514   file02    90

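Finally, we can use groupby and idxmax() to find the row with the largest size for each date, and zip to convert the result to a dictionary:

df['Size'] = df['Size'].astype(int)
# idxmax() returns index labels, so select the rows with .loc rather than .iloc
maxes = df.loc[df.groupby('Date').Size.idxmax()]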

#           Date    Name  Size
#    0  20180512  file01   100
#    4  20180513  file03    80
#    5  20180514  file01   100

print(dict(zip(maxes.Date.values, maxes.Name.values)))

#  {'20180512': 'file01', '20180513': 'file03', '20180514': 'file01'}
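
The dictionary printed above has string keys, whereas the question asked for integer dates. A small sketch of one way to convert them, assuming the result has exactly the shape shown above (the names `str_keyed` and `int_keyed` are hypothetical):

```python
# Result in the shape printed by the pandas approach: dates are strings.
str_keyed = {'20180512': 'file01', '20180513': 'file03', '20180514': 'file01'}

# Rebuild the dictionary with each date cast to int.
int_keyed = {int(date): name for date, name in str_keyed.items()}

print(int_keyed)  # {20180512: 'file01', 20180513: 'file03', 20180514: 'file01'}
```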

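【Discussion】:

Thank you for your answer! Thanks to your detailed explanation, I could easily adapt your example code to solve the problem!

【Solution 2】:

You can use itertools.groupby from the standard library.

The idea is to sort, group, and then build the dictionary from the first item of each group: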

from itertools import groupby
from operator import itemgetter

def tupler(x):
    a = x.split(',')
    b = a[1].split('=')[-1]
    c = a[2].split('=')[-1]
    return int(a[0]), b, int(c)

# sort by date and then by size descending
sorter = sorted(map(tupler, tables), key=lambda x: (x[0], -x[2]))

# group by date
grouper = groupby(sorter, key=itemgetter(0))

# take the first (largest) item of each group and drop the size from the result
res = dict(next(group)[:-1] for _, group in grouper)

print(res)

# {20180512: 'file01',
#  20180513: 'file03',
#  20180514: 'file01'}
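
As an alternative that is not part of either answer, the same result can be obtained in a single pass without sorting, by remembering the largest entry seen so far for each date. This is only a sketch: the helper names `parse` and `largest_per_day` are hypothetical, and it assumes every entry follows the "date, name=..., size=..." layout from the question.

```python
def parse(line):
    """Split 'YYYYMMDD, name=..., size=...' into (date, name, size)."""
    date, name, size = (part.strip() for part in line.split(","))
    return int(date), name.split("=", 1)[1], int(size.split("=", 1)[1])


def largest_per_day(lines):
    """Map each date to the name of its largest file (first one wins on ties)."""
    best = {}  # date -> (name, size) of the largest file seen so far
    for date, name, size in map(parse, lines):
        if date not in best or size > best[date][1]:
            best[date] = (name, size)
    return {date: name for date, (name, _size) in best.items()}


tables = ["20180512, name=file01, size=100",
          "20180512, name=file02, size=90",
          "20180513, name=file01, size=70",
          "20180513, name=file02, size=70",
          "20180513, name=file03, size=80",
          "20180514, name=file01, size=100",
          "20180514, name=file02, size=90"]

result = largest_per_day(tables)
print(result)  # {20180512: 'file01', 20180513: 'file03', 20180514: 'file01'}
```

This version visits each entry once, so it avoids the sort that itertools.groupby needs, at the cost of a few more lines of code.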

【Discussion】:

Thank you for your answer! Thanks to you, I learned a useful way to use itertools!