How do I NOT use iterrows to solve my problem?

【Question title】: How do I NOT use iterrows to solve my problem?
【Posted】: 2021-08-02 07:27:35
【Question】:

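I have been reading about best practices for avoiding iterrows when iterating over a pandas DataFrame, but I'm not sure how to apply them to my specific problem:

How do I:

Find the "time" of the first instance of the value "c" in a DataFrame df1, grouped by "num" and sorted by "time"
Then add that "time" to a separate DataFrame df2, matched on "num".

Here is a sample of my input DataFrame (named df in the code):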

import pandas as pd

df = pd.DataFrame({'num': [2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 
                           8, 8, 8, 8, 9, 9, 9, 9, 9], 
                   'state': ['a', 'b', 'c', 'b', 'a', 'b', 'c', 'b', 'c', 'b', 'c', 'a', 
                             'b', 'c', 'b', 'c', 'b', 'c', 'a', 'b', 'c', 'b', 'c', 'b', 
                             'c', 'b', 'c', 'b', 'c', 'b'],
                   'time': [234, 239, 244, 249, 100, 105, 110, 115, 120, 125, 130, 3, 8, 
                            13, 18, 23, 28, 33, 551, 556, 561, 566, 571, 576, 581, 45, 50, 
                            55, 60, 65]})

Expected output (df2):

Every solution I have tried seems to need iterrows to load the "time" into df2.

【Comments】:

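Well, you don't actually need to load anything into df2. You can get it from an aggregation of df1, and then, if you need to make sure specific rows always or never appear, you can use reindex. You are also more likely to get help if you provide the data in a format people can easily copy and paste (or as runnable code). Retyping it from the post as it stands is a lot of work.
Please take a look at How to make good pandas examples. We ask that questions include a minimal reproducible example, with your sample input and expected output in the text of the question rather than as images or links.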
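
The aggregate-then-reindex idea from the first comment could be sketched roughly as follows. This is only an illustration: the data is a trimmed subset of the question's sample (nums 2 and 9 only), and the target list `wanted_nums`, including num 4 which has no rows at all, is a hypothetical assumption.

```python
import pandas as pd

# Trimmed subset of the question's sample data (nums 2 and 9 only)
df = pd.DataFrame({
    "num":   [2, 2, 2, 2, 9, 9, 9, 9, 9],
    "state": ["a", "b", "c", "b", "b", "c", "b", "c", "b"],
    "time":  [234, 239, 244, 249, 45, 50, 55, 60, 65],
})

# Aggregate: earliest 'c' time for each num
first_c = df[df["state"] == "c"].groupby("num")["time"].min()

# Hypothetical set of nums that must always appear in the result
wanted_nums = pd.Index([2, 4, 9], name="num")

# reindex keeps exactly these nums; nums without a 'c' row get NaN
df2 = first_c.reindex(wanted_nums).reset_index()
print(df2)
```

Because num 4 has no match, the "time" column becomes floating point so that it can hold NaN.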

Tags: python pandas dataframe

【Solution 1】:

You can do it in one line, using df.groupby() with min() as the aggregation function:

df[df.state == 'c'].drop('state', axis=1).groupby('num').aggregate('min')
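
For the second half of the question (adding that "time" to a separate df2 by "num"), one possible sketch is to map the aggregated Series onto df2. The contents of df2 here are an assumption, since the question's expected output was not preserved. In particular, num 10 is a hypothetical value that does not occur in df.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "num": [2] * 4 + [5] * 7 + [7] * 7 + [8] * 7 + [9] * 5,
    "state": list("abcb") + list("abcbcbc") * 3 + list("bcbcb"),
    "time": [234, 239, 244, 249, 100, 105, 110, 115, 120, 125, 130,
             3, 8, 13, 18, 23, 28, 33, 551, 556, 561, 566, 571, 576, 581,
             45, 50, 55, 60, 65],
})

# Earliest 'c' time per num, as a Series indexed by num
first_c = df.loc[df["state"] == "c"].groupby("num")["time"].min()

# Hypothetical pre-existing df2 with one row per num of interest
df2 = pd.DataFrame({"num": [2, 5, 7, 8, 9, 10]})

# Look up each num in first_c; nums with no 'c' row become NaN
df2["time"] = df2["num"].map(first_c)
print(df2)
```

Taking the minimum matches "first instance when sorted by time", so no explicit sort is needed with this approach.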

【Discussion】:

【Solution 2】:

It's hard to check without recreating df, but I think this should do it:

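def first_c(group):
    # Keep only the 'c' rows and take the first one by position.
    # Assumes each group is already ordered by time and has at least one 'c' row;
    # otherwise iloc[0] raises an IndexError.
    filtered = group[group['state'] == 'c'].iloc[0]
    return filtered[['num', 'time']]


# Run first_c once per num group
df2 = df.groupby('num').apply(first_c)

Group by num
Apply the function, which filters for c and uses iloc to take the first row by integer position
Return num and time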
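
As an alternative to a per-group apply, the same "first c row per num" result can probably be reached with a sort followed by drop_duplicates. The sketch below is not part of the original answer; it uses a trimmed subset of the question's data (nums 2 and 5 only), and the name `first_c_rows` is made up for illustration.

```python
import pandas as pd

# Trimmed subset of the question's sample data (nums 2 and 5 only)
df = pd.DataFrame({
    "num":   [2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5],
    "state": ["a", "b", "c", "b", "a", "b", "c", "b", "c", "b", "c"],
    "time":  [234, 239, 244, 249, 100, 105, 110, 115, 120, 125, 130],
})

first_c_rows = (
    df[df["state"] == "c"]                  # keep only the 'c' rows
    .sort_values(["num", "time"])           # earliest time first within each num
    .drop_duplicates("num", keep="first")   # one row per num: the earliest 'c'
    [["num", "time"]]
    .reset_index(drop=True)
)
print(first_c_rows)
```

A num with no 'c' row simply does not appear in the result, so this version does not raise the way iloc[0] would on an empty group.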

【Discussion】: