【Question title】: How to return a dataframe with the last non-NaN values in each column for each month?
【Posted】: 2023-03-17 12:54:02
【Question】:

I have a dataframe in the following format:

               A     B     C     D
2020-11-18  64.0  74.0  34.0  57.0
2020-11-20   NaN  71.0   NaN  58.0
2020-11-23   NaN  11.0   NaN   NaN
2020-11-25  69.0   NaN   NaN   0.0
2020-11-27   NaN  37.0  19.0   NaN
2020-11-29  63.0   NaN   NaN  85.0
2020-12-03   NaN  73.0   NaN  49.0
2020-12-10   NaN   NaN  32.0   NaN
2020-12-22  52.0  90.0  33.0  24.0
2020-12-23   NaN  96.0   NaN   NaN
2020-12-28  78.0   NaN   NaN  68.0
2020-12-29  17.0  70.0   NaN  16.0
2021-01-03  51.0  43.0   NaN  66.0

I want to get a new dataframe containing, for each month, the last non-NaN value in each column:

               A     B     C     D
2020-11     63.0  37.0  19.0  85.0
2020-12     17.0  70.0  33.0  16.0

I tried grouping by month and applying a lambda that returns the largest index within each group, like this:

df.loc[df.groupby(df.index.to_period('M')).apply(lambda x: x.index.max())]

which produces:

               A     B     C     D
2020-11-29  63.0   NaN   NaN  85.0
2020-12-29  17.0  70.0   NaN  16.0

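This returns the values from the last date present in each month, not the last non-NaN value: if a column is NaN on that last date, the result shows NaN. Instead, I want NaN to appear only when the column has no values at all in that month.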

    Tags: python pandas


    【Solution 1】:

    Use GroupBy.last, which returns the last non-NaN value of each column within each group:

    df = df.groupby(df.index.to_period('M')).last()
    print(df)
                A     B     C     D
    2020-11  63.0  37.0  19.0  85.0
    2020-12  17.0  70.0  33.0  16.0
    2021-01  51.0  43.0   NaN  66.0
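
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same `groupby(...).last()` call. It assumes pandas and NumPy are installed; the name `monthly_last` is only illustrative and does not come from the original answer.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "A": [64, nan, nan, 69, nan, 63, nan, nan, 52, nan, 78, 17, 51],
        "B": [74, 71, 11, nan, 37, nan, 73, nan, 90, 96, nan, 70, 43],
        "C": [34, nan, nan, nan, 19, nan, nan, 32, 33, nan, nan, nan, nan],
        "D": [57, 58, nan, 0, nan, 85, 49, nan, 24, nan, 68, 16, 66],
    },
    index=pd.to_datetime(
        [
            "2020-11-18", "2020-11-20", "2020-11-23", "2020-11-25",
            "2020-11-27", "2020-11-29", "2020-12-03", "2020-12-10",
            "2020-12-22", "2020-12-23", "2020-12-28", "2020-12-29",
            "2021-01-03",
        ]
    ),
)

# Group rows by calendar month, then take the last non-NaN value
# of each column within each month.
monthly_last = df.groupby(df.index.to_period("M")).last()
print(monthly_last)
```

A column comes out as NaN for a month only when it has no values at all in that month, as with column C in 2021-01.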
    
