【Question title】: Efficient way to apply conditional function to data grouped by day in Pandas
【Posted】: 2020-04-10 21:08:13
【Question description】:

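I want to apply a conditional function to data grouped by day: for each column where more than half of the values on a given day equal 0, set all of that column's values for that day to np.nan.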

date,value1,value2
2016-01-01 09:00:00,14,14
2016-01-01 10:00:00,12,13
2016-01-01 11:00:00,11,13
2016-01-01 12:00:00,11,9
2016-01-01 13:00:00,17,21
2016-01-01 14:00:00,9,22
2016-01-01 15:00:00,10,9
2016-01-01 16:00:00,11,9
2016-01-01 17:00:00,8,8
2016-01-01 18:00:00,4,2
2016-01-01 19:00:00,5,7
2016-01-01 20:00:00,5,5
2016-01-01 21:00:00,3,4
2016-01-01 22:00:00,2,4
2016-01-01 23:00:00,2,4
2016-01-02 09:00:00,0,0
2016-01-02 10:00:00,0,0
2016-01-02 11:00:00,0,0
2016-01-02 12:00:00,0,0
2016-01-02 13:00:00,1,0
2016-01-02 14:00:00,0,0
2016-01-02 15:00:00,0,0
2016-01-02 16:00:00,0,0
2016-01-02 17:00:00,0,0
2016-01-02 18:00:00,0,0
2016-01-02 19:00:00,0,0
2016-01-02 20:00:00,1,0
2016-01-02 21:00:00,0,0
2016-01-02 22:00:00,0,0
2016-01-02 23:00:00,0,0

Expected output:

date,value1,value2
2016-01-01 09:00:00,14,14
2016-01-01 10:00:00,12,13
2016-01-01 11:00:00,11,13
2016-01-01 12:00:00,11,9
2016-01-01 13:00:00,17,21
2016-01-01 14:00:00,9,22
2016-01-01 15:00:00,10,9
2016-01-01 16:00:00,11,9
2016-01-01 17:00:00,8,8
2016-01-01 18:00:00,4,2
2016-01-01 19:00:00,5,7
2016-01-01 20:00:00,5,5
2016-01-01 21:00:00,3,4
2016-01-01 22:00:00,2,4
2016-01-01 23:00:00,2,4
2016-01-02 09:00:00,null,null
2016-01-02 10:00:00,null,null
2016-01-02 11:00:00,null,null
2016-01-02 12:00:00,null,null
2016-01-02 13:00:00,null,null
2016-01-02 14:00:00,null,null
2016-01-02 15:00:00,null,null
2016-01-02 16:00:00,null,null
2016-01-02 17:00:00,null,null
2016-01-02 18:00:00,null,null
2016-01-02 19:00:00,null,null
2016-01-02 20:00:00,null,null
2016-01-02 21:00:00,null,null
2016-01-02 22:00:00,null,null
2016-01-02 23:00:00,null,null

I have read this question: "pandas apply function to data grouped by day", and tried the following:

df_mode = df.groupby(df.index.date).apply(lambda x: mode(x)[0])

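This gives me the most frequent value per day in each column, but I don't know how to handle the next step (setting all of that column's values for the day to np.nan).

Is there a more efficient way than using apply in this case?

Thanks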

【Question comments】:

    Tags: python pandas dataframe time-series


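    【Solution 1】:

    Use GroupBy.transform with mean on the boolean frame df.eq(0) to get the fraction of zeros per day in each column, compare it with 0.5, and then set the matching values to missing with DataFrame.mask: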

    # Mask every column-day in which more than half of the values equal 0
    df = df.mask(df.eq(0).groupby(df.index.date).transform('mean').gt(.5))
    print(df)
                         value1  value2
    date                               
    2016-01-01 09:00:00    14.0    14.0
    2016-01-01 10:00:00    12.0    13.0
    2016-01-01 11:00:00    11.0    13.0
    2016-01-01 12:00:00    11.0     9.0
    2016-01-01 13:00:00    17.0    21.0
    2016-01-01 14:00:00     9.0    22.0
    2016-01-01 15:00:00    10.0     9.0
    2016-01-01 16:00:00    11.0     9.0
    2016-01-01 17:00:00     8.0     8.0
    2016-01-01 18:00:00     4.0     2.0
    2016-01-01 19:00:00     5.0     7.0
    2016-01-01 20:00:00     5.0     5.0
    2016-01-01 21:00:00     3.0     4.0
    2016-01-01 22:00:00     2.0     4.0
    2016-01-01 23:00:00     2.0     4.0
    2016-01-02 09:00:00     NaN     NaN
    2016-01-02 10:00:00     NaN     NaN
    2016-01-02 11:00:00     NaN     NaN
    2016-01-02 12:00:00     NaN     NaN
    2016-01-02 13:00:00     NaN     NaN
    2016-01-02 14:00:00     NaN     NaN
    2016-01-02 15:00:00     NaN     NaN
    2016-01-02 16:00:00     NaN     NaN
    2016-01-02 17:00:00     NaN     NaN
    2016-01-02 18:00:00     NaN     NaN
    2016-01-02 19:00:00     NaN     NaN
    2016-01-02 20:00:00     NaN     NaN
    2016-01-02 21:00:00     NaN     NaN
    2016-01-02 22:00:00     NaN     NaN
    2016-01-02 23:00:00     NaN     NaN
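
To make the one-liner above easier to follow, here is a minimal sketch that breaks it into named steps. The small CSV sample and the variable names (`csv_text`, `is_zero`, `zero_share`, `to_mask`, `result`) are made up for illustration and are not part of the original question or answer; the sketch also assumes the `date` column is parsed as a DatetimeIndex, which `df.index.date` requires.

```python
import io

import pandas as pd

# Hypothetical sample (not the asker's data): two days, three rows each.
csv_text = """date,value1,value2
2016-01-01 09:00:00,14,0
2016-01-01 10:00:00,12,0
2016-01-01 11:00:00,11,13
2016-01-02 09:00:00,0,0
2016-01-02 10:00:00,0,5
2016-01-02 11:00:00,1,7
"""
# Parse "date" as a DatetimeIndex so that df.index.date is available.
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# Step 1: boolean frame, True where a value equals 0.
is_zero = df.eq(0)

# Step 2: per-day share of zeros in each column, repeated on every row of that day.
zero_share = is_zero.groupby(df.index.date).transform("mean")

# Step 3: True for every cell whose column-day has more than half zeros.
to_mask = zero_share.gt(0.5)

# Step 4: replace the flagged cells with NaN (the default replacement of mask).
result = df.mask(to_mask)
print(result)
```

In this sample, value2 is masked on 2016-01-01 and value1 on 2016-01-02, since two of the three values in each of those column-days are 0. Because transform returns a frame with the same shape and index as the input, the condition lines up with df cell by cell, which is what avoids a Python-level apply over the groups.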
    

    【Discussion】:
