Fill consecutive NaNs with cumsum, to increment by one on each consecutive NaN (answers)

【Question Title】: Fill consecutive NaNs with cumsum, to increment by one on each consecutive NaN
【Posted】: 2018-12-27 10:46:42
【Question】:

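Given a dataframe with a large number of missing values in certain stretches, I want an output dataframe in which every run of consecutive NaNs is filled with a cumulative sum that starts from the preceding valid value and adds 1 for each NaN.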

Given:

shop_id calendar_date quantity
0       2018-12-12      1  
1       2018-12-13      NaN    
2       2018-12-14      NaN    
3       2018-12-15      NaN
4       2018-12-16      1
5       2018-12-17      NaN

Desired output:

shop_id calendar_date quantity 
0       2018-12-12      1    
1       2018-12-13      2    
2       2018-12-14      3    
3       2018-12-15      4
4       2018-12-16      1
5       2018-12-17      2

【Question Discussion】:

Tags: pandas dataframe missing-data cumulative-sum

【Solution 1】:

Use:

g = (~df.quantity.isnull()).cumsum()
df['quantity'] = df.fillna(1).groupby(g).quantity.cumsum()

      shop_id calendar_date  quantity
0        0    2018-12-12       1.0
1        1    2018-12-13       2.0
2        2    2018-12-14       3.0
3        3    2018-12-15       4.0
4        4    2018-12-16       1.0
5        5    2018-12-17       2.0
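A self-contained version of the two lines above, for anyone who wants to run them directly. The DataFrame is rebuilt by hand from the question's example (dates kept as plain strings, which is an assumption about the original dtype); the two answer lines themselves are unchanged.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the question's data (dates stored as strings here)
df = pd.DataFrame({
    "shop_id": range(6),
    "calendar_date": ["2018-12-12", "2018-12-13", "2018-12-14",
                      "2018-12-15", "2018-12-16", "2018-12-17"],
    "quantity": [1, np.nan, np.nan, np.nan, 1, np.nan],
})

# The answer's two lines, unchanged
g = (~df.quantity.isnull()).cumsum()
df['quantity'] = df.fillna(1).groupby(g).quantity.cumsum()

print(df)
```

The resulting quantity column is float because the original column contained NaN.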

Details

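Use .isnull(), negated with ~, to mark where quantity holds a valid value, and take the cumsum of the resulting boolean Series. Each valid value starts a new group label, and the NaNs that follow it share that label: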

g = (~df.quantity.isnull()).cumsum()

0    1
1    1
2    1
3    1
4    2
5    2
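A small sketch of the grouping key on its own. It checks that Series.notna() gives the same mask as ~Series.isnull(), and shows an edge case not covered in the question (hypothetical data): rows before the first valid value get group label 0.

```python
import numpy as np
import pandas as pd

quantity = pd.Series([1, np.nan, np.nan, np.nan, 1, np.nan], name="quantity")

# ~isnull() and notna() build the same boolean mask (True = valid value)
g_isnull = (~quantity.isnull()).cumsum()
g_notna = quantity.notna().cumsum()
print(g_notna.tolist())  # [1, 1, 1, 1, 2, 2]

# Hypothetical data with leading NaNs: they fall into group 0
leading = pd.Series([np.nan, np.nan, 1, np.nan])
print(leading.notna().cumsum().tolist())  # [0, 0, 1, 1]
```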

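Use fillna(1) so that, when you group by g and take the cumsum, each NaN adds 1 to the running total, starting from whatever value the group's first row holds: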

df.fillna(1).groupby(g).quantity.cumsum()
0    1.0
1    2.0
2    3.0
3    4.0
4    1.0
5    2.0
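To illustrate "starting from whatever value the group's first row holds", here is a sketch on hypothetical data where one run begins at 5. It also applies fillna(1) to the quantity column only, a small adaptation of the answer's df.fillna(1) that avoids replacing NaNs in other columns with 1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first run starts at 5 instead of 1
quantity = pd.Series([5, np.nan, np.nan, 1, np.nan], name="quantity")

# New group at every valid value
g = quantity.notna().cumsum()

# Fill only this column, then take the running total inside each group
result = quantity.fillna(1).groupby(g).cumsum()
print(result.tolist())  # [5.0, 6.0, 7.0, 1.0, 2.0]
```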

【Discussion】:

It can be done with a for loop, but this answer is more elegant!
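For comparison with the comment above, a sketch of the plain for-loop version. It is not from the thread: fill_runs is a hypothetical helper name, and it is written to match the groupby/cumsum answer, including counting up from 1 for NaNs that appear before any valid value.

```python
import math


def fill_runs(values):
    """Hypothetical helper: keep valid values, replace each NaN with the
    previous output plus 1 (leading NaNs count up from 1)."""
    out = []
    running = 0.0
    for v in values:
        if math.isnan(v):
            running += 1   # NaN: one more than the previous row
        else:
            running = v    # valid value: restart the count from it
        out.append(running)
    return out


nan = float("nan")
print(fill_runs([1, nan, nan, nan, 1, nan]))
```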

【Solution 2】:

Another approach:

Data

   shop_id calender_date  quantity
0        0    2018-12-12       1.0
1        1    2018-12-13       NaN
2        2    2018-12-14       NaN
3        3    2018-12-15       NaN
4        4    2018-12-16       1.0
5        5    2018-12-17       NaN
6        6    2018-12-18       NaN
7        7    2018-12-17       NaN

Using np.where:

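import numpy as np

# Positions of the rows that hold a valid (non-NaN) quantity
where = np.where(data['quantity'].notna())[0]

r = []
for i in range(len(where)):
    try:
        # Count 1, 2, ... up to the row before the next valid value
        r.extend(np.arange(1, where[i+1] - where[i] + 1))
    except IndexError:
        # Last valid value: count up to the end of the frame
        r.extend(np.arange(1, len(data) - where[i] + 1))

# Assumes the first row is valid; otherwise r is shorter than data
data['quantity'] = r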

print(data)

   shop_id calender_date  quantity
0        0    2018-12-12         1
1        1    2018-12-13         2
2        2    2018-12-14         3
3        3    2018-12-15         4
4        4    2018-12-16         1
5        5    2018-12-17         2
6        6    2018-12-18         3
7        7    2018-12-17         4
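Note that this loop always restarts each run at 1, regardless of the valid value itself. A vectorized sketch of the same behaviour, reusing Solution 1's grouping key with GroupBy.cumcount (a swapped-in technique, not the answerer's code; the date column is left out because it does not affect the result):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "shop_id": range(8),
    "quantity": [1, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
})

# Same key as Solution 1: a new group starts at each valid value
g = data["quantity"].notna().cumsum()

# 0-based position within each group, shifted to start at 1
data["quantity"] = data.groupby(g).cumcount() + 1
print(data["quantity"].tolist())  # [1, 2, 3, 4, 1, 2, 3, 4]
```

Unlike the np.where loop, this version does not require the first row to be valid.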

【Discussion】: