Adding rows per group in pandas / ipython if per group a row is missing: answers

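【Question title】: Adding rows per group in pandas / ipython if per group a row is missing
【Posted】: 2015-09-04 06:49:55
【Question】:

I have a dataframe that contains the number of observations per group in a given period. Some groups do not contain all periods, and for those groups I want to append x rows containing the missing periods, so that every group has one row for each of the 6 periods.

My current df looks like this: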

> ID      PERIOD       VAlUE
  1       1            10
  1       2            8
  1       3            8  
  1       4            15
  1       5            6
  1       6            44
  2       1            NONE
  3       2            4
  3       5            25

I would like a dataframe like this:

> ID      PERIOD       VAlUE
  1       1            10
  1       2            8
  1       3            8  
  1       4            15
  1       5            6
  1       6            44
  2       1            NONE
  2       2            NONE
  2       3            NONE
  2       4            NONE
  2       5            NONE
  2       6            NONE
  3       1            NONE
  3       2            4
  3       3            NONE
  3       4            NONE
  3       5            25
  3       6            NONE

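So what happened:

For ID == 1, nothing happens, because it already contains all 6 periods.
For ID == 2, 5 rows are appended, one for each period it does not have in the first df (periods 2 through 6).
For ID == 3, 4 rows are appended, one for each period it does not have in the first df. So rows are added for periods 1, 3, 4 and 6.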
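
To make the per-group reasoning above concrete, here is a minimal sketch that lists which periods each ID is missing. It assumes the periods are the integers 1 to 6 and that the NONE in the example is a missing value (NaN) rather than a string; the names `all_periods` and `missing` are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (NONE assumed to be NaN).
df = pd.DataFrame({
    "ID":     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    "PERIOD": [1, 2, 3, 4, 5, 6, 1, 2, 5],
    "VAlUE":  [10, 8, 8, 15, 6, 44, np.nan, 4, 25],
})

# Assumption: the full set of periods is 1..6, as stated in the question.
all_periods = set(range(1, 7))

# For each ID, collect the periods that have no row yet.
missing = {
    group_id: sorted(all_periods - set(periods))
    for group_id, periods in df.groupby("ID")["PERIOD"]
}
print(missing)
```

IDs with an empty list are already complete; the others need one new row per listed period.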

I really have no idea how to do this, so any help is much appreciated.

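【Question comments】:

Tags: pandas append

【Solution 1】:

You can set the index to 'ID' and 'PERIOD', then construct a new index from the product of the unique values of those two columns and pass it to reindex as the new index. reindex has an optional fill_value parameter, which you can set to the string 'NONE':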

In [158]:
iterables = [df['ID'].unique(),df['PERIOD'].unique()]
df = df.set_index(['ID','PERIOD'])
df = df.reindex(index=pd.MultiIndex.from_product(iterables, names=['ID', 'PERIOD']), fill_value='NONE').reset_index()
df

Out[158]:
    ID  PERIOD VAlUE
0    1       1    10
1    1       2     8
2    1       3     8
3    1       4    15
4    1       5     6
5    1       6    44
6    2       1  NONE
7    2       2  NONE
8    2       3  NONE
9    2       4  NONE
10   2       5  NONE
11   2       6  NONE
12   3       1  NONE
13   3       2     4
14   3       3  NONE
15   3       4  NONE
16   3       5    25
17   3       6  NONE

So, breaking the above down:

In [160]:
# create a list of the iterable index values we want to generate all product combinations from
iterables = [df['ID'].unique(),df['PERIOD'].unique()]
iterables

Out[160]:
[array([1, 2, 3], dtype=int64), array([1, 2, 3, 4, 5, 6], dtype=int64)]

In [163]:
# set the index to ID and PERIOD
df = df.set_index(['ID','PERIOD'])
df

Out[163]:
          VAlUE
ID PERIOD      
1  1         10
   2          8
   3          8
   4         15
   5          6
   6         44
2  1       NONE
3  2          4
   5         25

In [164]:
# reindex and pass the product from iterables as the new index
df.reindex(index=pd.MultiIndex.from_product(iterables, names=['ID', 'PERIOD']), fill_value='NONE').reset_index()

Out[164]:
    ID  PERIOD VAlUE
0    1       1    10
1    1       2     8
2    1       3     8
3    1       4    15
4    1       5     6
5    1       6    44
6    2       1  NONE
7    2       2  NONE
8    2       3  NONE
9    2       4  NONE
10   2       5  NONE
11   2       6  NONE
12   3       1  NONE
13   3       2     4
14   3       3  NONE
15   3       4  NONE
16   3       5    25
17   3       6  NONE
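
For anyone who wants to run this end to end, here is a self-contained sketch of the same reindex idea. One caveat worth hedging: `df['PERIOD'].unique()` only yields periods that occur somewhere in the data, so if a period were missing from every group it would not be added. The sketch below therefore assumes the periods are known in advance to be 1 to 6 and spells them out; `all_periods`, `full_index` and `filled` are illustrative names, and NONE is assumed to be NaN.

```python
import numpy as np
import pandas as pd

# Example frame from the question (NONE assumed to be NaN).
df = pd.DataFrame({
    "ID":     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    "PERIOD": [1, 2, 3, 4, 5, 6, 1, 2, 5],
    "VAlUE":  [10, 8, 8, 15, 6, 44, np.nan, 4, 25],
})

# Assumption: the complete list of periods is known up front.
all_periods = list(range(1, 7))

# Every (ID, PERIOD) combination that should exist in the result.
full_index = pd.MultiIndex.from_product(
    [df["ID"].unique(), all_periods], names=["ID", "PERIOD"]
)

# Reindexing adds the missing combinations; their VAlUE becomes NaN.
filled = df.set_index(["ID", "PERIOD"]).reindex(full_index).reset_index()
print(filled)
```

Passing `fill_value='NONE'` to reindex, as in the answer, would put the string in the new rows instead of NaN, at the cost of turning the column into object dtype.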

【Comments】:

Thanks, especially for the breakdown!

【Solution 2】:

You can unstack the result on PERIOD and then stack it back with the dropna option set to False.

>>> df.groupby(['ID', 'PERIOD']).VAlUE.sum().unstack('PERIOD').stack('PERIOD', dropna=False)
ID  PERIOD
1   1          10
    2           8
    3           8
    4          15
    5           6
    6          44
2   1         NaN
    2         NaN
    3         NaN
    4         NaN
    5         NaN
    6         NaN
3   1         NaN
    2           4
    3         NaN
    4         NaN
    5          25
    6         NaN
dtype: object

【Comments】:

Nice approach. Depending on whether the OP's NONE is really the string 'NONE' or NaN, you can call fillna('NONE'); you also need reset_index().
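
A rough sketch of how that could look as a flat dataframe. Note this is a variant rather than the exact code from the answer: it goes wide with unstack and comes back with melt instead of `stack(dropna=False)`, since the behaviour of stack's `dropna` keyword may differ between pandas versions. It assumes NONE is NaN in the input and that every period shows up in at least one group; `wide` and `long_df` are illustrative names.

```python
import numpy as np
import pandas as pd

# Example frame from the question (NONE assumed to be NaN).
df = pd.DataFrame({
    "ID":     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    "PERIOD": [1, 2, 3, 4, 5, 6, 1, 2, 5],
    "VAlUE":  [10, 8, 8, 15, 6, 44, np.nan, 4, 25],
})

# Go wide: one row per ID, one column per PERIOD, NaN where no row existed.
wide = df.set_index(["ID", "PERIOD"])["VAlUE"].unstack("PERIOD")

# Go back to long form; melt keeps the NaN cells instead of dropping them.
long_df = (
    wide.reset_index()
        .melt(id_vars="ID", var_name="PERIOD", value_name="VAlUE")
        .astype({"PERIOD": int})
        .sort_values(["ID", "PERIOD"])
        .reset_index(drop=True)
)

# Optional, as the comment suggests: show missing values as the string 'NONE'.
long_df["VAlUE"] = long_df["VAlUE"].astype(object).fillna("NONE")
print(long_df)
```

If the missing values should stay numeric, drop the last step and keep NaN.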