【Question title】: Pandas pivot_table include empty identities
【Posted】: 2021-03-01 16:04:59
【Question】:

Dataset:

Visualization:

Grid size = 8 x 12

Pivot table:

X = df.pivot(index='x',columns='y',values='a').values
X[np.isnan(X)] = 0

array([[0., 0., 1., 0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 1., 0., 1., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 1., 1., 1., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1., 0., 0., 0.]])

Here the pivot table has shape (8, 8), but I want (8, 12): the empty rows/columns of the grid are left out of the pivot table.
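The cause shows up on a small example: pivot only creates a row or column for a label that actually occurs in the data, so a grid row or column with no entries never appears. This sketch assumes the sample rows from Solution 4 (1-based x and y labels), not the asker's own dataset, which is not shown.

```python
import pandas as pd

# Sample rows borrowed from Solution 4 (assumption: x in 1..8, y in 1..12)
df = pd.DataFrame(
    [(1, 3, 0), (1, 1, 0), (1, 2, 0), (3, 6, 0), (5, 3, 1),
     (1, 5, 0), (1, 7, 0), (1, 6, 0), (1, 4, 0)],
    columns=["x", "y", "a"],
)

pivoted = df.pivot(index="x", columns="y", values="a")

# Only the labels present in the data become rows/columns
print(pivoted.shape)          # (3, 7)
print(list(pivoted.index))    # [1, 3, 5]
print(list(pivoted.columns))  # [1, 2, 3, 4, 5, 6, 7]
```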

【Question discussion】:

Tags: python arrays pandas numpy pivot

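【Solution 1】:

NumPy assignment:

import numpy as np
import pandas as pd

# Categorical codes map the 1-based labels to 0-based array positions
x = pd.Categorical(df.x, range(1, 9))
y = pd.Categorical(df.y, range(1, 13))

b = np.zeros((8, 12), int)

# Scatter each value of a into its (x, y) cell
b[x.codes, y.codes] = df.a.to_numpy()

pd.DataFrame(b, x.categories, y.categories)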

   1   2   3   4   5   6   7   8   9   10  11  12
1   0   0   0   0   0   0   0   0   0   0   0   0
2   0   0   0   0   0   0   0   0   0   0   0   0
3   0   0   0   0   0   0   0   0   0   0   0   0
4   0   0   0   0   0   0   0   0   0   0   0   0
5   0   0   1   0   0   0   0   0   0   0   0   0
6   0   0   0   0   0   0   0   0   0   0   0   0
7   0   0   0   0   0   0   0   0   0   0   0   0
8   0   0   0   0   0   0   0   0   0   0   0   0
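One caveat with the codes approach: a label that is not among the categories gets code -1, and NumPy treats -1 as "last position", so such a value would silently land in row 8 or column 12. The sketch below adds a check for that; it assumes the sample rows from Solution 4 and the 8 x 12 grid from the question.

```python
import numpy as np
import pandas as pd

# Sample rows borrowed from Solution 4 (1-based x and y labels)
df = pd.DataFrame(
    [(1, 3, 0), (1, 1, 0), (1, 2, 0), (3, 6, 0), (5, 3, 1),
     (1, 5, 0), (1, 7, 0), (1, 6, 0), (1, 4, 0)],
    columns=["x", "y", "a"],
)

n_rows, n_cols = 8, 12  # grid size from the question

# Labels 1..n become codes 0..n-1; unknown labels become -1
x = pd.Categorical(df["x"], categories=range(1, n_rows + 1))
y = pd.Categorical(df["y"], categories=range(1, n_cols + 1))

# Code -1 would wrap around to the last row/column, so fail loudly instead
if (x.codes == -1).any() or (y.codes == -1).any():
    raise ValueError("some x/y labels fall outside the grid")

grid = np.zeros((n_rows, n_cols), dtype=int)
grid[x.codes, y.codes] = df["a"].to_numpy()

result = pd.DataFrame(grid, index=x.categories, columns=y.categories)
print(result.shape)  # (8, 12)
```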

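【Discussion】:

IndexError: index 8 is out of bounds for axis 0 with size 8
Ah, right. We can add one to df.x... I mean subtract. >.<
The same has to be done for df.y.
@alex3465 Are you actually interested in this answer? If you are, I can put in a bit more effort to make it "correct".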

【Solution 2】:

Try reindex:

X = (df.pivot(index='x',columns='y', values='a')
   .fillna(0)
   .reindex(np.arange(12), axis=1, fill_value=0)
   .reindex(np.arange(8), fill_value=0)
)

Output:

y  0    1    2    3    4    5    6    7   8   9   10  11
x                                                       
0   0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0
1   0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0
2   0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0
3   0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0
4   0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0
5   0  0.0  0.0  1.0  0.0  0.0  0.0  0.0   0   0   0   0
6   0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0
7   0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0

Also consider set_index().unstack() instead of pivot:

X = (df.set_index(['x','y'])
       ['a'].unstack(fill_value=0)
       .reindex(np.arange(12), axis=1, fill_value=0)
       .reindex(np.arange(8), fill_value=0)
    )

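This gives you nicer-looking data (integer zeros throughout, because unstack(fill_value=0) fills the gaps directly instead of producing NaN, which would force the columns to float):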

y  0   1   2   3   4   5   6   7   8   9   10  11
x                                                
0   0   0   0   0   0   0   0   0   0   0   0   0
1   0   0   0   0   0   0   0   0   0   0   0   0
2   0   0   0   0   0   0   0   0   0   0   0   0
3   0   0   0   0   0   0   0   0   0   0   0   0
4   0   0   0   0   0   0   0   0   0   0   0   0
5   0   0   0   1   0   0   0   0   0   0   0   0
6   0   0   0   0   0   0   0   0   0   0   0   0
7   0   0   0   0   0   0   0   0   0   0   0   0

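【Discussion】:

Both of these methods create the new columns and rows at the end or the beginning, rather than at the empty rows or columns.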
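reindex aligns on labels, not positions, so a missing label is inserted at its place in the target list. One likely cause of the effect described in the comment is that np.arange(8) and np.arange(12) produce 0-based labels, while the grid labels appear to be 1-based (as in the other answers): label 0 is then added at the start and labels 8/12 are dropped. A sketch using the grid's own labels, assuming the sample rows from Solution 4:

```python
import pandas as pd

# Sample rows borrowed from Solution 4; x and y are 1-based labels
df = pd.DataFrame(
    [(1, 3, 0), (1, 1, 0), (1, 2, 0), (3, 6, 0), (5, 3, 1),
     (1, 5, 0), (1, 7, 0), (1, 6, 0), (1, 4, 0)],
    columns=["x", "y", "a"],
)

X = (
    df.set_index(["x", "y"])["a"]
    .unstack(fill_value=0)  # integer zeros for missing (x, y) pairs
    # Target labels match the data's labels, so empty rows/columns are
    # inserted at their label positions instead of at the ends
    .reindex(index=range(1, 9), columns=range(1, 13), fill_value=0)
)
print(X.shape)  # (8, 12)
```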

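【Solution 3】:

You can create a DataFrame of zeros with all the rows and columns, and then update it with the pivot.

import pandas as pd

# 8 x 12 grid of zeros with the full 1-based row and column labels
res = pd.DataFrame(index=range(1, 9), columns=range(1, 13), data=0)
# pivot's arguments are keyword-only in pandas 2.0 and later
res.update(df.pivot(index='x', columns='y', values='a'))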

     1    2    3    4    5    6    7   8   9   10  11  12
1  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0    0
2  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0    0
3  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0    0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0    0
5  0.0  0.0  1.0  0.0  0.0  0.0  0.0   0   0   0   0    0
6  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0    0
7  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0    0
8  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0   0   0   0    0
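Two details make this work: DataFrame.update aligns the pivot on row and column labels and only copies non-NaN values, so cells missing from the pivot keep their 0; and it modifies res in place and returns None, so writing res = res.update(...) would lose the result. A sketch assuming the sample rows from Solution 4:

```python
import pandas as pd

# Sample rows borrowed from Solution 4 (1-based x and y labels)
df = pd.DataFrame(
    [(1, 3, 0), (1, 1, 0), (1, 2, 0), (3, 6, 0), (5, 3, 1),
     (1, 5, 0), (1, 7, 0), (1, 6, 0), (1, 4, 0)],
    columns=["x", "y", "a"],
)

res = pd.DataFrame(0, index=range(1, 9), columns=range(1, 13))
pivoted = df.pivot(index="x", columns="y", values="a")

# In-place update: the return value is None, res itself is modified
returned = res.update(pivoted)
print(returned)   # None
print(res.shape)  # (8, 12)
```

The exact dtype of the updated columns (int or float) can differ between pandas versions.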

【Discussion】:

【Solution 4】:

Try using a tensor matrix:

import pandas as pd
import torch

# 8 x 12 grid of zeros
tensor_matrix = torch.zeros(8, 12)

data = [(1, 3, 0),
        (1, 1, 0),
        (1, 2, 0),
        (3, 6, 0),
        (5, 3, 1),
        (1, 5, 0),
        (1, 7, 0),
        (1, 6, 0),
        (1, 4, 0)]
df = pd.DataFrame(data, columns=['x', 'y', 'a'])

# x and y are 1-based labels, tensor indices are 0-based
for _, row in df.iterrows():
    tensor_matrix[int(row.x) - 1, int(row.y) - 1] = float(row.a)

df2 = pd.DataFrame(tensor_matrix.numpy(), columns=[str(i) for i in range(1, 13)])

# Shift the default 0-based index so the rows are labelled 1..8
df2.index = df2.index + 1
print(df2)

Output:

     1    2    3    4    5    6    7    8    9   10   11   12
1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
6  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
7  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
8  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
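Most of the time in the loop above is spent in iterrows, not in the tensor. The same fill can be written as a single vectorized index assignment; PyTorch tensors accept the same kind of advanced-indexing assignment, but this sketch uses NumPy so that it runs without PyTorch installed. It assumes, as above, that x and y are 1-based labels that all fall inside the 8 x 12 grid.

```python
import numpy as np
import pandas as pd

# Same sample rows as in the answer above
df = pd.DataFrame(
    [(1, 3, 0), (1, 1, 0), (1, 2, 0), (3, 6, 0), (5, 3, 1),
     (1, 5, 0), (1, 7, 0), (1, 6, 0), (1, 4, 0)],
    columns=["x", "y", "a"],
)

grid = np.zeros((8, 12))
# Shift the 1-based labels to 0-based positions and assign all cells at once
grid[df["x"].to_numpy() - 1, df["y"].to_numpy() - 1] = df["a"].to_numpy()

df2 = pd.DataFrame(grid, index=range(1, 9), columns=[str(i) for i in range(1, 13)])
print(df2.shape)  # (8, 12)
```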

【Discussion】:

Tensors are very efficient and fast!