To group each value of one column with every value of another column using groupby in Python Pandas: answers

【Question title】: To group each value of one column with every value of another column using groupby in Python Pandas
【Posted】: 2018-03-21 13:08:00
【Question】:

I have a DataFrame with 3 columns of 631 rows each, so below I show only the unique values in each column.

df

Segment Type  Nature of Query     Q1

PRIME         Request             1           
BUSINESS      Complaint           2 
PRIORITY      Critical Request    3
                                  4
                                  5

Now, say that under 'Segment Type' I want to group 'PRIME' with every value of 'Nature of Query' and find the size, min, max and mean of 'Q1'.

Using the groupby function, I tried this:

 import numpy as np
 df.groupby(['Segment Type','Nature of Query'])['Q1'].agg([np.size, np.min, np.max, np.mean])

and got this:

    Segment Type    Nature of Query    size     amin    amax    mean            

         BUSINESS       Request          1        4       4     4.000000
           PRIME        Complaint        1        5       5     5.000000
                      Critical Request   3        1       2     1.666667
                        Request          31       1       5     3.387097
          PRIORITY    Critical Request   1        4       4     4.000000
                        Request          3        3       5     4.000000
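The result above contains only the pairs that actually occur in the data: by default, groupby creates one group per observed combination of the key columns, so a pair like BUSINESS with Complaint from the desired output cannot appear unless something adds it. A minimal sketch that lists the missing pairs, using a small hypothetical DataFrame (the values are made up for illustration and are not the asker's 631 rows):

```python
from itertools import product

import pandas as pd

# Hypothetical toy data standing in for the real 631-row frame.
df = pd.DataFrame({
    "Segment Type": ["PRIME", "PRIME", "BUSINESS", "PRIORITY", "PRIORITY"],
    "Nature of Query": ["Request", "Request", "Complaint",
                        "Critical Request", "Critical Request"],
    "Q1": [1, 3, 2, 3, 5],
})

grouped = df.groupby(["Segment Type", "Nature of Query"])["Q1"].agg(
    ["size", "min", "max", "mean"])

# Pairs that groupby actually produced: one index entry per observed combination.
observed = set(grouped.index)

# Every pair the desired output asks for: the Cartesian product of unique values.
wanted = set(product(df["Segment Type"].unique(), df["Nature of Query"].unique()))

missing = sorted(wanted - observed)
print(f"{len(observed)} observed pairs, {len(missing)} missing")
for pair in missing:
    print(pair)
```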

The output I want:

       Segment Type   Nature of Query      size     amin    amax    mean
           BUSINESS       Request            1        4       4     4.000000
                          Complaint          1        5       5     5.000000
                          Critical Request   3        1       2     1.666667


            PRIME       Complaint            1        5       5     5.000000
                        Critical Request     3        1       2     1.666667
                        Request              31       1       5     3.387097

          PRIORITY      Complaint            1        5       5     5.000000
                        Critical Request     1        4       4     4.000000
                        Request              3        3       5     4.000000

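Ignore the size, min, max and mean values computed for Q1. My main problem is the values of 'Segment Type' and 'Nature of Query': every segment should be listed with every nature of query.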
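Another way to get every 'Segment Type' paired with every 'Nature of Query' straight out of groupby is to make the two key columns categorical and pass observed=False; groupby then creates a group for every combination of categories, including combinations with no rows. This is a sketch on hypothetical toy data, not one of the posted answers:

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame.
df = pd.DataFrame({
    "Segment Type": ["PRIME", "PRIME", "BUSINESS", "PRIORITY", "PRIORITY"],
    "Nature of Query": ["Request", "Request", "Complaint",
                        "Critical Request", "Critical Request"],
    "Q1": [1, 3, 2, 3, 5],
})

# Turn the key columns into categoricals; their categories are the
# unique values found in each column.
keys = ["Segment Type", "Nature of Query"]
cat_df = df.astype({k: "category" for k in keys})

# observed=False: one group per combination of categories, even when no row
# has that combination (size is 0 there, min/max/mean are NaN).
result = (
    cat_df.groupby(keys, observed=False)["Q1"]
    .agg(["size", "min", "max", "mean"])
    .reset_index()
)
print(result)
```

Recent pandas versions warn when code relies on the default value of observed for categorical keys, so passing it explicitly is the safer choice.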

If there is any possible solution, please let me know. Thanks!

【Question comments】:

Are you sure the data you are using supports the output you want?
It should.

Tags: python pandas pandas-groupby

【Solution 1】:

I believe you need reindex with a MultiIndex created by MultiIndex.from_product:

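import pandas as pd

# Aggregate Q1 for each (Segment Type, Nature of Query) pair present in the data
df = df.groupby(['Segment Type','Nature of Query'])['Q1'].agg(['size', 'min', 'max', 'mean'])

# Build every combination of the index level values; pairs missing from the data get 0
mux = pd.MultiIndex.from_product(df.index.levels, names=['Segment Type','Nature of Query'])
df = df.reindex(mux, fill_value=0).reset_index()
print(df)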
  Segment Type   Nature of Query  size  min  max  mean
0     BUSINESS         Complaint     1    2    2     2
1     BUSINESS  Critical Request     0    0    0     0
2     BUSINESS           Request     0    0    0     0
3        PRIME         Complaint     0    0    0     0
4        PRIME  Critical Request     0    0    0     0
5        PRIME           Request     1    1    1     1
6     PRIORITY         Complaint     0    0    0     0
7     PRIORITY  Critical Request     3    3    5     4
8     PRIORITY           Request     0    0    0     0
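A self-contained version of the same reindex idea, on hypothetical toy data. One deliberate change from the answer above: instead of fill_value=0 it reindexes with the default NaN and only sets size to 0, so that min, max and mean are not reported as 0 for pairs that have no rows:

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame.
df = pd.DataFrame({
    "Segment Type": ["PRIME", "PRIME", "BUSINESS", "PRIORITY", "PRIORITY"],
    "Nature of Query": ["Request", "Request", "Complaint",
                        "Critical Request", "Critical Request"],
    "Q1": [1, 3, 2, 3, 5],
})

# groupby only emits the pairs that occur in the data.
stats = df.groupby(["Segment Type", "Nature of Query"])["Q1"].agg(
    ["size", "min", "max", "mean"])

# Full grid: every Segment Type level combined with every Nature of Query level.
full = pd.MultiIndex.from_product(stats.index.levels, names=stats.index.names)

# Missing pairs become NaN; only the count is set to 0 for them.
result = stats.reindex(full)
result["size"] = result["size"].fillna(0).astype(int)
result = result.reset_index()
print(result)
```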

【Comments】:

It worked! Thanks!

【Solution 2】:

You can use the pivot table feature (pivot_table); see the tutorial here:

http://pbpython.com/pandas-pivot-table-explained.html
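A minimal pivot_table sketch, assuming small hypothetical toy data as a stand-in for the real frame: with one key on the rows and the other on the columns, every (Segment Type, Nature of Query) pair gets a cell, and pairs with no rows show NaN.

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame.
df = pd.DataFrame({
    "Segment Type": ["PRIME", "PRIME", "BUSINESS", "PRIORITY", "PRIORITY"],
    "Nature of Query": ["Request", "Request", "Complaint",
                        "Critical Request", "Critical Request"],
    "Q1": [1, 3, 2, 3, 5],
})

# Rows: Segment Type, columns: Nature of Query, cells: mean of Q1.
# Combinations with no data are left as NaN.
wide = pd.pivot_table(
    df,
    values="Q1",
    index="Segment Type",
    columns="Nature of Query",
    aggfunc="mean",
)
print(wide)
```

aggfunc also accepts a list such as ['count', 'min', 'max', 'mean'], which gives one block of columns per statistic.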

【Comments】:

Will take a look. Thanks!