How to use groupby on a dataframe (answers)

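【Question title】: How to use groupby on a dataframe
【Posted】: 2021-12-03 22:35:22
【Question description】:

I have a dataframe (a survey) in which I need to group by 2 columns. One of the columns is a ranking (5 options: Very Poor, Poor, Average, Good and Excellent), and the second column is a list of times. I need to group these two columns like this: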

Ranking   |   Time   |  Count (how many times the time appears in the "Time" column for that ranking)
-------------------------------------
Very poor |  0.0     |   6
          |  1.0     |   2    
          |  2.0     |   9             
-------------------------------------                              
Poor      |  0.0     |   3                           
          |  1.0     |   12                          
...

I need to show the results of these tables in 5 charts (one per ranking), with x=Time and y=Count.

I have been stuck on this for hours. Can anyone help?

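【Comments】:

Please provide a sample of your original dataframe using df.to_dict()
{'ID': {0: 'R1', 1: 'R2', 2: 'R3', 3: 'R4'}, 'Region of residence': {0: 'Delhi-NCR', 1: 'Delhi-NCR', 2: 'Delhi-NCR', 3: 'Delhi-NCR'}, 'Age of Subject': {0: 21, 1: 21, 2: 20, 3: 20}, 'Time spent on Online Class': {0: 2.0, 1: 0.0, 2: 7.0, 3: 3.0}, 'Rating of Online Class experience': {0: 'Good', 1: 'Excellent', 2: 'Very poor', 3: 'Very poor'}, 'Time spent on self study': {0: 4.0, 1: 0.0, 2: 3.0, 3: 2.0}} This is a sample. I need to group the rating of the online class experience against the time spent on self study.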

Tags: python pandas dataframe graph

【Solution 1】:

Set up an MRE (minimal reproducible example):

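import numpy as np
import pandas as pd

# Random sample data: 100 rows with a ranking and an integer time
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking':  np.random.choice(rank, 100),
                   'Time': np.random.randint(1, 50, 100)})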
print(df)

# Output:
      Ranking  Time
0   Excellent    28
1        Poor    33
2   Excellent    28
3     Average    22
4   Very Poor    11
..        ...   ...
95  Very Poor    13
96    Average    26
97  Very Poor    23
98       Good    24
99       Good    36

[100 rows x 2 columns]

Use value_counts to count each (Ranking, Time) pair instead of groupby:

count = df.value_counts(['Ranking', 'Time']).rename('Count').reset_index()
print(count)

# Output:
      Ranking  Time  Count
0        Poor    41      3
1   Very Poor    46      3
2   Very Poor    49      2
3   Very Poor    17      2
4   Excellent    20      2
..        ...   ...    ...
81  Excellent    34      1
82  Excellent    32      1
83  Excellent    27      1
84  Excellent    26      1
85       Good    32      1

[86 rows x 3 columns]
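
Since the question asks about groupby specifically, here is a minimal sketch of the equivalent groupby call, for comparison with value_counts. The seeded generator `rng` and the names `count_gb` and `count_vc` are assumptions introduced only for this illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same shape as the MRE; the seed is only here to make the sketch repeatable.
rng = np.random.default_rng(0)
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking': rng.choice(rank, 100),
                   'Time': rng.integers(1, 50, 100)})

# groupby + size counts the rows in each (Ranking, Time) group.
count_gb = (df.groupby(['Ranking', 'Time'])
              .size()
              .rename('Count')
              .reset_index())

# value_counts yields the same counts, ordered by frequency rather than by key.
count_vc = df.value_counts(['Ranking', 'Time']).rename('Count').reset_index()

print(count_gb.head())
```

Both tables should hold the same rows. The groupby result comes back sorted by Ranking and Time, which is closer to the layout sketched in the question, while value_counts sorts by Count in descending order.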

To visualize the data, the easiest way is to use seaborn's displot:

# Python env: pip install seaborn
# Anaconda env: conda install seaborn
import seaborn as sns
import matplotlib.pyplot as plt

sns.displot(df, x='Time', col='Ranking', binwidth=1)
plt.show()
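
If you would rather plot the precomputed Count column directly (x=Time, y=Count, one chart per ranking, as the question describes), a plain matplotlib sketch could look like the following. This is an alternative illustration, not part of the original answer; the seeded sample data, the figure size, and the bar width are all assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild sample data and the count table (seeded only for repeatability).
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
rng = np.random.default_rng(0)
df = pd.DataFrame({'Ranking': rng.choice(rank, 100),
                   'Time': rng.integers(1, 50, 100)})
count = df.value_counts(['Ranking', 'Time']).rename('Count').reset_index()

# One panel per ranking, in the survey's natural order, sharing both axes.
fig, axes = plt.subplots(1, len(rank), figsize=(20, 4), sharex=True, sharey=True)
for ax, name in zip(axes, rank):
    sub = count[count['Ranking'] == name].sort_values('Time')
    ax.bar(sub['Time'], sub['Count'], width=0.8)
    ax.set_title(name)
    ax.set_xlabel('Time')
axes[0].set_ylabel('Count')
fig.tight_layout()
# Call plt.show() afterwards to display the figure.
```

Iterating over the `rank` list keeps the panels in the order Very Poor to Excellent instead of the order in which the values happen to appear in the data.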

【Comments】:

It worked! Thank you so much for your help!!
Done! Thanks again!!