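【Posted】: 2021-03-03 13:23:33
【Question】:
I have 4 groups (research, sales, manu, hr), and each group has 2 categories (0 and 1). I am trying to plot the mean score of each group for the features in the list ratings. The code that gives me the means looks like this (with depts = ['research', 'sales', 'manu', 'hr']):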
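ratings = ['JobSatisfaction', 'PerformanceRating', 'EnvironmentSatisfaction', 'RelationshipSatisfaction']
for i in depts:
    for x in ratings:
        # Mean of rating x within each category (0/1) of department column i.
        # Selecting the column before .mean() avoids averaging unrelated columns.
        print(group_data.groupby(i)[x].mean())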
This produces the following output:
research
0.0 2.700000
1.0 2.773973
Name: JobSatisfaction, dtype: float64
research
0.0 3.100000
1.0 3.167808
Name: PerformanceRating, dtype: float64
research
0.0 2.500000
1.0 2.726027
Name: EnvironmentSatisfaction, dtype: float64
research
0.0 2.687500
1.0 2.705479
Name: RelationshipSatisfaction, dtype: float64
sales
0.0 2.754601
1.0 2.734940
Name: JobSatisfaction, dtype: float64
sales
0.0 3.125767
1.0 3.144578
Name: PerformanceRating, dtype: float64
sales
0.0 2.671779
1.0 2.734940
Name: EnvironmentSatisfaction, dtype: float64
sales
0.0 2.702454
1.0 2.602410
Name: RelationshipSatisfaction, dtype: float64
manu
0.0 2.682759
1.0 2.723077
Name: JobSatisfaction, dtype: float64
manu
0.0 3.186207
1.0 3.158974
Name: PerformanceRating, dtype: float64
manu
0.0 2.917241
1.0 2.735897
Name: EnvironmentSatisfaction, dtype: float64
manu
0.0 2.724138
1.0 2.689744
Name: RelationshipSatisfaction, dtype: float64
hr
0.0 2.705882
1.0 2.557692
Name: JobSatisfaction, dtype: float64
hr
0.0 3.196078
1.0 3.134615
Name: PerformanceRating, dtype: float64
hr
0.0 2.764706
1.0 2.596154
Name: EnvironmentSatisfaction, dtype: float64
hr
0.0 2.813725
1.0 2.961538
Name: RelationshipSatisfaction, dtype: float64
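The loop above prints sixteen separate Series, which are hard to compare by eye. One possible alternative is to collect the same means into a single table. The following is only a minimal sketch: it assumes `group_data` holds the four 0/1 department columns and the four rating columns, the small DataFrame is made-up data for illustration, and `mean_table` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

depts = ['research', 'sales', 'manu', 'hr']
ratings = ['JobSatisfaction', 'PerformanceRating',
           'EnvironmentSatisfaction', 'RelationshipSatisfaction']


def mean_table(df, depts, ratings):
    """Return mean ratings indexed by (dept, category). Hypothetical helper."""
    # One small frame of means per department column.
    parts = [df.groupby(d)[ratings].mean() for d in depts]
    # Stack the frames; the department names become the outer index level.
    return pd.concat(parts, keys=depts, names=['dept', 'category'])


# Made-up stand-in for group_data (NOT the real IBM HR values).
group_data = pd.DataFrame({
    'research': [0, 0, 1, 1],
    'sales': [1, 0, 1, 0],
    'manu': [0, 1, 0, 1],
    'hr': [1, 1, 0, 0],
    'JobSatisfaction': [1, 2, 3, 4],
    'PerformanceRating': [3, 3, 4, 4],
    'EnvironmentSatisfaction': [2, 4, 2, 4],
    'RelationshipSatisfaction': [1, 1, 3, 3],
})

means = mean_table(group_data, depts, ratings)
print(means)
```

Each row of `means` is one (department, category) pair and each column is one rating, so a single value can be looked up with, for example, `means.loc[('sales', 1), 'JobSatisfaction']`.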
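My question is: how can I plot these group means (research, sales, manu, hr) for each rating ['JobSatisfaction', 'PerformanceRating', 'EnvironmentSatisfaction', 'RelationshipSatisfaction'] on 4 separate bar charts, so that I can visualize and compare the differences between the groups?
My data comes from the IBM HR dataset: https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset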
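To make the goal concrete, here is one way the four charts might be drawn with pandas and matplotlib: one subplot per rating, departments on the x-axis, and one bar per category. This is a sketch under assumptions, not a confirmed solution: `plot_rating_means` is a hypothetical helper name, the DataFrame is made-up data standing in for `group_data`, and the 2x2 layout assumes exactly four ratings.

```python
import matplotlib.pyplot as plt
import pandas as pd

depts = ['research', 'sales', 'manu', 'hr']
ratings = ['JobSatisfaction', 'PerformanceRating',
           'EnvironmentSatisfaction', 'RelationshipSatisfaction']


def plot_rating_means(df, depts, ratings):
    """Draw one bar chart per rating. Hypothetical helper."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    for ax, rating in zip(axes.flat, ratings):
        # Rows: departments; columns: category values (0 and 1).
        means = pd.DataFrame(
            {d: df.groupby(d)[rating].mean() for d in depts}
        ).T
        means.plot(kind='bar', ax=ax, rot=0)
        ax.set_title(rating)
        ax.set_ylabel('mean score')
        ax.legend(title='category')
    fig.tight_layout()
    return fig, axes


# Made-up stand-in for group_data (NOT the real IBM HR values).
group_data = pd.DataFrame({
    'research': [0, 0, 1, 1],
    'sales': [1, 0, 1, 0],
    'manu': [0, 1, 0, 1],
    'hr': [1, 1, 0, 0],
    'JobSatisfaction': [1, 2, 3, 4],
    'PerformanceRating': [3, 3, 4, 4],
    'EnvironmentSatisfaction': [2, 4, 2, 4],
    'RelationshipSatisfaction': [1, 1, 3, 3],
})

fig, axes = plot_rating_means(group_data, depts, ratings)
# Call plt.show() to display the figure when running as a script.
```

Because the means printed above all fall between roughly 2.5 and 3.2, narrowing the y-axis on each subplot with `ax.set_ylim` may make the differences between groups easier to see.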
【Discussion】:
Tags: python pandas matplotlib pandas-groupby seaborn