How to normalize a seaborn countplot with multiple categorical variables

【Question title】: How to normalize a seaborn countplot with multiple categorical variables
【Posted】: 2018-10-19 11:25:33
【Question description】:

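I created seaborn countplots for several categorical variables of a dataframe, but I want percentages instead of counts.

What is the best option? A barplot? Can I use a loop like the one below to get all the barplots at once?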

for i, col in enumerate(df_categorical.columns):
   plt.figure(i)
   sns.countplot(x=col,hue='Response',data=df_categorical)

This loop gives me a countplot for every variable at once.

Thanks!

The data looks like this:

    State           Response     Coverage   Education   Effective To Date   EmploymentStatus       Gender   Location Code   Marital Status  Policy Type Policy    Renew Offer Type  Sales Channel   Vehicle Class   Vehicle Size    
0   Washington  No  Basic   Bachelor    2/24/11 Employed    F   Suburban    Married Corporate Auto  Corporate L3    Offer1  Agent   Two-Door Car    Medsize  
1   Arizona     No  Extended    Bachelor    1/31/11 Unemployed  F   Suburban    Single  Personal Auto   Personal L3 Offer3  Agent   Four-Door Car   Medsize
2   Nevada      No  Premium Bachelor    2/19/11 Employed    F   Suburban    Married Personal Auto   Personal L3 Offer1  Agent   Two-Door Car    Medsize
3   California  No  Basic   Bachelor    1/20/11 Unemployed  M   Suburban    Married Corporate Auto  Corporate L2    Offer1  Call Center SUV Medsize
4   Washington  No  Basic   Bachelor    2/3/11  Employed    M   Rural   Single  Personal Auto   Personal L1 Offer1  Agent   Four-Door Car   Medsize

【Question comments】:

Tags: python pandas matplotlib seaborn

【Solution 1】:

Consider using groupby.transform to compute a percentage column, then run barplot with x set to the original value column and y set to the percentage column.
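As a minimal sketch of what groupby.transform does here, the snippet below uses a small hypothetical frame (`toy`, with made-up `Response` and `Gender` values, not the poster's data). It assumes the percentage wanted is each (Response, category) group's share of all rows.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Gender":   ["F",  "F",  "F",   "M",  "M"],
})

# transform returns one value per original row:
# the size of the (Response, Gender) group that row belongs to
group_size = toy.groupby(["Response", "Gender"])["Gender"].transform("size")

# Divide by the total row count to get a fraction of all rows
toy["Gender_pct"] = group_size / len(toy)
print(toy)
```

Because transform keeps the original row order and length, the result can be assigned straight back as a new column, which is what lets barplot read x, y and hue from the same dataframe.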

Data (the originally posted data, with just two No responses changed to Yes)

from io import StringIO
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

txt = '''
    State           Response     Coverage   Education   "Effective To Date"   EmploymentStatus       Gender   "Location Code"   "Marital Status"  "Policy Type" Policy    "Renew Offer Type"  "Sales Channel"   "Vehicle Class"   "Vehicle Size" 
0   Washington  No  Basic   Bachelor    "2/24/11" Employed    F   Suburban    Married "Corporate Auto"  "Corporate L3"    Offer1  Agent   "Two-Door Car"    Medsize  
1   Arizona     No  Extended    Bachelor  "1/31/11"   Unemployed  F   Suburban    Single  "Personal Auto"   "Personal L3" Offer3  Agent   "Four-Door Car"   Medsize
2   Nevada      Yes  Premium Bachelor    "2/19/11" Employed    F   Suburban    Married "Personal Auto"   "Personal L3" Offer1  Agent   "Two-Door Car"    Medsize
3   California  No  Basic   Bachelor    "1/20/11" Unemployed  M   Suburban    Married "Corporate Auto"  "Corporate L2"    Offer1  "Call Center" SUV Medsize
4   Washington  Yes  Basic   Bachelor    "2/3/11"  Employed    M   Rural   Single  "Personal Auto"   "Personal L1" Offer1  Agent   "Four-Door Car"   Medsize'''

# Raw string so that \s is passed to the regex engine rather than treated as a string escape
df_categorical = pd.read_table(StringIO(txt), sep=r"\s+")

Plot (a single figure with multiple subplots laid out in two columns)

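fig = plt.figure(figsize=(10, 30))

# enumerate() walks the original column Index, so the *_pct columns added below are not revisited
for i, col in enumerate(df_categorical.columns):
    # PERCENT COLUMN CALCULATION: rows in each (Response, col) group as a fraction of all rows
    df_categorical[col + '_pct'] = df_categorical.groupby(['Response', col])[col]\
                                     .transform(lambda x: len(x)) / len(df_categorical)

    # 8 x 2 grid, one subplot per column
    plt.subplot(8, 2, i + 1)
    sns.barplot(x=col, y=col + '_pct', hue='Response', data=df_categorical)\
       .set(xlabel=col, ylabel='Percent')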

plt.tight_layout()
plt.show()
plt.clf()

plt.close('all')
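
The loop above divides every group count by the total number of rows. If the share of each Response within each category level is wanted instead, one possible variant (an assumption about intent, not part of the original answer) is to normalize a crosstab by row. The sketch below uses a hypothetical `toy` frame and reshapes the result to long form so it can be passed to barplot.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Gender":   ["F",  "F",  "F",   "M",  "M"],
})

# normalize="index" divides each row by its own total,
# so the Response shares within each Gender level sum to 1
within = pd.crosstab(toy["Gender"], toy["Response"], normalize="index")

# Long form: one row per (Gender, Response) pair with its share in "pct"
long_df = within.reset_index().melt(id_vars="Gender", var_name="Response", value_name="pct")
print(long_df)

# The long table could then be plotted with, for example:
# sns.barplot(x="Gender", y="pct", hue="Response", data=long_df)
```

Which denominator is appropriate depends on the question being asked: a share of all rows shows how common each combination is overall, while a within-category share makes response rates comparable across levels of different sizes.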

【Comments】:

Thank you very much!