【Question title】: How to normalize a seaborn countplot with multiple categorical variables
【Posted】: 2018-10-19 11:25:33
【Question】:

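I created a seaborn countplot for several categorical variables of a DataFrame, but I would like percentages instead of counts.

What is the best option? A bar plot? Can I use a loop like the one below to get all the bar plots at once?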

for i, col in enumerate(df_categorical.columns):
   plt.figure(i)
   sns.countplot(x=col,hue='Response',data=df_categorical) 

This loop gives me the countplots for all variables at once.

Thanks!

The data looks like this:

    State           Response     Coverage   Education   Effective To Date   EmploymentStatus       Gender   Location Code   Marital Status  Policy Type Policy    Renew Offer Type  Sales Channel   Vehicle Class   Vehicle Size    
0   Washington  No  Basic   Bachelor    2/24/11 Employed    F   Suburban    Married Corporate Auto  Corporate L3    Offer1  Agent   Two-Door Car    Medsize  
1   Arizona     No  Extended    Bachelor    1/31/11 Unemployed  F   Suburban    Single  Personal Auto   Personal L3 Offer3  Agent   Four-Door Car   Medsize
2   Nevada      No  Premium Bachelor    2/19/11 Employed    F   Suburban    Married Personal Auto   Personal L3 Offer1  Agent   Two-Door Car    Medsize
3   California  No  Basic   Bachelor    1/20/11 Unemployed  M   Suburban    Married Corporate Auto  Corporate L2    Offer1  Call Center SUV Medsize
4   Washington  No  Basic   Bachelor    2/3/11  Employed    M   Rural   Single  Personal Auto   Personal L1 Offer1  Agent   Four-Door Car   Medsize

【Comments】:

    Tags: python pandas matplotlib seaborn


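    【Solution 1】:

    Consider using groupby.transform to compute a percentage column, then call barplot with x set to the original value column and y set to the percentage column.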
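
To make the idea concrete, here is a minimal sketch of the groupby.transform step on its own. The `toy` frame and the `Gender_pct` column name are made up for illustration and are not the asker's data; the point is only that every row receives the share of its (Response, value) group.

```python
import pandas as pd

# Hypothetical toy data, not the asker's data set
toy = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Gender":   ["F",  "F",  "F",   "M",  "M"],
})

# Size of each (Response, Gender) group, broadcast back to every row,
# then expressed as a percentage of all rows
toy["Gender_pct"] = (
    toy.groupby(["Response", "Gender"])["Gender"].transform("size")
    * 100 / len(toy)
)
print(toy)
```

Rows that belong to the same group carry the same percentage, so a bar plot of that column shows one bar height per group.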

    Data (the originally posted data, with two of the No responses changed to Yes)

    from io import StringIO
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    txt = '''
        State           Response     Coverage   Education   "Effective To Date"   EmploymentStatus       Gender   "Location Code"   "Marital Status"  "Policy Type" Policy    "Renew Offer Type"  "Sales Channel"   "Vehicle Class"   "Vehicle Size" 
    0   Washington  No  Basic   Bachelor    "2/24/11" Employed    F   Suburban    Married "Corporate Auto"  "Corporate L3"    Offer1  Agent   "Two-Door Car"    Medsize  
    1   Arizona     No  Extended    Bachelor  "1/31/11"   Unemployed  F   Suburban    Single  "Personal Auto"   "Personal L3" Offer3  Agent   "Four-Door Car"   Medsize
    2   Nevada      Yes  Premium Bachelor    "2/19/11" Employed    F   Suburban    Married "Personal Auto"   "Personal L3" Offer1  Agent   "Two-Door Car"    Medsize
    3   California  No  Basic   Bachelor    "1/20/11" Unemployed  M   Suburban    Married "Corporate Auto"  "Corporate L2"    Offer1  "Call Center" SUV Medsize
    4   Washington  Yes  Basic   Bachelor    "2/3/11"  Employed    M   Rural   Single  "Personal Auto"   "Personal L1" Offer1  Agent   "Four-Door Car"   Medsize'''
    
    # Raw string avoids an invalid-escape warning for the regex separator
    df_categorical = pd.read_table(StringIO(txt), sep=r"\s+")
    

    Plot (a single figure with multiple subplots arranged in two columns)

    fig = plt.figure(figsize=(10,30))
    
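    # Plot every categorical column except the hue column itself
    plot_cols = [c for c in df_categorical.columns if c != 'Response']
    
    for i, col in enumerate(plot_cols):
       # PERCENT COLUMN CALCULATION: size of each (Response, col) group,
       # broadcast to its rows, as a percentage of all rows
       df_categorical[col+'_pct'] = (df_categorical.groupby(['Response', col])[col]
                                       .transform('size') * 100 / len(df_categorical))
    
       plt.subplot(8, 2, i+1)
       sns.barplot(x=col, y=col+'_pct', hue='Response', data=df_categorical)\
              .set(xlabel=col, ylabel='Percent')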
    
    plt.tight_layout()
    plt.show()
    plt.clf()
    
    plt.close('all')
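
As a possible variation (not part of the original answer), the percentages can be aggregated into one row per group before plotting, so the bar plot does not have to average repeated rows. This is a sketch under assumptions: `percent_table` is a hypothetical helper and `toy` is made-up data standing in for df_categorical.

```python
import pandas as pd

# Hypothetical toy data standing in for df_categorical
toy = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Gender":   ["F",  "F",  "F",   "M",  "M"],
    "Coverage": ["Basic", "Extended", "Premium", "Basic", "Basic"],
})

def percent_table(df, col, hue="Response"):
    """Return one row per (col, hue) pair with its share of all rows in percent."""
    out = df.groupby([col, hue]).size().mul(100).div(len(df))
    return out.rename("Percent").reset_index()

gender_pct = percent_table(toy, "Gender")
print(gender_pct)
```

The resulting frame can then be passed to sns.barplot with x=col, y='Percent' and hue='Response', in the same way as the percentage column above.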
    

    【Comments】:

    • Thank you very much!