【Question title】: How to put multiple median values in a boxplot?
【Posted】: 2019-11-29 00:32:51
【Question】:

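I have only found code that puts the median value on a simple boxplot, and I tried it. But because my boxplot is grouped (several boxes per x tick), the code cannot get the box positions from the x-tick locators. How can I find the minor tick positions of the individual boxes? I have tried, but I still cannot get the positions of the multiple boxes. Any suggestions to improve this plot?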

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame([['Apple', 10, 'A'],['Apple', 8, 'B'],['Apple', 10, 'C'],
              ['Apple', 5, 'A'],['Apple', 7, 'B'],['Apple', 9, 'C'],
              ['Apple', 3, 'A'],['Apple', 5, 'B'],['Apple', 4, 'C'],
              ['Orange', 3, 'A'],['Orange', 4, 'B'],['Orange', 6, 'C'],
              ['Orange', 2, 'A'],['Orange', 8, 'B'],['Orange', 4, 'C'],
              ['Orange', 8, 'A'],['Orange', 10, 'B'],['Orange', 1, 'C']])

df.columns = ['item', 'score', 'grade']


fig = plt.figure(figsize=(6, 3), dpi=150)

ax = sns.boxplot(x='item', y='score', data=df, hue='grade', palette=sns.color_palette('husl'))
ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')

medians = df.groupby(['item','grade'])['score'].median().values
median_labels = [str(np.round(s, 2)) for s in medians]

pos = range(len(medians))
for tick,label in zip(pos, ax.get_xticklabels()):
    ax.text(pos[tick], medians[tick], median_labels[tick], 
            horizontalalignment='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))

【Comments】:

    Tags: python label seaborn boxplot


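    【Solution 1】:

    Seaborn is notoriously hard to customize. The code below works, but it may break if, for example, one of the categories is empty and its box is not drawn, so use it at your own risk: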

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    df = pd.DataFrame([['Apple', 10, 'A'],['Apple', 8, 'B'],['Apple', 10, 'C'],
                  ['Apple', 5, 'A'],['Apple', 7, 'B'],['Apple', 9, 'C'],
                  ['Apple', 3, 'A'],['Apple', 5, 'B'],['Apple', 4, 'C'],
                  ['Orange', 3, 'A'],['Orange', 4, 'B'],['Orange', 6, 'C'],
                  ['Orange', 2, 'A'],['Orange', 8, 'B'],['Orange', 4, 'C'],
                  ['Orange', 8, 'A'],['Orange', 10, 'B'],['Orange', 1, 'C']])
    
    df.columns = ['item', 'score', 'grade']
    
    
    width = 0.8
    hue_col = 'grade'
    
    fig = plt.figure(figsize=(6, 3), dpi=150)
    ax = sns.boxplot(x='item', y='score', data=df, hue=hue_col, palette=sns.color_palette('husl'), width=width)
    ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')
    
    # get the offsets used by boxplot when hue-nesting is used
    # https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
    n_levels = len(df[hue_col].unique())
    each_width = width / n_levels
    offsets = np.linspace(0, width - each_width, n_levels)
    offsets -= offsets.mean()
    
    medians = df.groupby(['item','grade'])['score'].median()
    
    for x0,(_,med0) in enumerate(medians.groupby(level=0)):
        for off,(_,med1) in zip(offsets,med0.groupby(level=1)):
            ax.text(x0+off, med1.item(), '{:.0f}'.format(med1.item()), 
                horizontalalignment='center', va='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))
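
The offset arithmetic above can be checked on its own, without drawing anything. The following is a minimal sketch in plain Python; `hue_offsets` is a hypothetical helper name, not part of seaborn, and it assumes (as the answer's formula does) that the group of boxes at each tick spans `width` in total and is split evenly between the hue levels. It is algebraically the same as the `np.linspace` version followed by subtracting the mean.

```python
def hue_offsets(n_levels, width=0.8):
    """Return the x-offset of each hue level's box centre relative to its tick.

    Assumption: the boxes at one tick share `width` equally, so each box
    occupies a slot of width / n_levels and the slots are centred on the tick.
    """
    each_width = width / n_levels
    # Box i sits (i - (n_levels - 1) / 2) slots away from the tick position.
    return [each_width * (i - (n_levels - 1) / 2) for i in range(n_levels)]


offsets = hue_offsets(3, width=0.8)
print([round(off, 3) for off in offsets])  # [-0.267, 0.0, 0.267]
```

With three grades and `width = 0.8`, the boxes for tick `x0` are therefore centred at roughly `x0 - 0.267`, `x0` and `x0 + 0.267`, which are the x coordinates passed to `ax.text`.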
    

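    In general, to avoid surprises when you want to modify a seaborn plot, I recommend specifying order and hue_order so that the boxes are drawn in a predetermined order. Here is another version that can handle missing categories: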

    df = pd.DataFrame([['Apple', 8, 'B'],['Apple', 10, 'C'],
                  ['Apple', 7, 'B'],['Apple', 9, 'C'],
                  ['Apple', 5, 'B'],['Apple', 4, 'C'],
                  ['Orange', 3, 'A'],['Orange', 6, 'C'],
                  ['Orange', 2, 'A'],['Orange', 4, 'C'],
                  ['Orange', 8, 'A'],['Orange', 1, 'C']])
    
    df.columns = ['item', 'score', 'grade']
    
    
    order = ['Apple', 'Orange']
    hue_col = 'grade'
    hue_order = ['A','B','C']
    width = 0.8
    
    fig = plt.figure(figsize=(6, 3), dpi=150)
    ax = sns.boxplot(x='item', y='score', data=df, hue=hue_col, palette=sns.color_palette('husl'), width=width,
                    order=order, hue_order=hue_order)
    ax.legend(loc='lower right', bbox_to_anchor=(1.11, 0), ncol=1, fontsize = 'x-small').set_title('')
    
    # get the offsets used by boxplot when hue-nesting is used
    # https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
    n_levels = len(hue_order)  # count the requested hue levels, even those with no data
    each_width = width / n_levels
    offsets = np.linspace(0, width - each_width, n_levels)
    offsets -= offsets.mean()
    
    medians = df.groupby(['item','grade'])['score'].median()
    medians = medians.reindex(pd.MultiIndex.from_product([order,hue_order]))
    
    # sort=False keeps the groups in the order given by order / hue_order
    for x0,(_,med0) in enumerate(medians.groupby(level=0, sort=False)):
        for off,(_,med1) in zip(offsets,med0.groupby(level=1, sort=False)):
            if not np.isnan(med1.item()):
                ax.text(x0+off, med1.item(), '{:.0f}'.format(med1.item()), 
                    horizontalalignment='center', va='center', size='xx-small', color='w', weight='semibold', bbox=dict(facecolor='#445A64'))
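
To see which labels the version above would draw, and where, the same logic can be sketched with the standard library only. `label_positions` is a hypothetical helper introduced here for illustration. It assumes, as the answer's code does, that an empty item/grade combination still keeps its slot next to the tick, so only the label is skipped and the remaining boxes do not shift.

```python
from statistics import median

# Same reduced data set as above: no Apple/A rows and no Orange/B rows.
rows = [
    ("Apple", 8, "B"), ("Apple", 10, "C"), ("Apple", 7, "B"), ("Apple", 9, "C"),
    ("Apple", 5, "B"), ("Apple", 4, "C"), ("Orange", 3, "A"), ("Orange", 6, "C"),
    ("Orange", 2, "A"), ("Orange", 4, "C"), ("Orange", 8, "A"), ("Orange", 1, "C"),
]
order = ["Apple", "Orange"]
hue_order = ["A", "B", "C"]
width = 0.8


def label_positions(rows, order, hue_order, width):
    """Hypothetical helper: return (x, y, text) for every non-empty box."""
    n_levels = len(hue_order)
    each_width = width / n_levels
    labels = []
    for x0, item in enumerate(order):
        for i, grade in enumerate(hue_order):
            scores = [s for it, s, g in rows if it == item and g == grade]
            if not scores:
                continue  # empty combination: slot stays reserved, no label
            x = x0 + each_width * (i - (n_levels - 1) / 2)
            med = median(scores)
            labels.append((round(x, 3), med, "{:.0f}".format(med)))
    return labels


for label in label_positions(rows, order, hue_order, width):
    print(label)
```

Each returned tuple corresponds to one `ax.text(x, y, text, ...)` call in the loop above. Iterating over `order` and `hue_order` directly, rather than over the grouped data, is what keeps the labels aligned with the boxes when a combination is missing.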
    

    【Discussion】:

    • Note that plt.subplots_adjust(bottom=0.20) can be used to leave more room for the x-axis labels.