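【Question title】: How to plot a histogram for all unique combinations of data?
【Posted】: 2020-10-18 06:35:17
【Question description】:
  • Is there a way in Python to get a size-frequency histogram of a population under different scenarios on specific days,
    • showing the means with error bars?
  • My data is in a format similar to this table: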
SCENARIO     RUN     MEAN     DAY
A             1       25       10
A             1       15       30
A             2       20       10
A             2       27       30
B             1       45       10
B             1       50       30
B             2       43       10
B             2       35       30
  • results_data.groupby(['Scenario', 'Run']).mean() does not give me the days for which I want to visualize the data.
    • It returns the mean over the days for each run.
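  To illustrate the behaviour described above, here is a minimal sketch built from the sample table. It assumes the column names are the upper-case ones shown in the table (SCENARIO, RUN, MEAN, DAY); the variable names per_run, per_day and summary are made up for the example.

```python
import pandas as pd

# The sample table from the question
results_data = pd.DataFrame({
    'SCENARIO': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'RUN': [1, 1, 2, 2, 1, 1, 2, 2],
    'MEAN': [25, 15, 20, 27, 45, 50, 43, 35],
    'DAY': [10, 30, 10, 30, 10, 30, 10, 30],
})

# Grouping without DAY averages over both days of each run,
# e.g. (A, 1) -> 20.0, the mean of 25 and 15
per_run = results_data.groupby(['SCENARIO', 'RUN'])['MEAN'].mean()

# Adding DAY to the keys keeps one value per (scenario, run, day)
per_day = results_data.groupby(['SCENARIO', 'RUN', 'DAY'])['MEAN'].mean()

# Mean and standard deviation across runs for each scenario on day 10
day10 = results_data[results_data['DAY'] == 10]
summary = day10.groupby('SCENARIO')['MEAN'].agg(['mean', 'std'])
print(summary)
```

  The std column of summary is one possible source for error bars; whether the spread across runs is the right error estimate depends on the data.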

【Question comments】:

    Tags: python pandas matplotlib pandas-groupby seaborn


    【Solution 1】:

    Use seaborn.FacetGrid

    • FacetGrid is a multi-plot grid for plotting conditional relationships
    • seaborn.distplot is mapped onto the FacetGrid, with hue=DAY

    Set up the data and dataframe

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    import random  # just for test data
    import numpy as np  # just for test data
    
    
    # data
    random.seed(365)
    np.random.seed(365)
    data = {'MEAN': [np.random.randint(20, 51) for _ in range(500)],
            'SCENARIO': [random.choice(['A', 'B']) for _ in range(500)],
            'DAY': [random.choice([10, 30]) for _ in range(500)],
            'RUN': [random.choice([1, 2]) for _ in range(500)]}
    
    # create dataframe
    df = pd.DataFrame(data)
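
    Before plotting, it can help to check how many rows fall into each SCENARIO/RUN/DAY combination, since each combination becomes one histogram. The sketch below rebuilds test data of the same shape (the exact values are not guaranteed to match the frame above); cell_counts is a name chosen for illustration.

```python
import random

import numpy as np
import pandas as pd

# Rebuild random test data with the same columns as above
random.seed(365)
np.random.seed(365)
n = 500
df = pd.DataFrame({
    'MEAN': np.random.randint(20, 51, size=n),
    'SCENARIO': [random.choice(['A', 'B']) for _ in range(n)],
    'DAY': [random.choice([10, 30]) for _ in range(n)],
    'RUN': [random.choice([1, 2]) for _ in range(n)],
})

# Number of rows in each (SCENARIO, RUN, DAY) combination
cell_counts = df.groupby(['SCENARIO', 'RUN', 'DAY']).size()
print(cell_counts)
```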
    

    Plot with kde=False

    g = sns.FacetGrid(df, col='RUN', row='SCENARIO', hue='DAY', height=5)
    g = g.map(sns.distplot, 'MEAN', bins=range(20, 51, 5), kde=False, hist_kws=dict(edgecolor="k", linewidth=1)).add_legend()
    plt.show()
    

    Plot with kde=True

    g = sns.FacetGrid(df, col='RUN', row='SCENARIO', hue='DAY', height=5, palette='GnBu')
    g = g.map(sns.distplot, 'MEAN', bins=range(20, 51, 5), kde=True, hist_kws=dict(edgecolor="k", linewidth=1)).add_legend()
    plt.show()
    

    Plot with error bars

    from itertools import product
    
    # create unique combinations for filtering df
    scenarios = df.SCENARIO.unique()
    runs = df.RUN.unique()
    days = df.DAY.unique()
    combo_list = [scenarios, runs, days]
    results = list(product(*combo_list))  
    
    # plot
    for i, result in enumerate(results, 1):  # iterate through each set of combinations
        s, r, d = result
        data = df[(df.SCENARIO == s) & (df.RUN == r) & (df.DAY == d)]  # filter dataframe
        
        # add subplot rows, columns; needs to equal the number of combinations in results
        plt.subplot(2, 4, i)
        
        # plot hist and unpack values
        n, bins, _ = plt.hist(x='MEAN', bins=range(20, 51, 5), data=data, color='g')
        
        # calculate bin centers
        bin_centers = 0.5 * (bins[:-1] + bins[1:])
        
        # draw error bars using the sqrt(n) error; any other error estimate can be used here
        # Poissonian 1-sigma intervals would make more sense
        plt.errorbar(bin_centers, n, yerr=np.sqrt(n), fmt='k.')
    
    
        plt.title(f'Scenario: {s} | Run: {r} | Day: {d}')
    plt.tight_layout()
    plt.show()
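
    The bin-center and sqrt(n) calculation used in the loop above can be tried on its own, without drawing anything. This is a sketch with np.histogram on a handful of made-up values; the function name hist_with_sqrt_errors is hypothetical.

```python
import numpy as np


def hist_with_sqrt_errors(values, bin_edges):
    """Return counts, bin centers and sqrt(count) errors for a histogram."""
    counts, edges = np.histogram(values, bins=bin_edges)
    # midpoint of each pair of neighbouring edges
    centers = 0.5 * (edges[:-1] + edges[1:])
    # sqrt(n) error per bin, as used for yerr in the loop above
    errors = np.sqrt(counts)
    return counts, centers, errors


# Made-up values, same bin edges as the plots above
values = [20, 21, 24, 25, 26, 33, 50]
counts, centers, errors = hist_with_sqrt_errors(values, np.arange(20, 51, 5))
print(counts)   # [3 2 1 0 0 1]
print(centers)  # [22.5 27.5 32.5 37.5 42.5 47.5]
```

    The returned arrays can be passed to plt.errorbar(centers, counts, yerr=errors, fmt='k.') in the same way as in the loop.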
    

    【Comments】:

    • Great lesson! :) @TrentonMcKinney