【Question title】: Pythonic way to generate seaborn heatmap subplots
【Posted】: 2021-09-24 14:18:21
【Question】:

I have a DataFrame with 7 columns. The Regressor column contains 3 distinct regressors (DT, DT-2, DT-4).

I want to generate a correlation heatmap for each regressor.

df_dt = df[(df["Regressor"]=="DT")]
df_dt_corr = df_dt.drop(["Regressor"], axis=1).corr()

df_dt2 = df[(df["Regressor"]=="DT-2")]
df_dt2_corr = df_dt2.drop(["Regressor"], axis=1).corr()

df_dt4 = df[(df["Regressor"]=="DT-4")]
df_dt4_corr = df_dt4.drop(["Regressor"], axis=1).corr()

#  SUBPLOTS
fig = plt.figure(figsize=(12,6))

plt.subplot(221)  
plt.title('Regressor: DT')
sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap = 'Reds_r')

plt.subplot(222)  
plt.title('Regressor: DT-2')
sns.heatmap(df_dt2_corr, annot=True, fmt='.2f', square=True, cmap = 'Blues_r')

plt.subplot(223)
plt.title('Regressor: DT-4')
sns.heatmap(df_dt4_corr, annot=True, fmt='.2f', square=True, cmap = 'BuGn_r')

plt.show()

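This gives me the plot I want.

The problem is that if I have 10 regressors, I have to write the same code 10 times. That is neither Pythonic nor good programming practice.

Is there a Pythonic way (i.e. using a loop, etc.) to do the same job?

Please note: the demo DataFrame has 3 regressors, but my main DataFrame can have more. So I need a dynamic way to generate the plots based on the regressors present.

Demo data: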

{'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}

【Comments】:

  • Is there a pattern to the colors used, or should as many distinct colors as possible be used?
  • @sommervold I need a unique color for each model

Tags: python plot seaborn data-visualization heatmap


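【Solution 1】:

The existing answers use a loop, but I looked around to see whether a facet grid could handle this. I found a great answer on the topic and adapted it to fit your code. The single DataFrame is split by the categorical column, and col_wrap limits the number of columns in the grid. The map_dataframe function then draws a heatmap from each subset. However, I could not find a way to set a different colormap per facet. I think a single colormap works well enough for this analysis.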

import pandas as pd
import seaborn as sns

data = {'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}

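df = pd.DataFrame(data)

# One facet per regressor, wrapping to a new row after 2 columns
g = sns.FacetGrid(df, col="Regressor", col_wrap=2)
# map_dataframe passes each facet's subset as `data` (plus a `color` keyword);
# select_dtypes("number") drops the string columns before computing correlations
g.map_dataframe(lambda data, color: sns.heatmap(data.select_dtypes("number").corr(), annot=True, fmt='.2f', square=True))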
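
One possible way around the colormap limitation mentioned above is to look up the colormap inside the plotting function, using the regressor name found in each facet's subset. This is only a sketch: the CMAPS mapping, the facet_heatmap helper and the random toy data are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Toy stand-in data (random placeholder values, not the asker's measurements)
rng = np.random.default_rng(0)
regressors = ["DT", "DT-2", "DT-4"]
df = pd.DataFrame({
    "Regressor": np.repeat(regressors, 5),
    "Method": "method_1",
    "CE": rng.random(15),
    "MAE": rng.random(15),
    "MSqE": rng.random(15),
    "R2": rng.random(15),
})

# Hypothetical mapping from regressor name to colormap name
CMAPS = {"DT": "Reds_r", "DT-2": "Blues_r", "DT-4": "BuGn_r"}


def facet_heatmap(data, color=None, **_ignored):
    """Draw one correlation heatmap on the current facet axes."""
    # Every row in a facet's subset shares the same regressor name
    name = data["Regressor"].iloc[0]
    # Keep numeric columns only so the string columns do not break corr()
    corr = data.select_dtypes("number").corr()
    sns.heatmap(corr, cmap=CMAPS[name], annot=True, fmt=".2f", square=True)


g = sns.FacetGrid(df, col="Regressor", col_wrap=2)
g.map_dataframe(facet_heatmap)
```

The lookup relies on map_dataframe handing the function the full subset, including the Regressor column, so the facet can identify itself without any extra arguments.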

【Comments】:

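    【Solution 2】:

    This is just a case of putting everything in a loop. First, the program works out which regressors to use by collecting all unique values in df['Regressor'].values.

    axes is determined automatically from the number of regressors. It tries to form a square grid.

    The available colormaps are defined in colors; change this list if you want different colors. The program starts with the first color, then the second, and so on. If there are too few colors, it wraps around to the beginning.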
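
The grid sizing and the color wrap-around described above can be checked in isolation. The helper names grid_shape and pick_cmap are made up for this sketch; the arithmetic is the same ceil-of-square-root and modulo logic the answer describes.

```python
import math

COLORS = [
    'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
    'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
    'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']


def grid_shape(n_plots):
    """Smallest square (rows, cols) grid with at least n_plots cells."""
    side = math.ceil(math.sqrt(n_plots))
    return (side, side)


def pick_cmap(index, colors=COLORS):
    """Colormap for the index-th plot, wrapping around when the list runs out."""
    return colors[index % len(colors)]


print(grid_shape(3))   # (2, 2)
print(grid_shape(10))  # (4, 4)
print(pick_cmap(18))   # Greys (index 18 wraps back to the first entry)
```

Note that a square grid can leave whole rows empty: 10 plots get a 4 x 4 grid, of which only the first three rows are used.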

    import math

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Unique regressors, sorted so the subplot order is reproducible
    regressors = sorted(set(df['Regressor'].values))
    fig = plt.figure(figsize=(12,6))

    # Smallest square grid (rows, cols) that fits every regressor
    axes = (math.ceil(math.sqrt(len(regressors))),) * 2

    colors = [
                'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
                'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
                'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']

    for index, regressor in enumerate(regressors):
        df_dt = df[(df['Regressor']==regressor)]
        # Keep numeric columns only so string columns such as "Method" do not break corr()
        df_dt_corr = df_dt.drop(["Regressor"], axis=1).select_dtypes("number").corr()

        plt.subplot(*axes, index + 1)
        plt.title('Regressor: ' + regressor)
        sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap=colors[index%len(colors)])
    plt.show()
    
    

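    I changed the way you call plt.subplot because the three-digit form you used supports at most 9 plots; passing rows, columns and index separately makes it easier to adjust the grid automatically.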
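
A variant of the same loop can be written with plt.subplots and explicit axes, which creates exactly one figure, allows a figure-level title through fig.suptitle, and lets unused grid cells be hidden. This is a sketch under assumptions: plot_corr_grid and its parameters are hypothetical names, and the random toy data only stands in for the real DataFrame.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_corr_grid(df, group_col="Regressor",
                   cmaps=("Reds_r", "Blues_r", "BuGn_r"), title=None):
    """Draw one correlation heatmap per unique value of group_col."""
    groups = list(df[group_col].unique())
    n_cols = math.ceil(math.sqrt(len(groups)))
    n_rows = math.ceil(len(groups) / n_cols)
    # squeeze=False keeps `axes` two-dimensional even for a single plot
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(5 * n_cols, 4 * n_rows), squeeze=False)
    flat = axes.ravel()
    for i, name in enumerate(groups):
        subset = df.loc[df[group_col] == name]
        # Numeric columns only, so string columns are ignored by corr()
        corr = subset.select_dtypes("number").corr()
        sns.heatmap(corr, annot=True, fmt=".2f", square=True,
                    cmap=cmaps[i % len(cmaps)], ax=flat[i])
        flat[i].set_title(f"Regressor: {name}")
    # Hide grid cells that have no heatmap
    for ax in flat[len(groups):]:
        ax.set_visible(False)
    if title is not None:
        fig.suptitle(title)  # title for the whole figure, not one subplot
    fig.tight_layout()
    return fig, flat[:len(groups)]


# Toy stand-in data (random placeholder values, not the asker's measurements)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Regressor": np.repeat(["DT", "DT-2", "DT-4"], 5),
    "Method": "method_1",
    "CE": rng.random(15),
    "MAE": rng.random(15),
    "MSqE": rng.random(15),
    "R2": rng.random(15),
})
fig, used_axes = plot_corr_grid(demo, title="Correlation Map")
# In an interactive session, call plt.show() to display the figure.
```

Passing ax= to sns.heatmap avoids relying on the "current axes" state, which is the usual source of stray empty figures and of titles landing on the last subplot only.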

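    【Comments】:

    • Thanks for the answer. However, the code generates 2 figures: one is empty and the other contains the correlations.
    • Also, if I want to add a title, where do I put plt.title? I used it before plt.show, but the title only appears on the third correlation plot, not centered over the whole figure.
    【Solution 3】:

    Select the unique values first

    I store the unique values of the Regressor column in the vals variable, then loop over each value. See the solution below: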

    import matplotlib.pyplot as plt
    import seaborn as sns

    # get the unique values in the "Regressor" column
    vals=df['Regressor'].unique()

    plt.figure(figsize=[10,10],dpi=200)
    plt.suptitle("Correlation Map") # super title for the whole figure
    # loop over each unique value, select its rows and plot
    for idx, value in enumerate(vals):
        # keep the rows for this value and drop the unwanted columns with "iloc"
        data=df[df['Regressor']==value].iloc[:,2:] # 2: selects the third column onwards
        # plot the correlation map
        plt.subplot(len(vals),2,idx+1)
        plt.title(f"Regressor={value}")
        sns.heatmap(data.corr(), annot=True, fmt='.2f', square=True)
    plt.show()
    

    You only need to choose the number of columns in the subplot grid and the super title here.
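
As an alternative to filtering by each unique value, pandas groupby can yield the per-regressor subsets directly, so the correlation matrices can be computed once and plotted afterwards. This is a sketch with random toy data; the name corr_by_regressor is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in data (random placeholder values, not the asker's measurements)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Regressor": np.repeat(["DT", "DT-2", "DT-4"], 5),
    "Method": "method_1",
    "CE": rng.random(15),
    "MAE": rng.random(15),
    "MSqE": rng.random(15),
    "R2": rng.random(15),
})

# One correlation matrix per regressor; sort=False keeps first-appearance order
corr_by_regressor = {
    name: group.select_dtypes("number").corr()
    for name, group in df.groupby("Regressor", sort=False)
}

for name, corr in corr_by_regressor.items():
    print(name, corr.shape)
```

Selecting numeric columns by dtype, rather than by position with iloc, keeps the code working if the column order of the DataFrame changes.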

    Result

    【Comments】:
