【Question Title】: Pythonic way to generate seaborn heatmap subplots
【Posted】: 2021-09-24 14:18:21
【Question】:

I have a dataframe with 7 columns. The Regressor column has 3 different regressors (DT, DT-2 and DT-4).

I want to generate a correlation heatmap for each regressor.

import matplotlib.pyplot as plt
import seaborn as sns

# df is the dataframe described above; keep only numeric columns so corr()
# does not fail on text columns such as Method
df_dt = df[df["Regressor"] == "DT"]
df_dt_corr = df_dt.drop(["Regressor"], axis=1).select_dtypes("number").corr()

df_dt2 = df[df["Regressor"] == "DT-2"]
df_dt2_corr = df_dt2.drop(["Regressor"], axis=1).select_dtypes("number").corr()

df_dt4 = df[df["Regressor"] == "DT-4"]
df_dt4_corr = df_dt4.drop(["Regressor"], axis=1).select_dtypes("number").corr()

#  SUBPLOTS
fig = plt.figure(figsize=(12,6))

plt.subplot(221)
plt.title('Regressor: DT')
sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap='Reds_r')

plt.subplot(222)
plt.title('Regressor: DT-2')
sns.heatmap(df_dt2_corr, annot=True, fmt='.2f', square=True, cmap='Blues_r')

plt.subplot(223)
plt.title('Regressor: DT-4')
sns.heatmap(df_dt4_corr, annot=True, fmt='.2f', square=True, cmap='BuGn_r')

plt.show()

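This gives me the plot I want.

The problem is that if I have 10 regressors, I have to repeat the same code 10 times, once per regressor. That is neither Pythonic nor good programming practice.

Is there a Pythonic way (for example, using a loop) to do the same job?

Please note: the demo dataframe has 3 regressors, but my main dataframe can have more. So I need a dynamic way to generate the plots based on the regressors.

Demo data: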

{'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}

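【Comments】:

Is there a pattern to the colors used, or should there simply be as many unique colors as possible?
@sommervold I need each model to have its own unique color.

Tags: python plot seaborn data-visualization heatmap

【Solution 1】:

The existing answers use a loop, but I looked around to see whether a facet grid could handle this. I found a great answer on the topic and adapted it to fit your code. The single dataframe is split by the categorical column (Regressor), and col_wrap limits the number of columns in the grid. The map function then draws a heatmap from each subset. However, I could not find a way to set a different colormap for each facet. I think a single shared colormap works well for analysis anyway.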

import pandas as pd
import seaborn as sns

data = {'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}

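df = pd.DataFrame(data)

# one facet per regressor; col_wrap=2 starts a new row after every 2 facets
g = sns.FacetGrid(df, col="Regressor", col_wrap=2)
# map_dataframe passes each facet's subset as `data`; keep only numeric columns before corr()
g.map_dataframe(lambda data, color: sns.heatmap(data.select_dtypes("number").corr(), annot=True, fmt='.2f', square=True))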
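
On the colormap limitation mentioned above, one possible workaround is to let the plotting function look up the regressor name inside the subset it receives and choose a colormap from that. The following is only a sketch under assumptions: the `CMAPS` mapping, the `corr_heatmap` helper and the small made-up dataframe are illustrative names and values, not part of the original answer.

```python
import pandas as pd
import seaborn as sns

# Small made-up frame standing in for the question's data (values are illustrative only).
df = pd.DataFrame({
    "Regressor": ["DT"] * 4 + ["DT-2"] * 4 + ["DT-4"] * 4,
    "CE": [0.1, 0.2, 0.3, 0.5, 0.9, 0.7, 0.8, 0.6, 0.2, 0.1, 0.4, 0.3],
    "MAE": [1.0, 2.1, 2.9, 4.2, 13.3, 12.8, 13.1, 13.6, 3.3, 4.3, 5.3, 7.5],
    "R2": [0.99, 0.98, 0.97, 0.95, 0.93, 0.94, 0.92, 0.91, 0.99, 0.98, 0.97, 0.96],
})

# Hypothetical mapping from regressor name to matplotlib colormap name.
CMAPS = {"DT": "Reds_r", "DT-2": "Blues_r", "DT-4": "BuGn_r"}


def corr_heatmap(data, **kwargs):
    """Draw one facet's correlation heatmap, choosing the colormap by regressor."""
    # Each facet subset still contains the Regressor column, so read the name from it.
    regressor = data["Regressor"].iloc[0]
    cmap = CMAPS.get(regressor, "viridis")  # fallback for names missing from CMAPS
    corr = data.select_dtypes("number").corr()
    sns.heatmap(corr, annot=True, fmt=".2f", square=True, cmap=cmap)


g = sns.FacetGrid(df, col="Regressor", col_wrap=2)
g.map_dataframe(corr_heatmap)
```

Calling `plt.show()` (after `import matplotlib.pyplot as plt`) or `g.savefig(...)` afterwards displays or saves the grid. The `**kwargs` parameter absorbs the `color` argument that `map_dataframe` passes along.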

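【Comments】:

【Solution 2】:

This is just a matter of putting everything in a loop. First, the program finds which regressors it should use by collecting all the unique values in df['Regressor'].

The subplot grid, axes, is determined automatically from the number of regressors. It tries to make the grid square.

The available colormaps are defined in colors; change this list if you want different colors. The program starts with the first colormap, then the second, and so on. If there are fewer colormaps than regressors, it wraps around to the beginning.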

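import math

import matplotlib.pyplot as plt
import seaborn as sns

# unique() keeps the order of first appearance; a set would not guarantee any order
regressors = df['Regressor'].unique()
fig = plt.figure(figsize=(12,6))

# square grid: (n, n) with n = ceil(sqrt(number of regressors))
axes = (math.ceil(math.sqrt(len(regressors))),) * 2

colors = [
            'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
            'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
            'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']

for index, regressor in enumerate(regressors):
    df_dt = df[(df['Regressor']==regressor)]
    # keep only numeric columns so corr() does not fail on text columns such as Method
    df_dt_corr = df_dt.drop(["Regressor"], axis=1).select_dtypes("number").corr()

    plt.subplot(*axes, index + 1)
    plt.title('Regressor: ' + regressor)
    # cycle through the colormaps, wrapping around when the list runs out
    sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap=colors[index%len(colors)])
plt.show()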

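I changed the way you call plt.subplot, because the three-digit form you used supports at most 9 plots; passing rows, columns and index separately makes it easier to adjust the grid automatically.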
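
As an alternative to calling plt.subplot inside the loop, the whole grid can be created up front with plt.subplots and each heatmap drawn on an explicit axes object. This is a minimal sketch under assumptions, not the answer's own code: the small dataframe is made up for illustration, and the grid uses only as many rows as needed rather than a strict square.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up frame standing in for the question's data (values are illustrative only).
df = pd.DataFrame({
    "Regressor": ["DT"] * 4 + ["DT-2"] * 4 + ["DT-4"] * 4,
    "CE": [0.1, 0.2, 0.3, 0.5, 0.9, 0.7, 0.8, 0.6, 0.2, 0.1, 0.4, 0.3],
    "MAE": [1.0, 2.1, 2.9, 4.2, 13.3, 12.8, 13.1, 13.6, 3.3, 4.3, 5.3, 7.5],
    "R2": [0.99, 0.98, 0.97, 0.95, 0.93, 0.94, 0.92, 0.91, 0.99, 0.98, 0.97, 0.96],
})

regressors = df["Regressor"].unique()
n_cols = math.ceil(math.sqrt(len(regressors)))  # roughly square layout
n_rows = math.ceil(len(regressors) / n_cols)    # just enough rows for every regressor

# squeeze=False always returns a 2D array of axes, even for a 1x1 grid.
fig, axes = plt.subplots(n_rows, n_cols, figsize=(12, 6), squeeze=False)
flat_axes = axes.ravel()

for ax, regressor in zip(flat_axes, regressors):
    subset = df.loc[df["Regressor"] == regressor]
    corr = subset.select_dtypes("number").corr()
    sns.heatmap(corr, annot=True, fmt=".2f", square=True, ax=ax)
    ax.set_title(f"Regressor: {regressor}")

# Hide grid cells that have no regressor assigned to them.
for ax in flat_axes[len(regressors):]:
    ax.set_visible(False)

fig.tight_layout()
```

Passing `ax=ax` to `sns.heatmap` avoids relying on matplotlib's notion of the current axes, and the grid has no nine-plot limit.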

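【Comments】:

Thanks for your answer. However, the code is generating 2 figures: one is empty and the other contains the correlation plots.
Also, if I want to add an overall title, where do I add plt.title? I called it just before plt.show, but the title only appears on the third correlation plot, not centered over the whole figure.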
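
On the title question: plt.title labels only the current axes, which is usually the last subplot that was drawn, so a title meant for the whole figure needs a figure-level call instead. A minimal sketch, with empty placeholder subplots and a made-up title string:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Per-subplot titles are set on each axes object.
for ax, name in zip(axes, ["DT", "DT-2", "DT-4"]):
    ax.set_title(f"Regressor: {name}")

# fig.suptitle places one title centered over the whole figure
# and returns the matplotlib Text object it created.
overall = fig.suptitle("Correlation maps by regressor")
fig.tight_layout()
```

With the pyplot interface, `plt.suptitle(...)` does the same for the current figure.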

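【Solution 3】:

Select the unique values first

I store the unique values of the Regressor column in the vals variable and then loop over each value. See the solution below: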

import math

import matplotlib.pyplot as plt
import seaborn as sns

# get the unique values in "Regressor" column
vals = df['Regressor'].unique()
n_cols = 2  # number of columns in the subplot grid
n_rows = math.ceil(len(vals) / n_cols)  # just enough rows to hold every value

plt.figure(figsize=[10,10], dpi=200)
plt.suptitle("Correlation Map") # Super Title
# start the loop for selecting data and plotting
for idx, value in enumerate(vals):
    # get the dataframe for the unique value and drop the unwanted columns using "iloc"
    data = df[df['Regressor']==value].iloc[:,2:] # 2: selects the third column onwards
    # plot the correlation map
    plt.subplot(n_rows, n_cols, idx+1)
    plt.title(f"Regressor={value}")
    sns.heatmap(data.corr(), annot=True, fmt='.2f', square=True)
plt.show()

Here you only need to choose the number of columns in the subplot grid and the text of the super title.

Result

【Comments】: