How to use markers with an ECDF plot (answers)

【Question Title】: How to use markers with ECDF plot
【Posted】: 2021-11-16 21:07:51
【Question Description】:

To get an ECDF plot with seaborn, one would do the following:

sns.ecdfplot(data=myData, x='x', ax=axs, hue='mySeries')

This gives an ECDF plot for each series in myData, one per level of mySeries.

Now I want to use markers for each series. I tried the same logic as with sns.lineplot, like this:

sns.lineplot(data=myData,x='x',y='y',ax=axs,hue='mySeries',markers=True, style='mySeries',)

Unfortunately, the markers and style keywords don't work with sns.ecdfplot. I'm using seaborn 0.11.2.

For a reproducible example, the penguins dataset can be used:

import seaborn as sns

penguins = sns.load_dataset('penguins')
sns.ecdfplot(data=penguins, x="bill_length_mm", hue="species")

【Question Discussion】:

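Hi @JohanC, what I want to do is use markers instead of plain lines in my ECDF plot. Even if the curves could become unreadable (which of course depends on the curves; in my case it's entirely possible), how can it be done? I have colorblind colleagues, and markers would help...
In addition to @JohanC's answer, you could consider creating your own ECDF plot directly with matplotlib, which also lets you use marker, linestyle and other plot parameters. Plotting all of your data: Empirical cumulative distribution function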

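Tags: python seaborn markers ecdf

【Solution 1】:

You can loop over the generated lines and apply the markers. Here is an example using the penguins dataset: once with the defaults, once with markers, and a third time with different line styles: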

import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset('penguins')

fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(15, 4))

sns.ecdfplot(data=penguins, x="bill_length_mm", hue="species", ax=ax1)
ax1.set_title('Default')

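sns.ecdfplot(data=penguins, x="bill_length_mm", hue="species", ax=ax2)
# The lines appear to be stored in the reverse of the legend order, hence [::-1].
# matplotlib >= 3.7 calls the handles legend_handles; older versions use legendHandles.
handles2 = getattr(ax2.legend_, 'legend_handles', None) or ax2.legend_.legendHandles
for line, marker, legend_handle in zip(ax2.lines[::-1], ['*', 'o', '+'], handles2):
    line.set_marker(marker)
    legend_handle.set_marker(marker)
ax2.set_title('Using markers')

sns.ecdfplot(data=penguins, x="bill_length_mm", hue="species", ax=ax3)
handles3 = getattr(ax3.legend_, 'legend_handles', None) or ax3.legend_.legendHandles
for line, linestyle, legend_handle in zip(ax3.lines[::-1], ['-', '--', ':'], handles3):
    line.set_linestyle(linestyle)
    legend_handle.set_linestyle(linestyle)
ax3.set_title('Using linestyles')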

plt.tight_layout()
plt.show()
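The `[::-1]` above relies on the lines being drawn in the reverse of the legend order, which may not hold in every seaborn version. A more defensive sketch, assuming only that each hue group gets a distinct color, pairs each legend handle with the line of the same color. It uses plain matplotlib with stand-in lines so it runs on its own; `match_markers_by_color` is a hypothetical helper, not part of seaborn or matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
from matplotlib.lines import Line2D


def match_markers_by_color(ax, legend_handles, markers):
    """Hypothetical helper: give each legend handle a marker and copy it to the
    plotted line(s) of the same color. Assumes every group has a distinct color."""
    for handle, marker in zip(legend_handles, markers):
        handle.set_marker(marker)
        target = to_rgba(handle.get_color())
        for line in ax.lines:
            if to_rgba(line.get_color()) == target:
                line.set_marker(marker)


fig, ax = plt.subplots()
# Stand-ins for seaborn's ECDF step lines, drawn in one order...
for color in ["tab:green", "tab:orange", "tab:blue"]:
    ax.plot([0, 1, 2], [0.0, 0.5, 1.0], color=color, drawstyle="steps-post")

# ...while the legend lists the groups in the opposite order.
proxies = [Line2D([], [], color=c) for c in ["tab:blue", "tab:orange", "tab:green"]]
leg = ax.legend(proxies, ["Adelie", "Chinstrap", "Gentoo"])

# matplotlib >= 3.7 calls these legend_handles; older versions use legendHandles
leg_handles = getattr(leg, "legend_handles", None) or leg.legendHandles
match_markers_by_color(ax, leg_handles, ["*", "o", "+"])
```

With a seaborn plot you would presumably pass `ax2` and its legend handles instead of the stand-ins; matching by color means the legend and the lines agree regardless of drawing order.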

【Discussion】:

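Thanks! Just one detail: the colors and markers don't match. For example, is Gentoo the + or the green one? Also, I think there may be a simpler solution, hopefully using kwargs to reach the underlying matplotlib (as with sns.lineplot). I'll keep looking. Still, thank you! It gave me a hint... I'll look into it...
Thanks for the feedback. The legend order seems to be reversed compared to the lines. I updated the code. Could you test it with your data? I really don't think there is currently a simpler way to change the markers.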

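【Solution 2】:

As stated in the seaborn.ecdfplot documentation, other keyword arguments are passed to matplotlib.axes.Axes.plot(), which accepts marker and linestyle / ls.
- marker and ls accept a single string, which is applied to every hue group in the plot.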

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('penguins', cache=True)

sns.ecdfplot(data=df, x="bill_length_mm", hue="species", marker='^', ls='none', palette='colorblind')

Calculate the ECDF directly

An option that allows using seaborn.lineplot or matplotlib.pyplot.plot is to compute the x and y of the ECDF directly.
Plotting all of your data: Empirical cumulative distribution functions
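For reference, the quantity being computed is the standard empirical CDF of a sample $x_1, \dots, x_n$:

$$\hat{F}_n(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le t\}$$

Evaluated at the sorted values $x_{(1)} \le \dots \le x_{(n)}$, this gives $\hat{F}_n(x_{(k)}) = k/n$ when there are no ties, which is what `np.sort(data)` and `np.arange(1, n+1) / n` produce in the function below.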

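import numpy as np

def ecdf(data, array: bool = True):
    """Compute ECDF for a one-dimensional array of measurements."""
    # Drop missing values (the penguins measurements contain NaN); otherwise they count toward n
    data = np.asarray(data, dtype=float)
    data = data[~np.isnan(data)]
    # Number of data points: n
    n = len(data)
    # x-data for the ECDF: x
    x = np.sort(data)
    # y-data for the ECDF: y
    y = np.arange(1, n + 1) / n
    if not array:
        return pd.DataFrame({'x': x, 'y': y})
    return x, y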
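The function above returns the ECDF only at the sample points. If you also want its value at arbitrary x, for example to read off the proportion of observations below a threshold, a sketch with `np.searchsorted` evaluates the same step function. `ecdf_at` is a hypothetical helper name, and the sample values are made up.

```python
import numpy as np


def ecdf_at(data, t):
    """Hypothetical helper: evaluate the empirical CDF of `data` at the point(s) `t`."""
    values = np.asarray(data, dtype=float)
    # Drop NaNs so they don't inflate n, then sort for the binary search
    values = np.sort(values[~np.isnan(values)])
    # side="right" counts observations <= t, which matches the ECDF definition
    return np.searchsorted(values, t, side="right") / len(values)


sample = [3.0, 1.0, 2.0, 2.0, np.nan]
print(ecdf_at(sample, [0.5, 1.0, 2.0, 2.5, 3.0]))  # -> values 0.0, 0.25, 0.75, 0.75, 1.0
```

Note how the tied value 2.0 jumps straight to 0.75: between observations the ECDF stays flat, which is why seaborn draws it as steps.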

`matplotlib.pyplot.plot`

x, y = ecdf(df.bill_length_mm)

plt.plot(x, y, marker='.', linestyle='none', color='tab:blue')
plt.title('All Species')
plt.xlabel('Bill Length (mm)')
plt.ylabel('ECDF')
plt.margins(0.02)  # keep data off plot edges

For multiple groups, as suggested by JohanC:

for species, marker in zip(df['species'].unique(), ['*', 'o', '+']):
    x, y = ecdf(df[df['species'] == species].bill_length_mm)
    plt.plot(x, y, marker=marker, linestyle='none', label=species)
plt.legend(title='Species', bbox_to_anchor=(1, 1.02), loc='upper left')
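Since the concern in the comments was readability for colorblind readers, one option with plain matplotlib is to keep the step shape that seaborn's ecdfplot draws (`drawstyle="steps-post"`) and combine markers with line styles, marking only every Nth point via `markevery` so dense curves stay legible. This is a sketch with synthetic data; the group names, the value 15 and the styles are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-in for three hue groups (not the penguins measurements)
groups = {name: rng.normal(loc, 2.0, size=150)
          for name, loc in [("A", 40.0), ("B", 47.0), ("C", 49.0)]}

fig, ax = plt.subplots()
for (name, values), marker, ls in zip(groups.items(), ["*", "o", "+"], ["-", "--", ":"]):
    x = np.sort(values)
    y = np.arange(1, len(x) + 1) / len(x)
    ax.plot(
        x, y,
        drawstyle="steps-post",  # horizontal-then-vertical steps, like an ECDF
        linestyle=ls,            # a second cue besides color
        marker=marker,
        markevery=15,            # mark every 15th point only
        label=name,
    )
ax.set_xlabel("value")
ax.set_ylabel("Proportion")
ax.legend(title="group")
```

Because the legend is built from the lines' own labels, markers, line styles and colors always match in the legend.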

`seaborn.lineplot`

# groupby to get the ecdf for each species
dfg = df.groupby('species')['bill_length_mm'].apply(ecdf, array=False).reset_index(level=0).reset_index(drop=True)

# plot
p = sns.lineplot(data=dfg, x='x', y='y', hue='species', style='species', markers=True, palette='colorblind')
sns.move_legend(p, bbox_to_anchor=(1, 1.02), loc='upper left')
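The `groupby(...).apply(ecdf, ...)` line builds a long DataFrame with one `species`, `x`, `y` row per observation. An equivalent sketch that avoids `apply` divides a within-group rank by the group size, so the k-th smallest value in each species gets y = k/n. This is a swapped-in technique, not what the answer above uses, and the frame is a tiny made-up example.

```python
import pandas as pd

# Tiny made-up frame standing in for the penguins columns used above
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 36.7, None, 46.1, 50.0],
})

# Drop missing measurements first so they don't count toward n
clean = df.dropna(subset=["bill_length_mm"])
grouped = clean.groupby("species")["bill_length_mm"]
dfg = (
    clean.assign(
        x=clean["bill_length_mm"],
        # rank 1..n within each species divided by that species' count gives k/n;
        # method="first" breaks ties by order, like np.arange in ecdf()
        y=grouped.rank(method="first") / grouped.transform("count"),
    )
    .sort_values(["species", "x"])
    .loc[:, ["species", "x", "y"]]
    .reset_index(drop=True)
)
print(dfg)
```

The resulting `dfg` has the same `species`, `x`, `y` columns, so it can be fed to the same `sns.lineplot(...)` call.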

【Discussion】: