Annotate some scatter plot observations: answers

【Question title】: Annotate some scatter plot observations
【Posted】: 2021-08-07 14:48:09
【Question description】:

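I made a dumbbell plot in matplotlib using the example DataFrame (df) and the code below.

The result looks fine, but so far I have not been able to annotate the plot with the averages stored in the df["avg"] column.

Could someone show me how to add each observation's average above its red dot? Many thanks!

The code is as follows: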

import pandas as pd
import matplotlib.pyplot as plt

# example data
data = {'Brand': ['HC','TC','FF','AA'],
        '2019Price': [22000,25000,27000,35000],
        '2020Price':[25000, 30000, 29000, 39000]}
df = pd.DataFrame(data)
df["avg"] = (df['2019Price'] + df['2020Price'])/2
df = df.sort_values("2020Price", ascending = False)

# dumbbell plot: a grey line per brand joining the 2019 and 2020 prices
plt.hlines(y = df["Brand"], xmin = df["2019Price"], xmax = df["2020Price"], color = "grey", alpha = 0.4)
plt.scatter(y = df["Brand"], x = df["2019Price"], color = "blue", label = "2019")
plt.scatter(y = df["Brand"], x = df["2020Price"], color = "blue", label = "2020")
plt.scatter(y = df["Brand"], x = df["avg"], color = "red", label = "average")

plt.legend()

【Question comments】:

Tags: python pandas matplotlib seaborn annotate

【Solution 1】:

Use .iterrows to iterate over the 'Brand' and 'avg' values of each row, and add the labels with .annotate.
See matplotlib Tutorials: Annotations
Tested with pandas 1.3.1 and matplotlib 3.4.2

import pandas as pd
import matplotlib.pyplot as plt

data = {'Brand': ['HC','TC','FF','AA'],
        '2019Price': [22000,25000,27000,35000],
        '2020Price':[25000, 30000, 29000, 39000]}

df = pd.DataFrame(data)

df["avg"] = df[['2019Price', '2020Price']].mean(axis=1)

df = df.sort_values("2020Price", ascending = False)

fig, ax = plt.subplots(figsize=(8, 6))

ax.hlines(y=df["Brand"], xmin=df["2019Price"], xmax=df["2020Price"], color="grey", alpha=0.4)

ax.scatter(y=df["Brand"], x=df["2019Price"], color="blue", label="2019")
ax.scatter(y=df["Brand"], x=df["2020Price"], color="blue", label="2020")
ax.scatter(y=df["Brand"], x=df["avg"], color="red", label="average")

_ = ax.legend()

# add annotations for average
for i, (j, k) in df[['Brand', 'avg']].iterrows():
    ax.annotate(f'{k:0.0f}', xy=(k, j), xytext=(-15, 5), textcoords='offset points')
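
For readers who have not used .iterrows before, the following is a minimal sketch of what the loop above unpacks on each pass. It uses a cut-down two-row frame and more descriptive variable names (index, brand, avg), which are illustrative choices and not part of the original answer. Each iteration yields the row's index label together with the row itself, and the row's two values are unpacked into the brand name and its average.

```python
import pandas as pd

# Cut-down illustrative frame; the values match two rows of the example data.
df = pd.DataFrame({'Brand': ['HC', 'TC'], 'avg': [23500.0, 27500.0]})

labels = []
# iterrows yields (index label, row); the row holds the 'Brand' and 'avg' values
for index, (brand, avg) in df[['Brand', 'avg']].iterrows():
    # f'{avg:0.0f}' formats the average with no decimal places
    labels.append((index, brand, f'{avg:0.0f}'))

print(labels)
```

In the plotting loop, the brand becomes the y coordinate and the average becomes both the x coordinate and the label text.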

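Alternatively, use pandas.DataFrame.plot to create the scatter plots. This uses matplotlib as the backend, so matplotlib does not have to be imported separately (it must still be installed).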

import pandas as pd

data = {'Brand': ['HC','TC','FF','AA'],
        '2019Price': [22000,25000,27000,35000],
        '2020Price':[25000, 30000, 29000, 39000]}

df = pd.DataFrame(data)

df["avg"] = df[['2019Price', '2020Price']].mean(axis=1)

df = df.sort_values("2020Price", ascending = False)

ax = df.plot(kind='scatter', y='Brand', x='2019Price', c='b', label='2019', figsize=(8, 6))
df.plot(kind='scatter', y='Brand', x='2020Price', c='b', label='2020', ax=ax)
df.plot(kind='scatter', y='Brand', x='avg', c='r', label='average', ax=ax)

ax.hlines(y=df["Brand"], xmin=df["2019Price"], xmax=df["2020Price"], color="grey", alpha=0.4)

for i, (j, k) in df[['Brand', 'avg']].iterrows():
    ax.annotate(f'{k:0.0f}', xy=(k, j), xytext=(-15, 5), textcoords='offset points')
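
If the annotation step is needed in more than one place, it could be wrapped in a small function. The sketch below is one possible variant, not the answer's method: annotate_values is a hypothetical helper name, and it pairs the columns with zip instead of .iterrows. Only the red average points are drawn here to keep the example short.

```python
import pandas as pd
import matplotlib.pyplot as plt


def annotate_values(ax, xs, ys, fmt='{:0.0f}', offset=(-15, 5)):
    """Hypothetical helper: label each (x, y) point with its formatted x value."""
    for x, y in zip(xs, ys):
        # offset is measured in points relative to the annotated point
        ax.annotate(fmt.format(x), xy=(x, y), xytext=offset, textcoords='offset points')


data = {'Brand': ['HC', 'TC', 'FF', 'AA'],
        '2019Price': [22000, 25000, 27000, 35000],
        '2020Price': [25000, 30000, 29000, 39000]}
df = pd.DataFrame(data)
df['avg'] = df[['2019Price', '2020Price']].mean(axis=1)
df = df.sort_values('2020Price', ascending=False)

fig, ax = plt.subplots(figsize=(8, 6))
# draw the points first so the categorical y axis is set up before annotating
ax.scatter(x=df['avg'], y=df['Brand'], color='red', label='average')
annotate_values(ax, df['avg'], df['Brand'])
```

The same call should work on the Axes returned by df.plot, since it only needs an Axes object and two equally long sequences.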

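【Comments】:

I was not familiar with iterrows(), but your solution makes sense and works well. Thank you for the code (including converting the figure to the ax interface) and for the matplotlib link on annotations!
Plotting with pandas is also much easier. Your code is like poetry, Trenton. Thanks again :)
@sb2020 You're welcome, glad this worked for you.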