Matplotlib scatter plot with different text at each point

【Question title】: Matplotlib scatter plot with different text at each point
【Posted】: 2020-12-19 18:18:04
【Question description】:

Suppose I have 3 series:

>>> df[df['Type']=="Machine Learning"]['Cost']
0     2300.00
1     3200.00
4     1350.00
7     1352.00
8     4056.00
9       79.00
10    1595.00
Name: Cost, dtype: float64
>>> df[df['Type']=="Machine Learning"]['Rank']
0      1
1      1
4      1
7      2
8      2
9      2
10     2
Name: Rank, dtype: int64
>>> df[df['Type']=="Machine Learning"]['Univ/Org']
0     Massachusetts Institute of Technology 
1     Massachusetts Institute of Technology 
4                                    EDX/MIT
7                        Stanford University
8                        Stanford University
9               Coursera/Stanford University
10                       Stanford University
Name: Univ/Org, dtype: object

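Now I want to draw a scatter plot with Cost on the y-axis, Rank on the x-axis, and the Univ/Org name shown at each data point.

What I have managed so far, after referring to this question, is: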

plt.scatter(df[df['Type']=="Machine Learning"]['Rank'], df[df['Type']=="Machine Learning"]['Cost'],marker='2', edgecolors='black')
for i, txt in enumerate(df[df['Type']=="Machine Learning"]['Univ/Org']):
    plt.annotate(txt, (df[df['Type']=="Machine Learning"]['Rank'][i], df[df['Type']=="Machine Learning"]['Cost'][i]))

It labels 2 data points and then raises an error.

The plot is:

The error is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-111-0d31107a166a> in <module>
      1 plt.scatter(df[df['Type']=="Machine Learning"]['Rank'], df[df['Type']=="Machine Learning"]['Cost'],marker='2', edgecolors='black')
      2 for i, txt in enumerate(df[df['Type']=="Machine Learning"]['Univ/Org']):
----> 3     plt.annotate(txt, (df[df['Type']=="Machine Learning"]['Rank'][i], df[df['Type']=="Machine Learning"]['Cost'][i]))

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
    869         key = com.apply_if_callable(key, self)
    870         try:
--> 871             result = self.index.get_value(self, key)
    872 
    873             if not is_scalar(result):

~/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4403         k = self._convert_scalar_indexer(k, kind="getitem")
   4404         try:
-> 4405             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4406         except KeyError as e1:
   4407             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 2

【Comments】:

You are not accessing the elements of the DataFrame correctly.
In other words, 2 is not in your index. Use iloc for position-based indexing.
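
To make the comment above concrete, here is a minimal sketch of the difference between label-based and position-based lookup. The series is a hand-built copy of the filtered Cost values from the question; the variable name `cost` is only illustrative.

```python
import pandas as pd

# Hand-built copy of the filtered 'Cost' series from the question.
# The filter dropped rows 2, 3, 5 and 6, so those labels are missing.
cost = pd.Series(
    [2300.0, 3200.0, 1350.0, 1352.0, 4056.0, 79.0, 1595.0],
    index=[0, 1, 4, 7, 8, 9, 10],
    name="Cost",
)

# Label-based lookup: works only for labels that exist in the index.
print(cost[1])           # label 1 exists
print(2 in cost.index)   # label 2 does not, so cost[2] raises KeyError

# Position-based lookup: the third element, whatever its label is.
print(cost.iloc[2])
```

This is why the loop in the question annotates two points (labels 0 and 1) and then fails when `enumerate` produces 2.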

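Tags: python pandas matplotlib scatter-plot

【Solution 1】:

A few things.

First, I suggest selecting the ML data into a new DataFrame. You should also be more precise and use the .loc and .at accessors. Like this: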

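import matplotlib.pyplot as plt

# Select the Machine Learning rows into their own DataFrame
mldf = df.loc[df['Type'] == "Machine Learning", :]

fig, ax = plt.subplots()
ax.scatter('Rank', 'Cost', data=mldf, marker='2', edgecolors='black')
# Iterate over the index labels so that .at always finds a matching row
for i in mldf.index:
    ax.annotate(mldf.at[i, 'Univ/Org'], (mldf.at[i, 'Rank'], mldf.at[i, 'Cost']))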
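
As an alternative sketch (not part of the original answer), the columns can be walked together with `zip`, which avoids index lookups entirely. The DataFrame below is rebuilt by hand from the output shown in the question, and the `Agg` backend is an assumption made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Rows copied from the question's output (Machine Learning subset only).
mldf = pd.DataFrame(
    {
        "Rank": [1, 1, 1, 2, 2, 2, 2],
        "Cost": [2300.0, 3200.0, 1350.0, 1352.0, 4056.0, 79.0, 1595.0],
        "Univ/Org": [
            "Massachusetts Institute of Technology ",
            "Massachusetts Institute of Technology ",
            "EDX/MIT",
            "Stanford University",
            "Stanford University",
            "Coursera/Stanford University",
            "Stanford University",
        ],
    },
    index=[0, 1, 4, 7, 8, 9, 10],  # gaps left by the filter
)

fig, ax = plt.subplots()
ax.scatter(mldf["Rank"], mldf["Cost"], marker="2")

# zip pairs the columns by position, so the gaps in the index never matter.
labels = [
    ax.annotate(name, (rank, cost))
    for name, rank, cost in zip(mldf["Univ/Org"], mldf["Rank"], mldf["Cost"])
]
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end.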

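【Comments】:

Thanks. I am using a somewhat large dataset, so the texts overlap each other. Can you suggest how to overcome this?
@AhmadAnis I would change the text.
Do you mean the first argument to ax.annotate?
@AhmadAnis Yes.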
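
One way to read the suggestion above is to pass a shorter string as the first argument to `ax.annotate`. The sketch below assumes a hypothetical abbreviation table (`SHORT_NAMES`) and a hypothetical helper (`short_label`). The alternating offset via `xytext` and `textcoords="offset points"` is an extra idea that was not proposed in the thread, and it may not be enough for very dense data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Hypothetical abbreviations; choose whatever short forms suit your data.
SHORT_NAMES = {
    "Massachusetts Institute of Technology": "MIT",
    "Stanford University": "Stanford",
    "Coursera/Stanford University": "Coursera/Stanford",
}

def short_label(name):
    """Return the abbreviation for a name, or the stripped name if none is defined."""
    name = name.strip()
    return SHORT_NAMES.get(name, name)

# A few (rank, cost, name) rows taken from the question's data.
points = [
    (1, 2300.0, "Massachusetts Institute of Technology "),
    (1, 1350.0, "EDX/MIT"),
    (2, 1352.0, "Stanford University"),
    (2, 1595.0, "Stanford University"),
]

fig, ax = plt.subplots()
ax.scatter([p[0] for p in points], [p[1] for p in points])

annotations = []
for n, (rank, cost, name) in enumerate(points):
    # Alternate the vertical offset so neighbouring labels are staggered.
    dy = 6 if n % 2 == 0 else -12
    annotations.append(
        ax.annotate(
            short_label(name),
            (rank, cost),
            xytext=(5, dy),
            textcoords="offset points",
        )
    )
```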