How to combine group plots in matplotlib (Python)

【Question title】: How to combine group plots in matplotlib (python)
【Posted】: 2020-07-15 08:55:46
【Question description】:

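I have a large dataset with thousands of rows of longitudinal text data, broken down by country and year. As the data frame below shows, the wordcount column holds the number of times the word "secular" occurs in each text.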

df3
index    country      text                          wordcount  year
0        Bolivia      This is an example text..      1         2010
1        Bolivia      This is an example text2..     5         2015
2        Bolivia      This is an example text3 ..    7         2017

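Now I want to create one subplot (a scatter plot) per country, with year on the x-axis and wordcount on the y-axis. The following code gives me the plot I want for each country separately, but I need to combine them into a single figure, for example with 10 countries per row. Is there a simple way to do this? Any help would be appreciated. Thank you. Please let me know if anything needs clarification.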

import matplotlib.pyplot as plt

for title, group in df3.groupby('country'):
    group.plot(x='year', y='wordcount', title=title)

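Update: I have also tried the following code, but I suspect it does not sum the wordcount values that fall in the same year. In other words, I get fewer word occurrences than with the previous code (the separate country plots).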

fig, axes = plt.subplots(nrows=11, ncols=8, sharex=True, sharey=True, figsize=(18,10))
axes_list = [item for sublist in axes for item in sublist] 
for countryname, selection in df3.head(1200).groupby("country"):
    
    ax = axes_list.pop(0)
    selection.plot(x='year', y='wordcount', label=countryname, ax=ax, legend=False)
    ax.set_title(countryname)
    # Hide all tick marks; current matplotlib expects booleans here, not 'off'
    ax.tick_params(
        which='both',
        bottom=False,
        left=False,
        right=False,
        top=False
    )
    ax.grid(linewidth=0.5)
    ax.set_xlim((1980, 2020))
    ax.set_xlabel("")
    ax.set_xticks(range(1980, 2020, 10))
    ax.spines['left'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.set_ylim((0, 10))

for ax in axes_list:
    ax.remove()

plt.subplots_adjust(hspace=1)

plt.tight_layout()

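【Question comments】:

Try building a loop around matplotlib.pyplot.subplots.
I have tried that. I just updated the question with the code I used and the problem I ran into. Thanks.

Tags: python python-3.x pandas matplotlib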

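【Solution 1】:

You need to compute the total wordcount for each country and year:

# Column names must match df3 exactly: 'country' is lowercase
sum_df = df3.groupby(['country', 'year'])['wordcount'].sum().reset_index()

Then:

# One column per country, one row per year
df_pivot = sum_df.pivot(index='year', columns='country', values='wordcount')
df_pivot.plot()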
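
Note that df_pivot.plot() draws every country as a line on one shared axes, while the question asks for one scatter panel per country, 10 per row. A minimal sketch of how the summed data could be laid out that way is shown here. The sample data frame is made up for illustration, the column names follow the question's df3, and the figure size and marker size are arbitrary choices, not part of the original answer.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to display windows
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for df3 (hypothetical values, same column names as the question)
df3 = pd.DataFrame({
    "country": ["Bolivia", "Bolivia", "Bolivia", "Chile", "Chile", "Peru"],
    "wordcount": [1, 5, 7, 2, 3, 4],
    "year": [2010, 2010, 2017, 2012, 2012, 2015],
})

# One row per (country, year), with wordcount summed over all texts of that year
sum_df = df3.groupby(["country", "year"], as_index=False)["wordcount"].sum()

n_countries = sum_df["country"].nunique()
ncols = 10  # 10 countries per row, as requested in the question
nrows = math.ceil(n_countries / ncols)

fig, axes = plt.subplots(nrows=nrows, ncols=ncols, sharex=True, sharey=True,
                         figsize=(2 * ncols, 2 * nrows), squeeze=False)
flat_axes = axes.flatten()

# groupby yields the countries in sorted order; each one gets the next free panel
for ax, (name, group) in zip(flat_axes, sum_df.groupby("country")):
    ax.scatter(group["year"], group["wordcount"], s=10)
    ax.set_title(name)

# Hide the panels that have no country assigned
for ax in flat_axes[n_countries:]:
    ax.set_visible(False)

fig.tight_layout()
```

Computing nrows from the number of countries avoids hard-coding the grid shape, and squeeze=False keeps axes two-dimensional even when everything fits in a single row.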

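【Comments】:

Even after resetting year as the index, I get a ValueError: Index contains duplicate entries, cannot reshape error.
Perhaps in some cases you have several wordcount values for the same year and country? You need to deal with those first!
Yes, I have several wordcount values for the same year, because there are multiple rows/text files per year.
How do you want to handle these multiple values? Do you want the mean, the maximum, the minimum, or something else?
I want to use the count, or the number of occurrences.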
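
The thread ends without a resolution, so the following is only a sketch of one way to get past the duplicate-entries error. pivot() refuses repeated (year, country) pairs, whereas pivot_table() aggregates them with the function passed as aggfunc. Which aggregation matches "count" in the last comment is an assumption: "sum" gives the total number of word occurrences per year, and "count" gives the number of texts per year. The sample data is made up.

```python
import pandas as pd

# Made-up stand-in for df3: Bolivia has two texts in 2010
df3 = pd.DataFrame({
    "country": ["Bolivia", "Bolivia", "Bolivia", "Chile"],
    "wordcount": [1, 5, 7, 2],
    "year": [2010, 2010, 2017, 2010],
})

# Total occurrences of the word per country and year
total_hits = df3.pivot_table(index="year", columns="country",
                             values="wordcount", aggfunc="sum")

# Number of texts per country and year
n_texts = df3.pivot_table(index="year", columns="country",
                          values="wordcount", aggfunc="count")
```

Either table has one column per country and one row per year, so it can be plotted with .plot() in the same way as df_pivot in the answer. Combinations with no texts show up as NaN.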