Subplots for a grouped value-count bar plot: answers

【Question title】: Subplot for grouped value count bar plot
【Posted】: 2021-09-07 09:38:22
【Question】:

My table looks like this:

YEAR    RESPONSIBLE DISTRICT
2014    01 - PARIS
2014    01 - PARIS
2014    01 - PARIS
2014    01 - PARIS
2014    01 - PARIS
... ... ...
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO

After running

g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()

I get the following:

YEAR         RESPONSIBLE DISTRICT
2014         05 - LUBBOCK            12312
             15 - SAN ANTONIO        10457
             18 - DALLAS              9885
             04 - AMARILLO            9617
             08 - ABILENE             8730
                                     ...  
2020         21 - PHARR               5645
             25 - CHILDRESS           5625
             20 - BEAUMONT            5560
             22 - LAREDO              5034
             24 - EL PASO             4620

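I have 25 districts in total. Now I want to create 25 subplots, one per district. For each subplot, I want the years 2014-2020 on the x-axis and the value counts on the y-axis. How can I do this?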

【Comments】:

Tags: python pandas matplotlib bar-chart subplot

【Solution 1】:

Is this what you expect?

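import matplotlib.pyplot as plt

# one axes per district on a 5x5 grid; shared axes keep the panels comparable
fig, axs = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(15, 15))
# g is the value_counts Series from the question, indexed by (YEAR, RESPONSIBLE DISTRICT)
for ax, (district, sr) in zip(axs.flat, g.groupby('RESPONSIBLE DISTRICT')):
    ax.set_title(district)
    # this draws a line; call ax.bar(...) with the same arguments for bars
    ax.plot(sr.index.get_level_values('YEAR'), sr.values)
fig.tight_layout()

plt.show()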
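
The loop above assumes every district has a row for every year and that the grid has exactly as many axes as districts. A minimal sketch of one way to relax both assumptions follows; the toy data, the 2x2 grid, and the names `years` and `counts` are made up for illustration and are not part of the original answer. It draws bars, fills missing years with 0, and hides axes that receive no district.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: three districts, district "C" has no rows for 2015
df = pd.DataFrame({
    "YEAR": [2014, 2014, 2015, 2014, 2015, 2015, 2014],
    "RESPONSIBLE DISTRICT": ["A", "A", "A", "B", "B", "B", "C"],
})
g = df.groupby("YEAR")["RESPONSIBLE DISTRICT"].value_counts()

years = sorted(df["YEAR"].unique())
fig, axs = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(6, 6))
groups = g.groupby(level="RESPONSIBLE DISTRICT")

for ax, (district, sr) in zip(axs.ravel(), groups):
    # Drop the district level so the index is just YEAR, then add missing years as 0
    counts = sr.droplevel("RESPONSIBLE DISTRICT").reindex(years, fill_value=0)
    ax.bar(counts.index, counts.values)
    ax.set_title(district)
    ax.set_xticks(years)

# Hide the axes left over when there are fewer districts than grid cells
for ax in axs.ravel()[len(groups):]:
    ax.set_visible(False)

fig.tight_layout()
```

Replace the closing `fig.tight_layout()` with `plt.show()` or `fig.savefig(...)` as needed when running this interactively.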

【Comments】:

【Solution 2】:

This should work.

import matplotlib.pyplot as plt
import pandas as pd

# count the rows per (YEAR, RESPONSIBLE DISTRICT) pair
g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()

# 5x5 grid, one axes per district
fig, axs = plt.subplots(5, 5, constrained_layout=True)

for ax, (district, dfi) in zip(axs.ravel(), g.groupby('RESPONSIBLE DISTRICT')):
    x = dfi.index.get_level_values('YEAR').values  # years on the x-axis
    y = dfi.values                                 # counts on the y-axis
    ax.bar(x, y)
    ax.set_title(district)

plt.show()

【Comments】:

【Solution 3】:

The correct way to do this with pandas alone is to reshape the dataframe with .pivot and then use pandas.DataFrame.plot.

Imports and data

import pandas as pd
import numpy as np  # for test data
import seaborn as sns  # only for seaborn option

# test data
np.random.seed(365)
rows = 100000
data = {'YEAR': np.random.choice(range(2014, 2021), size=rows),
        'RESPONSIBLE DISTRICT': np.random.choice(['05 - LUBBOCK', '15 - SAN ANTONIO', '18 - DALLAS', '04 - AMARILLO', '08 - ABILENE', '21 - PHARR', '25 - CHILDRESS', '20 - BEAUMONT', '22 - LAREDO', '24 - EL PASO'], size=rows)}
df = pd.DataFrame(data)

# get the value count of each district by year and pivot the shape
dfp = df.value_counts(subset=['YEAR', 'RESPONSIBLE DISTRICT']).reset_index(name='VC').pivot(index='YEAR', columns='RESPONSIBLE DISTRICT', values='VC')

# display(dfp)
RESPONSIBLE DISTRICT  04 - AMARILLO  05 - LUBBOCK  08 - ABILENE  15 - SAN ANTONIO  18 - DALLAS  20 - BEAUMONT  21 - PHARR  22 - LAREDO  24 - EL PASO  25 - CHILDRESS
YEAR                                                                                                                                                                
2014                           1407          1406          1485              1456         1392           1456        1499         1458          1394            1452
2015                           1436          1423          1428              1441         1395           1400        1423         1442          1375            1399
2016                           1480          1381          1393              1415         1446           1442        1414         1435          1452            1454
2017                           1422          1388          1485              1447         1404           1401        1413         1470          1424            1426
2018                           1479          1424          1384              1450         1390           1384        1445         1435          1478            1386
2019                           1387          1317          1379              1457         1457           1476        1447         1459          1451            1406
2020                           1462          1452          1454              1448         1441           1428        1411         1407          1402            1445
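
As a side note, the same year-by-district table can probably be built in one call with `pd.crosstab`, which is not what the answer uses. The sketch below compares the two routes on a small made-up dataframe (the toy data and the name `ct` are assumptions for illustration). One difference worth knowing: `crosstab` fills year/district pairs that never occur with 0, while the pivot route leaves them as NaN.

```python
import pandas as pd

# Hypothetical toy data standing in for the real table
df = pd.DataFrame({
    "YEAR": [2014, 2014, 2015, 2015, 2015, 2014],
    "RESPONSIBLE DISTRICT": ["A", "B", "A", "A", "B", "A"],
})

# Route used in the answer: count the pairs, then pivot to years x districts
dfp = (df.value_counts(subset=["YEAR", "RESPONSIBLE DISTRICT"])
         .reset_index(name="VC")
         .pivot(index="YEAR", columns="RESPONSIBLE DISTRICT", values="VC"))

# Alternative: crosstab counts the pairs and builds the wide table directly
ct = pd.crosstab(df["YEAR"], df["RESPONSIBLE DISTRICT"])

print(ct)
```

Either table can then be passed to `.plot(kind='bar', subplots=True, ...)` in the same way.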

`pandas.DataFrame.plot`

If a line plot is preferred, use kind='line'.

# plot the dataframe
# with subplots=True, plot returns an array of Axes, not a Figure
axes = dfp.plot(kind='bar', subplots=True, layout=(5, 5), figsize=(20, 20), legend=False)
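
With `subplots=True`, `DataFrame.plot` returns a NumPy array of matplotlib Axes rather than a Figure. A minimal sketch of how the figure might be recovered from that array for further tweaking follows; the toy table, the 2x2 layout, and the y-axis label are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical wide table: years as the index, one column per district
dfp = pd.DataFrame(
    {"A": [2, 2], "B": [1, 1], "C": [3, 0]},
    index=pd.Index([2014, 2015], name="YEAR"),
)

# One bar subplot per column; the return value is an array of Axes
axes = dfp.plot(kind="bar", subplots=True, layout=(2, 2),
                figsize=(6, 6), legend=False, sharey=True)

# Recover the parent Figure from any of the Axes
fig = axes.flat[0].get_figure()

# Per-axes adjustments go through the array
for ax in axes.flat:
    ax.set_ylabel("count")

fig.tight_layout()
```

From here, `fig.savefig(...)` writes the whole grid to a file.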

`seaborn.catplot`

seaborn is a high-level API for matplotlib.
This is the easiest way, because the dataframe does not need to be reshaped.

p = sns.catplot(kind='count', data=df, col='RESPONSIBLE DISTRICT', col_wrap=5, x='YEAR', height=3.5, )
p.set_titles(row_template='{row_name}', col_template='{col_name}')  # shortens the titles

【Comments】: