【Question title】: How to generate a bar chart with data from a CSV?
【Posted】: 2019-10-09 20:17:49
【Question description】:

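I have a CSV with several columns, one of which is a city column. It contains several different cities, and the same city is repeated many times. I want to build a bar chart showing how many times each city appears in the CSV. Example: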

Y   X
5   Belo Horizonte
1   Vespasiano
4   São Paulo

I wrote the code below, but I get the error shown right after it.

Code:

import matplotlib.pyplot as plt; plt.rcdefaults()
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# reading the file
tb_usuarios = 'tb_usuarios.csv'
usuarios = pd.read_csv(tb_usuarios,
header=0,
index_col=False
)
print(usuarios.head())
usuarios["vc_municipio"] = usuarios["vc_municipio"].dropna()
usuarios["vc_municipio"] = usuarios["vc_municipio"].str.upper()
municipio = usuarios.groupby(['vc_municipio'])
print(municipio)
y_pos = usuarios.groupby(['vc_municipio'])['vc_municipio'].count()
print(y_pos)

plt.bar(y_pos, municipio, align='center', alpha=0.5)
plt.xticks(y_pos, municipio)
plt.ylabel('Qtd')
plt.title('Municipio')

plt.show()

Error:

Traceback (most recent call last):
  File "C:/Users/Henrique Mendes/PycharmProjects/emprestimo/venv1/emprestimo.py", line 20, in <module>
    plt.bar(y_pos, municipio, align='center', alpha=0.5)
  File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\pyplot.py", line 2440, in bar
    **({"data": data} if data is not None else {}), **kwargs)
  File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\__init__.py", line 1601, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axes\_axes.py", line 2348, in bar
    self._process_unit_info(xdata=x, ydata=height, kwargs=kwargs)
  File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axes\_base.py", line 2126, in _process_unit_info
    kwargs = _process_single_axis(ydata, self.yaxis, 'yunits', kwargs)
  File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axes\_base.py", line 2108, in _process_single_axis
    axis.update_units(data)
  File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axis.py", line 1493, in update_units
    default = self.converter.default_units(data, self)
  File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\category.py", line 115, in default_units
    axis.set_units(UnitData(data))
  File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\category.py", line 181, in __init__
    self.update(data)
  File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\category.py", line 215, in update
    for val in OrderedDict.fromkeys(data):
TypeError: unhashable type: 'numpy.ndarray'

My output:

"C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\Scripts\python.exe" "C:/Users/Henrique Mendes/PycharmProjects/emprestimo/venv1/emprestimo.py"
   pr_usuario  bl_administrador dt_nascimento  ... dt_cheque es_anexo dt_anexo
0           2                 0    24/02/1980  ...       NaN      NaN      NaN
1           3                 0    05/09/1985  ...       NaN      NaN      NaN
2           4                 1    20/03/1984  ...       NaN      NaN      NaN
3           5                 1    20/01/1982  ...       NaN      NaN      NaN
4           6                 0    25/05/1985  ...       NaN      NaN      NaN

[5 rows x 30 columns]
{'BELO HORIZONTE': Int64Index([0, 1, 2, 3, 6, 9, 10, 14, 17, 20, 22, 25], dtype='int64'), 'BRASILIA': Int64Index([4], dtype='int64'), 'CONTAGEM': Int64Index([23], dtype='int64'), 'CURITIBA': Int64Index([5, 7, 15, 18, 19], dtype='int64'), 'SANTA LUZIA': Int64Index([21], dtype='int64'), 'VESPASIANO': Int64Index([24], dtype='int64')}
vc_municipio
BELO HORIZONTE    12
BRASILIA           1
CONTAGEM           1
CURITIBA           5
SANTA LUZIA        1
VESPASIANO         1
Name: vc_municipio, dtype: int64

How do I make this chart?

【Question discussion】:

    Tags: python python-3.x pandas matplotlib seaborn


    【Solution 1】:

    Using pandas:

    Your data:

    • Assuming your data is in a .csv file in the following format:
    0.0,BELO HORIZONTE
    1.0,BELO HORIZONTE
    2.0,BELO HORIZONTE
    3.0,BELO HORIZONTE
    6.0,BELO HORIZONTE
    9.0,BELO HORIZONTE
    10.0,BELO HORIZONTE
    14.0,BELO HORIZONTE
    17.0,BELO HORIZONTE
    20.0,BELO HORIZONTE
    22.0,BELO HORIZONTE
    25.0,BELO HORIZONTE
    4.0,BRASILIA
    23.0,CONTAGEM
    5.0,CURITIBA
    7.0,CURITIBA
    15.0,CURITIBA
    18.0,CURITIBA
    19.0,CURITIBA
    21.0,SANTA LUZIA
    24.0,VESPASIANO
    

    Create the DataFrame:

    import pandas as pd
    import matplotlib.pyplot as plt
    
    
    df = pd.read_csv('test.csv', header=None)
    df.columns = ['value', 'city']
    
        value            city
    0     0.0  BELO HORIZONTE
    1     1.0  BELO HORIZONTE
    2     2.0  BELO HORIZONTE
    3     3.0  BELO HORIZONTE
    4     6.0  BELO HORIZONTE
    5     9.0  BELO HORIZONTE
    6    10.0  BELO HORIZONTE
    7    14.0  BELO HORIZONTE
    8    17.0  BELO HORIZONTE
    9    20.0  BELO HORIZONTE
    10   22.0  BELO HORIZONTE
    11   25.0  BELO HORIZONTE
    12    4.0        BRASILIA
    13   23.0        CONTAGEM
    14    5.0        CURITIBA
    15    7.0        CURITIBA
    16   15.0        CURITIBA
    17   18.0        CURITIBA
    18   19.0        CURITIBA
    19   21.0     SANTA LUZIA
    20   24.0      VESPASIANO
    

    Group by city and plot the data:

    # groupby & count
    city_count = df.groupby('city').count()
    
                    value
    city                 
    BELO HORIZONTE     12
    BRASILIA            1
    CONTAGEM            1
    CURITIBA            5
    SANTA LUZIA         1
    VESPASIANO          1
    
    # plot
    city_count.plot.bar()
    plt.ylabel('Qtd')
    plt.title('Municipio')
    plt.show()
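If you only need the counts, `Series.value_counts()` is a shorter route to the same numbers as `groupby('city').count()`. A minimal sketch, assuming the same `city` column as above, rebuilt in memory here instead of read from `test.csv`:

```python
import matplotlib.pyplot as plt
import pandas as pd

# The city column from test.csv above, rebuilt in memory for illustration.
df = pd.DataFrame({
    "city": ["BELO HORIZONTE"] * 12 + ["BRASILIA", "CONTAGEM"]
            + ["CURITIBA"] * 5 + ["SANTA LUZIA", "VESPASIANO"],
})

# value_counts() counts each distinct city (largest first by default);
# sort_index() puts the cities in alphabetical order like the groupby output.
city_count = df["city"].value_counts().sort_index()
print(city_count)

# A Series plots as one bar per index label, with the counts as heights.
ax = city_count.plot.bar()
ax.set_ylabel("Qtd")
ax.set_title("Municipio")
plt.show()
```

Unlike `groupby('city').count()`, which returns a DataFrame with a `value` column, this gives a Series, so there is no column name to pass when plotting.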
    

    Plotting with seaborn:

    import seaborn as sns
    
    sns.barplot(x=city_count.index, y='value', data=city_count)
    plt.xticks(rotation=45)
    plt.show()
    

    【Discussion】:

      【Solution 2】:

      `municipio = usuarios.groupby(['vc_municipio'])` returns a pandas GroupBy object, not a sequence of values. matplotlib cannot handle that object, which is what causes your error.
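A small sketch of the difference, using a hypothetical three-row `usuarios`: the `groupby` call alone gives a GroupBy object, and only an aggregation such as `.size()` or `.count()` turns it into numbers that matplotlib can plot.

```python
import pandas as pd

# Hypothetical stand-in for the question's usuarios DataFrame.
usuarios = pd.DataFrame(
    {"vc_municipio": ["BELO HORIZONTE", "CURITIBA", "BELO HORIZONTE"]}
)

# groupby() alone only describes the grouping; it is not a sequence of values.
municipio = usuarios.groupby("vc_municipio")
print(type(municipio).__name__)  # DataFrameGroupBy

# An aggregation returns a plain Series: index = city, values = row count.
counts = municipio.size()
print(counts)
```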

      `plt.bar` takes the x values followed by the bar heights (the y values) (see the docs).

      matplotlib.pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
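In this signature `x` can be a list of strings; matplotlib then treats them as categories and uses them as the tick labels. A minimal sketch with made-up counts:

```python
import matplotlib.pyplot as plt

# Made-up example values: one label and one height per bar.
cities = ["BELO HORIZONTE", "CURITIBA", "VESPASIANO"]
counts = [12, 5, 1]

fig, ax = plt.subplots()
# x = category labels, height = bar heights; both must have the same length.
bars = ax.bar(cities, counts, align="center", alpha=0.5)
plt.show()
```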

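      Fortunately, when you group by a column in pandas, the groups (your x values, i.e. the categories) automatically become the index of the result.

      Assuming `municipio` is meant to be the list of categories (you want the count per city?), the following should work.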

      Replace this line of your code:

      plt.bar(y_pos, municipio, align='center', alpha=0.5)

      with:

      plt.bar(y_pos.index, y_pos, align='center', alpha=0.5)
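Putting it together, here is a sketch of the question's script with that fix applied. It assumes the same `vc_municipio` column but builds a small hypothetical DataFrame in place of `pd.read_csv('tb_usuarios.csv')`. It also drops missing cities on the Series itself, because assigning `usuarios["vc_municipio"].dropna()` back to the same column re-aligns on the index and leaves the `NaN` rows in place.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for pd.read_csv('tb_usuarios.csv'); only vc_municipio
# matters here, with mixed case and one missing value on purpose.
usuarios = pd.DataFrame({
    "vc_municipio": ["Belo Horizonte", "BELO HORIZONTE", "Curitiba", None, "Vespasiano"],
})

# Drop missing cities and normalize case on the Series itself.
cities = usuarios["vc_municipio"].dropna().str.upper()

# Series: index = city names (the categories), values = counts.
y_pos = cities.groupby(cities).count()

fig, ax = plt.subplots()
# x = the index (city labels), height = the counts.
ax.bar(y_pos.index, y_pos.values, align="center", alpha=0.5)
ax.set_ylabel("Qtd")
ax.set_title("Municipio")
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
plt.show()
```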
      

      Alternatively, you can use the pandas plotting API (e.g. `Series.plot.bar()`, which builds on matplotlib), which handles some DataFrame quirks natively.
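With the pandas plotting API, the count Series from the question can be plotted directly: `Series.plot.bar()` uses the index as the x labels and the values as the heights, and returns the matplotlib `Axes` for further customization. A sketch, with the counts typed in by hand from the question's output:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Counts copied from the question's printed y_pos
# (index = city, values = number of rows).
y_pos = pd.Series(
    {"BELO HORIZONTE": 12, "BRASILIA": 1, "CONTAGEM": 1,
     "CURITIBA": 5, "SANTA LUZIA": 1, "VESPASIANO": 1},
    name="vc_municipio",
)

# Index -> x tick labels, values -> bar heights; rot rotates the labels.
ax = y_pos.plot.bar(alpha=0.5, rot=45)
ax.set_ylabel("Qtd")
ax.set_title("Municipio")
plt.tight_layout()
plt.show()
```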

      【Discussion】:
