Creating bar chart with CSV data python: answer

[Question title]: Creating bar chart with CSV data python
[Posted]: 2016-07-06 03:38:10
[Question]:

I have a CSV with data like this:

4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 18:14:58,57,4
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 20:11:15,1884,90
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-10-04 09:44:21,1146,6
4be390eefaf9a64e7cb52937c4a5c77a,"avito.ru",2014-09-29 21:01:29,48,3
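The rows above have no header line, so column names must be supplied when reading the file, and used_at has to be parsed as a datetime before .dt.year works. The following is a minimal sketch with pandas. It assumes the 4th column is active_seconds (the question sums that column). The names user_id and extra for the 1st and 5th columns are hypothetical placeholders.

```python
import io

import pandas as pd

# The four sample rows from the question; the file has no header line.
raw = '''4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 18:14:58,57,4
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 20:11:15,1884,90
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-10-04 09:44:21,1146,6
4be390eefaf9a64e7cb52937c4a5c77a,"avito.ru",2014-09-29 21:01:29,48,3
'''

# "user_id" and "extra" are hypothetical names for the 1st and 5th columns;
# "address", "used_at" and "active_seconds" are the names used in the question.
names = ['user_id', 'address', 'used_at', 'active_seconds', 'extra']

# parse_dates turns used_at into datetime64, which .dt.year requires.
infile = pd.read_csv(io.StringIO(raw), header=None, names=names,
                     parse_dates=['used_at'])

# Sum active_seconds per (site, year); the result is a MultiIndex Series.
totals = infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.sum()
print(totals)
```

With the real file, replace io.StringIO(raw) with the path to the CSV.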

I group and sum it like this:

print(infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.sum())

and I get this result:

address            used_at
am.ru              2014         413071
                   2015         183402
auto.ru            2014        9122342
                   2015        6923367
avito.ru           2014       84503151
                   2015       87688571
avtomarket.ru      2014         106849
                   2015          95927
cars.mail.ru/sale  2014         211456
                   2015         167278
drom.ru            2014       11014955
                   2015        9704124
e1.ru              2014       28678357
                   2015       27961857
irr.ru/cars        2014         222193
                   2015         133678

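I need to create a bar chart like this example.

But I need each website on the x-axis and the total active_seconds on the y-axis, with bars for both 2014 and 2015. In the example they use np.array, but what I have is a Series object.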

I tried this:

width = 0.35
plt.figure()
ax = graph_by_duration['address'].plot(kind='bar', secondary_y=['active_seconds'])
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
plt.show()

Should I convert it to np.array, or is there another way to do this?
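Converting to np.array is possible, though not required. unstack() turns the MultiIndex Series into a site-by-year table, and its .to_numpy() values can be drawn with plain matplotlib by shifting each year's bars sideways by one bar width (the width = 0.35 from the attempt above). This is a sketch on a few of the totals shown earlier. The offset and tick placement are assumptions about what the linked example does, not a copy of it.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# A few yearly totals from the question, in the same shape that
# groupby(...).active_seconds.sum() returns (MultiIndex Series).
idx = pd.MultiIndex.from_tuples(
    [('am.ru', 2014), ('am.ru', 2015),
     ('auto.ru', 2014), ('auto.ru', 2015),
     ('e1.ru', 2014), ('e1.ru', 2015)],
    names=['address', 'used_at'])
totals = pd.Series([413071, 183402, 9122342, 6923367, 28678357, 27961857],
                   index=idx, name='active_seconds')

# unstack() moves the year level into columns: one row per site, one column per year.
wide = totals.unstack('used_at')
values = wide.to_numpy()          # 2-D np.array, shape (n_sites, n_years)

width = 0.35
x = np.arange(len(wide.index))    # one group position per site
fig, ax = plt.subplots(figsize=(10, 8))
for i, year in enumerate(wide.columns):
    # Shift each year's bars to the right by i bar widths.
    ax.bar(x + i * width, values[:, i], width, label=str(year))

# Center each site label under its group of bars.
ax.set_xticks(x + width * (len(wide.columns) - 1) / 2)
ax.set_xticklabels(wide.index, rotation=90)
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
ax.legend()
plt.show()
```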

[Question comments]:

Tags: python csv pandas matplotlib bar-chart

[Answer 1]:

I think you can first call reset_index and then pivot the DataFrame to create the columns 2014 and 2015. Finally, use plot.bar:

# Parentheses let the method chain continue over several lines.
df = (infile.groupby(['address', infile['used_at'].dt.year])
            .active_seconds.sum()
            .reset_index())
print(df)
              address  used_at  active_seconds
0               am.ru     2014          413071
1               am.ru     2015          183402
2             auto.ru     2014         9122342
3             auto.ru     2015         6923367
4            avito.ru     2014        84503151
5            avito.ru     2015        87688571
6       avtomarket.ru     2014          106849
7       avtomarket.ru     2015           95927
8   cars.mail.ru/sale     2014          211456
9   cars.mail.ru/sale     2015          167278
10            drom.ru     2014        11014955
11            drom.ru     2015         9704124
12              e1.ru     2014        28678357
13              e1.ru     2015        27961857
14        irr.ru/cars     2014          222193
15        irr.ru/cars     2015          133678

graph_by_duration = df.pivot(index='address', columns='used_at', values='active_seconds')
print(graph_by_duration)
used_at                2014      2015
address                              
am.ru                413071    183402
auto.ru             9122342   6923367
avito.ru           84503151  87688571
avtomarket.ru        106849     95927
cars.mail.ru/sale    211456    167278
drom.ru            11014955   9704124
e1.ru              28678357  27961857
irr.ru/cars          222193    133678
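Because the grouped result is already a Series indexed by (address, used_at), unstack() on that level should give the same table as reset_index() followed by pivot(). A small check on a subset of the rows above:

```python
import pandas as pd

# Long table in the shape of the answer's df (a subset of its rows).
df = pd.DataFrame({
    'address': ['am.ru', 'am.ru', 'e1.ru', 'e1.ru'],
    'used_at': [2014, 2015, 2014, 2015],
    'active_seconds': [413071, 183402, 28678357, 27961857],
})

# Route used in the answer: pivot the long table.
by_pivot = df.pivot(index='address', columns='used_at', values='active_seconds')

# Alternative route: keep the MultiIndex and unstack the year level.
by_unstack = df.set_index(['address', 'used_at'])['active_seconds'].unstack('used_at')

print(by_pivot.equals(by_unstack))
```

Applied directly to the groupby result, this would be infile.groupby([...]).active_seconds.sum().unstack(), skipping reset_index.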

import matplotlib.pyplot as plt

# One group of bars per site, one bar per year column.
ax = graph_by_duration.plot.bar(figsize=(10,8))
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
plt.show()

[Discussion]:

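Is it possible to show the mean above each bar? Some of the means are very small, so it is hard to see what changed. Maybe with the yerr argument?
I tried yerr, but it doesn't work; I only want to compare mean and std: df = infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.mean() df1 = infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.std() fig, ax = plt.subplots() df.plot.bar(yerr=df1, ax=ax) ax.set_ylabel('Time online') ax.set_title('Time spent online per web site, per year') plt.show()
Error: SyntaxError: Non-ASCII character '\xd0' in file C:/Users/user/Desktop/project/main.py on line 8, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
That sometimes happens when some characters are copied incorrectly. '\xd0' is the first byte of a UTF-8 Cyrillic letter, so a Cyrillic character has probably slipped into that line. Try retyping the code on line 8.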
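For the mean and std comparison discussed above, plotting the MultiIndex Series directly gives one bar per (site, year) pair. Unstacking both the mean and the std results gives matching site-by-year tables, and DataFrame.plot.bar accepts such a table as yerr. matplotlib's Axes.bar_label (matplotlib 3.4 or later) can then write each mean above its bar. The sketch below uses made-up toy rows (not the question's data) and a hypothetical year column in place of infile['used_at'].dt.year.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.container import BarContainer

# Made-up per-session rows for illustration only; "year" is a hypothetical
# column standing in for infile['used_at'].dt.year.
sessions = pd.DataFrame({
    'address': ['e1.ru'] * 4 + ['avito.ru'] * 4,
    'year': [2014, 2014, 2015, 2015] * 2,
    'active_seconds': [57, 1884, 1146, 300, 48, 600, 900, 1500],
})

grouped = sessions.groupby(['address', 'year'])['active_seconds']
# unstack() makes both results site x year tables with identical labels,
# so pandas can pair each bar with its own error value.
means = grouped.mean().unstack('year')
stds = grouped.std().unstack('year')

ax = means.plot.bar(yerr=stds, capsize=4, figsize=(10, 8))

# ax.containers also holds the error-bar containers, so label only the bars.
for container in ax.containers:
    if isinstance(container, BarContainer):
        ax.bar_label(container, fmt='%.0f')

ax.set_ylabel('Mean time online, s')
ax.set_title('Mean time online per web site, per year (error bars: std)')
plt.show()
```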
Could you tell me how to convert active_seconds to hours? The means are large, so I want to divide them by 3600. I wrote for string in time['time online']: hour = string / 3600. round_h = '%.1f' % round(hour, 1) graph_by_duration = time.pivot(index='address', columns='used_at', values='round_h'), but I get an error.
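Dividing the whole column at once avoids the loop. In the attempt above, the loop only rebinds a local scalar on each pass, '%.1f' turns the number into a string, and values='round_h' names a column that does not exist in the frame, which is probably the source of the error. A sketch on a subset of the answer's df, where hours is a hypothetical new column name:

```python
import pandas as pd

# Long table in the shape of the answer's df (two sites from the question).
df = pd.DataFrame({
    'address': ['am.ru', 'am.ru', 'e1.ru', 'e1.ru'],
    'used_at': [2014, 2015, 2014, 2015],
    'active_seconds': [413071, 183402, 28678357, 27961857],
})

# Vectorized: divide the whole column and round to one decimal, keeping floats.
df['hours'] = (df['active_seconds'] / 3600).round(1)

# values must name an existing column, so pass the new 'hours' column.
hours_by_year = df.pivot(index='address', columns='used_at', values='hours')
print(hours_by_year)
```

The result can be plotted with hours_by_year.plot.bar() exactly like graph_by_duration in the answer.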