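[Posted]: 2016-07-06 20:42:55
[Question]:
I have a dataframe in which every row is duplicated, and when I try to plot it, the chart contains duplicate columns. I tried to remove the duplicates, but then my chart is drawn incorrectly. My csv is here.
Dataframe common_users: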
used_at common users pair of websites
0 2014 1364 avito.ru and e1.ru
1 2014 1364 e1.ru and avito.ru
2 2014 1716 avito.ru and drom.ru
3 2014 1716 drom.ru and avito.ru
4 2014 1602 avito.ru and auto.ru
5 2014 1602 auto.ru and avito.ru
6 2014 299 avito.ru and avtomarket.ru
7 2014 299 avtomarket.ru and avito.ru
8 2014 579 avito.ru and am.ru
9 2014 579 am.ru and avito.ru
10 2014 602 avito.ru and irr.ru/cars
11 2014 602 irr.ru/cars and avito.ru
12 2014 424 avito.ru and cars.mail.ru/sale
13 2014 424 cars.mail.ru/sale and avito.ru
14 2014 634 e1.ru and drom.ru
15 2014 634 drom.ru and e1.ru
16 2014 475 e1.ru and auto.ru
17 2014 475 auto.ru and e1.ru
.....
You can see that the website names in each pair of rows are simply reversed. I tried to sort it by pair of websites, but I get a KeyError. I use this code:
import itertools

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("avito_trend.csv", parse_dates=[2])

def f(df):
    dfs = []
    for x in [list(x) for x in itertools.combinations(df['address'].unique(), 2)]:
        c1 = df.loc[df['address'].isin([x[0]]), 'ID']
        c2 = df.loc[df['address'].isin([x[1]]), 'ID']
        c = pd.Series(list(set(c1).intersection(set(c2))))
        # add inverted intersection c2 vs c1
        c_invert = pd.Series(list(set(c2).intersection(set(c1))))
        dfs.append(pd.DataFrame({'common users': len(c), 'pair of websites': ' and '.join(x)}, index=[0]))
        # swap values in x
        x[1], x[0] = x[0], x[1]
        dfs.append(pd.DataFrame({'common users': len(c_invert), 'pair of websites': ' and '.join(x)}, index=[0]))
    return pd.concat(dfs)

common_users = df.groupby([df['used_at'].dt.year]).apply(f).reset_index(drop=True, level=1).reset_index()
graph_by_common_users = common_users.pivot(index='pair of websites', columns='used_at', values='common users')
# sort by column 2014
graph_by_common_users = graph_by_common_users.sort_values(2014, ascending=False)
ax = graph_by_common_users.plot(kind='barh', width=0.5, figsize=(10, 20))
for label in ax.get_xticklabels():
    label.set_rotation(25)
rects = ax.patches
# one label per bar, ordered column by column to match ax.patches
labels = [int(round(graph_by_common_users.loc[i, y])) for y in graph_by_common_users.columns.tolist() for i in graph_by_common_users.index]
for rect, label in zip(rects, labels):
    height = rect.get_height()
    ax.text(rect.get_width() + 3, rect.get_y() + rect.get_height(), label, fontsize=8)
plt.show()
My chart looks like this:
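The mirrored rows described above ("avito.ru and e1.ru" versus "e1.ru and avito.ru") could be collapsed before pivoting by rewriting each label in a canonical order and then dropping duplicates. What follows is only a minimal sketch under stated assumptions: the small `common_users` frame is hand-built from the first rows of the table in the question, and `canonical_pair` is a hypothetical helper name, not something from the original code.

```python
import pandas as pd

# Hypothetical sample copied from the first rows of the table in the question.
common_users = pd.DataFrame({
    "used_at": [2014, 2014, 2014, 2014],
    "common users": [1364, 1364, 1716, 1716],
    "pair of websites": [
        "avito.ru and e1.ru",
        "e1.ru and avito.ru",
        "avito.ru and drom.ru",
        "drom.ru and avito.ru",
    ],
})


def canonical_pair(label, sep=" and "):
    """Return the label with its two site names in alphabetical order."""
    return sep.join(sorted(label.split(sep)))


# Normalize every label, then keep one row per (year, pair).
deduped = (
    common_users
    .assign(**{"pair of websites": common_users["pair of websites"].map(canonical_pair)})
    .drop_duplicates(subset=["used_at", "pair of websites"])
    .reset_index(drop=True)
)
print(deduped)
```

The deduplicated frame can then be passed to `pivot` as in the code above. Since `itertools.combinations` already yields each unordered pair only once, another option would be to drop the swap step and the second `dfs.append` inside `f`, so the mirrored rows are never created.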
[Comments]:
- Could you provide a list of the expected labels? It is not clear what you are trying to achieve.
- Now I have another problem. I pass an array and get
  rects = ax1.patches
  labels = ["%d" % i for i in time['time online'].round()]
  for rect, label in zip(rects, labels):
      print(rect, label)
      height = rect.get_height()
      ax1.text(rect.get_x() + rect.get_width()/2, height + 5, label, ha='center', va='bottom')
  I described my problem in the question.
Tags: python pandas matplotlib