Plotting data based on counts and categories in Python (answers)

【Question title】: Plotting data based on count and categories in Python
【Posted】: 2019-02-07 14:04:50
【Question】:

I have the following data in a DataFrame:

  Customer_ID | Customer_status | store_ID | date_of_transaction
  12352423    | active          | 65       | 2018/10/1
  12352425    | inactive        | 70       | 2018/10/1
  12352425    | inactive        | 65       | 2018/10/1
  12352426    | active          | 75       | 2018/10/1

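Goal: see the distribution (or average) of inactive vs. active customers for each store, to determine whether some stores have more inactive customers than others.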

I created an extra column holding the count for each store with the following code:

df['Counts'] = df['store_ID'].groupby(df['store_ID']).transform('count')

So now I have an extra column with the count for each unique store ID. For example, every row with store ID 65 shows 32 in the Counts column, because store ID 65 appears 32 times in the whole dataset.
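A minimal sketch of this counting step on the four sample rows above, assuming a single DataFrame named `df`. Note that `transform('count')` broadcasts each store's row count back onto every row, so the result has the same length as `df`. For plotting, a per-store count with one row per store (here via `value_counts`) is usually more convenient.

```python
import pandas as pd

# Sample rows copied from the question's table
df = pd.DataFrame({
    "Customer_ID": [12352423, 12352425, 12352425, 12352426],
    "Customer_status": ["active", "inactive", "inactive", "active"],
    "store_ID": [65, 70, 65, 75],
    "date_of_transaction": ["2018/10/1"] * 4,
})

# Per-row count: each row gets the number of rows that share its store_ID
df["Counts"] = df["store_ID"].groupby(df["store_ID"]).transform("count")

# Per-store count: one entry per store, sorted by store_ID
store_counts = df["store_ID"].value_counts().sort_index()

print(store_counts)
```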

I'm not sure how to plot this so that I can visualize the inactive vs. active customer status for each unique store.

Thanks!

【Question discussion】:

Tags: python matplotlib seaborn

【Answer 1】:

To get the average inactive rate (the share of inactive customers) for each store_ID, you can use:

(df['Customer_status'] == 'inactive').groupby(df['store_ID']).mean()

Output:

store_ID
65    0.5
70    1.0
75    0.0
Name: Customer_status, dtype: float64

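This first builds a boolean Series that is True where Customer_status equals 'inactive', then groups that Series by store_ID and takes the mean. Because True counts as 1 and False as 0, the mean is the share of inactive customers in each store.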
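To make the "mean of a boolean" step concrete, the sketch below rebuilds the question's sample rows and checks the grouped mean against an explicit "inactive rows divided by all rows" calculation per store. The variable names are illustrative only.

```python
import pandas as pd

# Only the two columns needed for this calculation
df = pd.DataFrame({
    "Customer_status": ["active", "inactive", "inactive", "active"],
    "store_ID": [65, 70, 65, 75],
})

# Boolean Series: True where the customer is inactive
is_inactive = df["Customer_status"] == "inactive"

# Mean of the booleans per store = share of inactive rows in that store
inactive_rate = is_inactive.groupby(df["store_ID"]).mean()

# Same quantity spelled out: number of inactive rows / total rows, per store
explicit_rate = is_inactive.groupby(df["store_ID"]).sum() / df.groupby("store_ID").size()

print(inactive_rate)
```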

Plot:

(df['Customer_status'] == 'inactive').groupby(df['store_ID']).mean().plot.bar(title='Average Inactive Customers Status by Store ID')

Output: (bar chart of the inactive share per store_ID)
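If the script runs without a display (for example on a server), a sketch like the following saves the same bar chart to a file. The Agg backend, the axis labels, the fixed 0 to 1 y-range and the file name `inactive_by_store.png` are assumptions added for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Customer_status": ["active", "inactive", "inactive", "active"],
    "store_ID": [65, 70, 65, 75],
})

# Share of inactive customers per store (same expression as the answer)
inactive_rate = (df["Customer_status"] == "inactive").groupby(df["store_ID"]).mean()

ax = inactive_rate.plot.bar(title="Average Inactive Customer Status by Store ID")
ax.set_xlabel("store_ID")
ax.set_ylabel("Share of inactive customers")
ax.set_ylim(0, 1)  # the values are fractions, so fix the axis to 0..1
plt.tight_layout()
plt.savefig("inactive_by_store.png")  # hypothetical output file name
```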

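Update for the comment: yes. Reshape your DataFrame slightly and plot:

# count rows per store (index) and status (columns); 0 where a status is absent
df_out = df.groupby(['store_ID', 'Customer_status'])['Customer_ID'].count().unstack(fill_value=0)
# divide each row by its total to get shares, then plot
df_out.div(df_out.sum(axis=1), axis=0).plot.bar(title='Average Customer Status by Store ID')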

Output: (bar chart of the active and inactive shares per store_ID)
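An equivalent way to get both shares in one table is `pd.crosstab` with `normalize='index'`, which divides each row by its total so that every store's active and inactive shares sum to 1. The sketch below assumes the question's sample rows and draws the shares as a stacked bar chart (`stacked=True`); stacking and the file name `status_share_by_store.png` are presentation choices of this sketch, not something the answer specifies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": [12352423, 12352425, 12352425, 12352426],
    "Customer_status": ["active", "inactive", "inactive", "active"],
    "store_ID": [65, 70, 65, 75],
})

# Rows: stores, columns: statuses, values: share of that store's rows
shares = pd.crosstab(df["store_ID"], df["Customer_status"], normalize="index")

# One bar per store, split into active / inactive segments
ax = shares.plot.bar(stacked=True, title="Customer Status Share by Store ID")
ax.set_ylabel("Share of customers")
plt.tight_layout()
plt.savefig("status_share_by_store.png")  # hypothetical output file name
```

Stacking makes it easy to read each store's inactive share against the same 0 to 1 scale; dropping `stacked=True` gives side-by-side bars like the answer's version.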

【Discussion】:

Thanks a lot! Is there a way for me to see both inactive and active in the same plot?

【Answer 2】:

Why not:

df.groupby(['store_ID', 'Customer_status']).mean(numeric_only=True)

Then repeat this for any other statistics you want and merge the resulting DataFrames.
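A minimal sketch of the "compute several statistics, then merge" idea, assuming the question's sample rows and two illustrative statistics per (store_ID, Customer_status) pair: the number of rows and the number of distinct customers. The names `n_rows`, `n_customers` and `stats` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": [12352423, 12352425, 12352425, 12352426],
    "Customer_status": ["active", "inactive", "inactive", "active"],
    "store_ID": [65, 70, 65, 75],
})

# Group by a list of column names to get one group per (store, status) pair
keys = ["store_ID", "Customer_status"]

# Statistic 1: number of rows (transactions) per store and status
n_rows = df.groupby(keys).size().rename("n_rows")

# Statistic 2: number of distinct customers per store and status
n_customers = df.groupby(keys)["Customer_ID"].nunique().rename("n_customers")

# Merge the per-group results on their shared (store_ID, Customer_status) index
stats = pd.merge(n_rows.to_frame(), n_customers.to_frame(),
                 left_index=True, right_index=True)
print(stats)
```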

【Discussion】: