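【Posted】:2020-07-01 19:57:17
【Question】:
I have a pandas DataFrame with more than 100 categorical columns and two numeric columns. In the data below, for simplicity, I have included only four categorical columns: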
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({
'Gender': ['M','M','F','M','F','M','F','M','F','F'],
'Class' : ['A','B','B','C','A','C','B','A','A','C'],
'Class_2': ['A1','B2','B3','C5','B1','C2','B1','B1','C3','D1'],
'District' : ['N','N','E','S','S','N','N','E','S','S']
})
df['X1'] = np.random.normal(1000, 55, 10)
df['X2'] = np.random.normal(100, 10, 10)
For each categorical column (i.e. Gender, Class, Class_2 and District) I need to produce the following summary:
#Show the distribution of the column, both count and percent
print((df["Gender"].value_counts(sort=False, normalize=False)))
print((df["Gender"].value_counts(sort=False, normalize=True))*100)
#Plot the histogram
plt.figure(figsize=(9, 8))
plt.hist(df['Gender'], color = 'blue', edgecolor = 'black',
bins = 30)
plt.xlabel("Gender")
plt.ylabel("Count")
plt.title("Gender distribution")
#Aggregate sum of X1 and X2 by Gender, and find the ratio by Gender
#(select both columns with a list; plain .sum() keeps the column index flat)
var1 = df.groupby('Gender')[['X2', 'X1']].sum().reset_index()
var1['ratio'] = var1['X2']/var1['X1']
print(var1)
var1.plot('Gender', 'ratio', kind='bar',
colormap='Paired',
title=' Ratio by Gender')
【Discussion】:
-
OK, so what is your question?
-
For each categorical column (i.e. Gender, Class, Class_2 and District) I need to produce the summary shown in my example.
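
One way to read this request is as a loop that applies the same summary to every categorical column. The sketch below is only an illustration under a few assumptions: the helper name `summarize_categorical` is hypothetical, every non-numeric column is treated as categorical, and the sample data is regenerated with a seeded random generator so the run is repeatable. Plotting is left out to keep it short.

```python
import numpy as np
import pandas as pd


def summarize_categorical(df, col, num="X2", den="X1"):
    """Hypothetical helper: counts, percentages and sum(num)/sum(den) per level of `col`."""
    counts = df[col].value_counts(sort=False)
    percent = df[col].value_counts(sort=False, normalize=True) * 100
    # Select both numeric columns with a list, then sum within each level.
    sums = df.groupby(col)[[num, den]].sum()
    sums["ratio"] = sums[num] / sums[den]
    return {"counts": counts, "percent": percent, "sums": sums.reset_index()}


# Same sample data as in the question, with a seeded generator (an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "M", "F", "M", "F", "M", "F", "F"],
    "Class": ["A", "B", "B", "C", "A", "C", "B", "A", "A", "C"],
    "Class_2": ["A1", "B2", "B3", "C5", "B1", "C2", "B1", "B1", "C3", "D1"],
    "District": ["N", "N", "E", "S", "S", "N", "N", "E", "S", "S"],
})
df["X1"] = rng.normal(1000, 55, 10)
df["X2"] = rng.normal(100, 10, 10)

# Assumption: every non-numeric column is a categorical column to summarize.
cat_cols = df.select_dtypes(exclude="number").columns

summaries = {col: summarize_categorical(df, col) for col in cat_cols}

for col, s in summaries.items():
    print(f"--- {col} ---")
    print(s["counts"])
    print(s["percent"])
    print(s["sums"])
```

Each `summaries[col]["sums"]` frame can then be passed to `.plot(col, 'ratio', kind='bar')` as in the question's example. With 100+ columns it may be more practical to save the figures to files than to display them all.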