Using a binned feature in a crosstabulation

【Question title】: Using bin feature in crosstabulation
【Posted】: 2020-07-04 15:06:27
【Question】:

labs = ['small','medium','big','large']
df['size'] = pd.qcut(df.volume,4,labels=labs)
pd.crosstab(df.size,df.cut,margins=True,normalize='columns')
#cut and volume are columns/features of df Dataframe

Above is the snippet I am trying to run. This is the output I get:

cut     Fair    Good    Ideal   Premium     Very Good   All
row_0                       
539430  1.0     1.0     1.0     1.0     1.0     1.0

But I want ['small','medium','big','large'] as the index.
How can I get them as the index?
I also tried changing the dtype of df.size from category to string. That did not help.

【Question comments】:

Tags: python pandas dataframe crosstab

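【Solution 1】:

I think you need to swap the columns. Also, when a column name clashes with an existing pandas attribute or method, such as DataFrame.size, use [] instead of dot notation: df.size returns the number of elements in the DataFrame (a single integer), not the 'size' column, which is why the crosstab ends up with one row labelled with that number.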

import pandas as pd

df = pd.DataFrame({'cut':['Fair', 'Good'] * 3, 'volume':[1, 5, 10, 29, 30, 2]})

labs = ['small','medium','big','large']
df['size'] = pd.qcut(df.volume,4,labels=labs)

# df.size is the DataFrame attribute: total number of elements (6 rows x 3 columns = 18)
print (df.size)
18

df1 = pd.crosstab(df.size,df.cut,margins=True,normalize='columns')
print (df1)
cut    Fair  Good  All
row_0                 
18      1.0   1.0  1.0

df2 = pd.crosstab(df['cut'], df['size'],margins=True,normalize='columns')
print (df2)
size  small  medium  big  large  All
cut                                 
Fair    0.5     0.0  1.0    0.5  0.5
Good    0.5     1.0  0.0    0.5  0.5
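
If the goal is to have the size labels as the index, as the question asks, a minimal sketch on the same toy data is to pass df['size'] as the first argument and df['cut'] as the second. The variable name by_size is only illustrative; with normalize='columns' each cut column should then sum to 1 across the size rows.

```python
import pandas as pd

# Same toy data as in the answer above.
df = pd.DataFrame({'cut': ['Fair', 'Good'] * 3,
                   'volume': [1, 5, 10, 29, 30, 2]})

labs = ['small', 'medium', 'big', 'large']
df['size'] = pd.qcut(df['volume'], 4, labels=labs)

# Attribute access hits the built-in DataFrame.size (an integer),
# while bracket access returns the binned column.
element_count = df.size
size_column = df['size']

# First argument becomes the index, second becomes the columns.
by_size = pd.crosstab(df['size'], df['cut'],
                      margins=True, normalize='columns')
print(by_size)
```

The same bracket rule applies to any column whose name shadows a DataFrame attribute or method.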

【Comments】: