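【Posted】: 2020-11-01 17:10:58
【Question】:
Question
Why does sorting with pd.Series.sort_index appear to have no effect when the index level is categorical? How can I sort the index of a MultiIndex pd.Series in an order other than alphabetical/numerical?
MWE
Setup code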
import pandas as pd
import numpy as np
d = {
    'Card': [
        'Visa', 'Visa', 'Master Card', 'Master Card', 'Visa', 'Master Card',
        'Visa', 'Visa', 'Master Card', 'Visa', 'Master Card', 'Visa', 'Visa',
        'Master Card', 'Master Card', 'Visa', 'Master Card', 'Visa', 'Visa',
        'Master Card', 'Visa', 'Master Card', 'Master Card', 'Master Card',
        'Master Card', 'Master Card', 'Master Card', 'Visa', 'Visa'
    ],
    'Year': [
        'Three', 'Three', 'Seven', 'Three', 'Three', 'Seven', 'Seven', 'Seven',
        'Three', 'Seven', 'Three', 'Three', 'Three', 'Seven', 'Three', 'Three',
        'Seven', 'Seven', 'Seven', 'Three', 'Seven', 'Three', 'Five', 'One',
        'One', 'Two', 'Four', 'Six', 'Six'
    ],
    'Value': [
        45, 13, 52, 321, 31, 1231, 876, 231, 4, 213, 123, 45, 321, 1, 123, 52,
        736, 35, 900, 301, 374, 9, 294, 337, 4465, 321, 755, 22, 8
    ]
}
df = pd.DataFrame(d)
grp_cols = ['Card', 'Year']
ser_val = df.groupby(grp_cols)['Value'].mean()
With a plain sort_index, the data looks like this:
In [2]: ser_val.sort_index()
Out[2]:
Card         Year
Master Card  Five      294.000000
             Four      755.000000
             One      2401.000000
             Seven     505.000000
             Three     146.833333
             Two       321.000000
Visa         Seven     438.166667
             Six        15.000000
             Three      84.500000
Name: Value, dtype: float64
You can see that the Year level is sorted alphabetically. Now I want to enforce a custom order. To do that, I tried:
categories_order = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven']
categories = pd.Categorical(ser_val.index.levels[1].values,
                            categories=categories_order,
                            ordered=True)
ser_val.index.set_levels(categories, level='Year', inplace=True)
After sorting again, the data still looks like this (again in alphabetical order):
In [3]: ser_val.sort_index()
Out[3]:
Card         Year
Master Card  Five      294.000000
             Four      755.000000
             One      2401.000000
             Seven     505.000000
             Three     146.833333
             Two       321.000000
Visa         Seven     438.166667
             Six        15.000000
             Three      84.500000
Name: Value, dtype: float64
I know that if I convert the data to a pandas.DataFrame and sort there, it works, as shown below:
df_val = ser_val.reset_index().sort_values(grp_cols)
df_val['Year'] = pd.Categorical(df_val['Year'].values,
                                categories_order,
                                ordered=True)
df_val = df_val.sort_values(grp_cols).set_index(grp_cols)
In [5]: df_val
Out[5]:
                         Value
Card        Year
Master Card One    2401.000000
            Two     321.000000
            Three   146.833333
            Four    755.000000
            Five    294.000000
            Seven   505.000000
Visa        Three    84.500000
            Six      15.000000
            Seven   438.166667
Why doesn't pd.Series sort using the categorical ordering?
I am using pandas 1.0.5 with Python 3.7.3 (64-bit).
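For comparison, one possible way to get the custom order without leaving the Series is to rebuild the whole index from arrays instead of replacing a level in place. This is a minimal sketch, not a confirmed answer: it assumes that pd.MultiIndex.from_arrays keeps the category order of a pd.Categorical it is given, so that sort_index follows that order. The Series below is rebuilt from the grouped means shown in the output above (rounded as printed), and new_index and ser_sorted are illustrative names.

```python
import pandas as pd

# Grouped means copied from the printed output above (rounded as displayed).
ser_val = pd.Series(
    [294.0, 755.0, 2401.0, 505.0, 146.833333, 321.0,
     438.166667, 15.0, 84.5],
    index=pd.MultiIndex.from_tuples(
        [('Master Card', 'Five'), ('Master Card', 'Four'),
         ('Master Card', 'One'), ('Master Card', 'Seven'),
         ('Master Card', 'Three'), ('Master Card', 'Two'),
         ('Visa', 'Seven'), ('Visa', 'Six'), ('Visa', 'Three')],
        names=['Card', 'Year']),
    name='Value')

categories_order = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven']

# Build a fresh MultiIndex whose 'Year' level is an ordered Categorical made
# from the per-row values (not from the already-sorted level).
idx = ser_val.index
new_index = pd.MultiIndex.from_arrays(
    [idx.get_level_values('Card'),
     pd.Categorical(idx.get_level_values('Year'),
                    categories=categories_order,
                    ordered=True)],
    names=idx.names)

# Attach the new index to a copy and sort; the original Series is untouched.
ser_sorted = ser_val.copy()
ser_sorted.index = new_index
ser_sorted = ser_sorted.sort_index()

print(ser_sorted)
```

The difference from the set_levels attempt is that the Categorical here is built from the row-wise values, so the row-to-level mapping is created together with the category order rather than inherited from the alphabetical one.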
【Comments】: