【Posted】: 2018-10-31 14:39:46
【Question】:
I'm using the baby names dataset from: https://raw.githubusercontent.com/hadley/data-baby-names/master/baby-names.csv
The data looks like this:
"year","name","percent","sex"
1880,"John",0.081541,"boy"
1880,"William",0.080511,"boy"
1880,"James",0.050057,"boy"
1880,"Charles",0.045167,"boy"
1880,"George",0.043292,"boy"
1880,"Frank",0.02738,"boy"
1880,"Joseph",0.022229,"boy"
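A minimal loading sketch, assuming pandas is installed and the frame is named `data1` as in the question. It reads the sample rows above from an in-memory string so it runs offline. With network access you would pass the raw GitHub URL to `pd.read_csv` instead.

```python
import io

import pandas as pd

# The sample rows shown above, copied verbatim from baby-names.csv.
SAMPLE_CSV = '''"year","name","percent","sex"
1880,"John",0.081541,"boy"
1880,"William",0.080511,"boy"
1880,"James",0.050057,"boy"
1880,"Charles",0.045167,"boy"
1880,"George",0.043292,"boy"
1880,"Frank",0.02738,"boy"
1880,"Joseph",0.022229,"boy"
'''

# read_csv strips the quotes and infers year as int and percent as float.
data1 = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(data1.dtypes)
```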
I grouped the rows by name and sex and summed the percentages:
data1.groupby(['name','sex'])[['percent']].sum()
This creates a DataFrame with a MultiIndex (name, sex):
Name     Sex   Percent
Aaron    boy   0.292292
         girl  0.000805
Abagail  girl  0.001326
Abbie    boy   0.000092
         girl  0.022804
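To see the structure of that result in isolation, here is a small sketch. It uses hypothetical toy rows with the same columns, invented for illustration and not taken from the real dataset. The two grouping keys become the two levels of the resulting MultiIndex, and each (name, sex) pair appears once.

```python
import pandas as pd

# Hypothetical toy rows (made up for illustration), same columns as the dataset.
data1 = pd.DataFrame({
    "year":    [1880, 1881, 1880, 1880, 1880, 1881],
    "name":    ["Aaron", "Aaron", "Aaron", "Abagail", "Abbie", "Abbie"],
    "percent": [0.10, 0.20, 0.001, 0.002, 0.0001, 0.02],
    "sex":     ["boy", "boy", "girl", "girl", "boy", "girl"],
})

# Same call as in the question: one row per (name, sex), percent summed over years.
summed = data1.groupby(["name", "sex"])[["percent"]].sum()
print(summed)
print(summed.index.names)  # the two index levels: name and sex
```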
For each name, I want a new DataFrame that keeps only the sex with the higher percentage:
Name     Sex   Percent
Aaron    boy   0.292292
Abagail  girl  0.001326
Abbie    girl  0.022804
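One way to get this while keeping the MultiIndex is to group the summed values by the `name` level and call `idxmax`. This is a sketch on the same hypothetical toy rows, not an accepted answer. It assumes the sum is taken as a Series (single brackets), so that `idxmax` returns the full `(name, sex)` label of each group's maximum, which can then be passed to `.loc`.

```python
import pandas as pd

# Hypothetical toy rows (made up for illustration), same columns as the dataset.
data1 = pd.DataFrame({
    "year":    [1880, 1881, 1880, 1880, 1880, 1881],
    "name":    ["Aaron", "Aaron", "Aaron", "Abagail", "Abbie", "Abbie"],
    "percent": [0.10, 0.20, 0.001, 0.002, 0.0001, 0.02],
    "sex":     ["boy", "boy", "girl", "girl", "boy", "girl"],
})

# Series indexed by (name, sex).
summed = data1.groupby(["name", "sex"])["percent"].sum()

# Per name, the (name, sex) tuple with the largest summed percent.
# On a tie, idxmax keeps the first label ("boy" sorts before "girl").
best_labels = summed.groupby(level="name").idxmax()

# Select those rows and turn the Series back into a one-column DataFrame.
result = summed.loc[best_labels.tolist()].to_frame()
print(result)
```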
I've been reading the MultiIndex documentation but can't figure this out. Any help is appreciated.
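If the MultiIndex itself is the obstacle, a flat-column alternative avoids it entirely. This sketch again uses the hypothetical toy rows. It sums with `as_index=False`, sorts by `percent` in descending order, and keeps the first row per name with `drop_duplicates`. A stable sort (`kind="mergesort"`) is used so that ties keep a deterministic order.

```python
import pandas as pd

# Hypothetical toy rows (made up for illustration), same columns as the dataset.
data1 = pd.DataFrame({
    "year":    [1880, 1881, 1880, 1880, 1880, 1881],
    "name":    ["Aaron", "Aaron", "Aaron", "Abagail", "Abbie", "Abbie"],
    "percent": [0.10, 0.20, 0.001, 0.002, 0.0001, 0.02],
    "sex":     ["boy", "boy", "girl", "girl", "boy", "girl"],
})

# Flat result with ordinary columns name, sex, percent (no MultiIndex).
summed = data1.groupby(["name", "sex"], as_index=False)["percent"].sum()

result = (
    summed.sort_values("percent", ascending=False, kind="mergesort")
    .drop_duplicates(subset="name")  # first row per name = largest percent
    .sort_values("name")
    .reset_index(drop=True)
)
print(result)
```

Call `result.set_index(["name", "sex"])` afterwards if you want the MultiIndex layout back.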
【Discussion】:
Tags: python pandas multi-index