[Posted]: 2015-08-16 20:44:54
[Question]:
I've put together some data for the 2015 FIFA Women's World Cup:
import pandas as pd
df = pd.DataFrame({
'team':['Germany','USA','France','Japan','Sweden','England','Brazil','Canada','Australia','Norway','Netherlands','Spain',
'China','New Zealand','South Korea','Switzerland','Mexico','Colombia','Thailand','Nigeria','Ecuador','Ivory Coast','Cameroon','Costa Rica'],
'group':['B','D','F','C','D','F','E','A','D','B','A','E','A','A','E','C','F','F','B','D','C','B','C','E'],
'fifascore':[2168,2158,2103,2066,2008,2001,1984,1969,1968,1933,1919,1867,1847,1832,1830,1813,1748,1692,1651,1633,1485,1373,1455,1589],
'ftescore':[95.6,95.4,92.4,92.7,91.6,89.6,92.2,90.1,88.7,88.7,86.2,84.7,85.2,82.5,84.3,83.7,81.1,78.0,68.0,85.7,63.3,75.6,79.3,72.8]
})
df.groupby(['group', 'team']).mean()
Now I'd like to generate a new data frame containing the 6 possible pairings (matches) within each group of df, in the following format:
group team1 team2
A Canada China
A Canada Netherlands
A Canada New Zealand
A China Netherlands
A China New Zealand
A Netherlands New Zealand
B Germany Ivory Coast
B Germany Norway
...
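What's a clean, concise way to do this? I could run a bunch of loops over each group and team, but I feel there should be a cleaner, vectorized way to do it with pandas and the split-apply-combine paradigm.
Edit: I also welcome any R answers; I think it would be interesting to compare the R and pandas approaches here. Added the r tag.
Here is the data in R form, as requested in the comments: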
team <- c('Germany','USA','France','Japan','Sweden','England','Brazil','Canada','Australia','Norway','Netherlands','Spain',
'China','New Zealand','South Korea','Switzerland','Mexico','Colombia','Thailand','Nigeria','Ecuador','Ivory Coast','Cameroon','Costa Rica')
group <- c('B','D','F','C','D','F','E','A','D','B','A','E','A','A','E','C','F','F','B','D','C','B','C','E')
fifascore <- c(2168,2158,2103,2066,2008,2001,1984,1969,1968,1933,1919,1867,1847,1832,1830,1813,1748,1692,1651,1633,1485,1373,1455,1589)
ftescore <- c(95.6,95.4,92.4,92.7,91.6,89.6,92.2,90.1,88.7,88.7,86.2,84.7,85.2,82.5,84.3,83.7,81.1,78.0,68.0,85.7,63.3,75.6,79.3,72.8)
df <- data.frame(team, group, fifascore, ftescore)
[Comments]:
- A group-wise lookup such as pd.DataFrame({grp: tuple(combinations(team, 2)) for grp, team in df.groupby("group")["team"]}) might be better (combinations here is itertools.combinations).
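The one-liner in the comment above builds a wide frame with one column per group. As a rough sketch only, and not necessarily what the commenter had in mind, the same groupby plus itertools.combinations idea can be reshaped into the long group/team1/team2 layout the question asks for. The names rows and pairs are illustrative, and sorting the teams alphabetically within each group is an assumption made to match the sample output:

```python
from itertools import combinations

import pandas as pd

# Team and group columns copied from the question; the score columns
# are not needed to enumerate the pairings.
df = pd.DataFrame({
    "team": ["Germany", "USA", "France", "Japan", "Sweden", "England",
             "Brazil", "Canada", "Australia", "Norway", "Netherlands", "Spain",
             "China", "New Zealand", "South Korea", "Switzerland", "Mexico",
             "Colombia", "Thailand", "Nigeria", "Ecuador", "Ivory Coast",
             "Cameroon", "Costa Rica"],
    "group": ["B", "D", "F", "C", "D", "F", "E", "A", "D", "B", "A", "E",
              "A", "A", "E", "C", "F", "F", "B", "D", "C", "B", "C", "E"],
})

# Iterating over the grouped "team" column yields (group label, Series of teams).
# Assumption: teams are sorted alphabetically inside each group so the
# pair order matches the sample output in the question.
rows = [
    (grp, team1, team2)
    for grp, teams in df.groupby("group")["team"]
    for team1, team2 in combinations(sorted(teams), 2)
]

# One row per pairing: 6 groups x C(4, 2) = 36 rows.
pairs = pd.DataFrame(rows, columns=["group", "team1", "team2"])
print(pairs.head(8))
```

This still loops in Python rather than being truly vectorized, but with four teams per group the cost is negligible.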
Tags: python r pandas plyr split-apply-combine