Inner join with group by in pandas (Python): answers

【Question title】: inner join with group by pandas python
【Posted】: 2016-05-04 05:29:42
【Question description】:

I have 2 dataframes named geostat and res, which look like this:

geostat:
      count percent  grpno. state code
0          14.78       1         CA
1           0.00       2         CA
2           8.80       3         CA
3           9.60       4         FL
4          55.90       4         MA
5           0.00       2         FL
6           0.00       6         NC
7           0.00       5         NC
8           6.90       1         FL
9          59.00       4         MA
res:
    grpno.  MaxOfcount percent
0       1               14.78
1       2                0.00
2       3                8.80
3       4               59.00
4       5                0.00
5       6                0.00

I want to select First(res.Maxofcount percent), res.grpno. and geostat.First(statecode) from the dataframes geostat and res, inner joined on res.Maxofcount percent = geostat.count percent AND res.grpno. = geostat.grpno., grouped by res.grpno.

I want to do this in Python with pandas, but I don't know how to do an inner join combined with a group by. Can anyone help me?

The output dataframe should look like this:

   FirstOfMaxOfState count percent  state pool number FirstOfstate code
0                            14.78                  1                CA
1                             0.00                  2                CA
2                             8.80                  3                CA
3                            59.00                  4                MA
4                             0.00                  5                NC
5                             0.00                  6                NC

Note: FIRST(Column name) is an Access function. What would be its equivalent in Python?

Edited: changed the output dataframe.

【Question discussion】:

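Please show the expected output, because it is unclear what you want to do.
How do I do an inner join on 2 dataframes that have 2 different column names?
Again, please post your desired output so that you can help us help you.
I tried an inner join on the 2 dataframes geostat and res using the columns 'MaxOfcount percent' and 'count percent', but I got a KeyError. The line of code is: pd.merge(res, geostat, how = 'inner join', on = ('MaxOfcount percent','count percent'))
I have added the output dataframe.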

Tags: python pandas group-by inner-join

【Solution 1】:

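Use pandas.DataFrame.merge(), passing left_on and right_on because the key columns have different names in the two dataframes: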

geostat.merge(res, left_on=['count percent', 'grpno.'], right_on=['MaxOfcount percent', 'grpno.'],how='inner')

   count percent  grpno. state code  MaxOfcount percent
0          14.78       1         CA               14.78
1           0.00       2         CA                0.00
2           0.00       2         FL                0.00
3           8.80       3         CA                8.80
4           0.00       6         NC                0.00
5           0.00       5         NC                0.00
6          59.00       4         MA               59.00
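For anyone who wants to reproduce this, here is a minimal self-contained sketch. The two dataframes are rebuilt by hand from the tables in the question, and the variable name `merged` is only an illustrative choice. Row order in the result may differ from the output above depending on the pandas version.

```python
import pandas as pd

# Rebuilt by hand from the tables in the question.
geostat = pd.DataFrame({
    "count percent": [14.78, 0.00, 8.80, 9.60, 55.90, 0.00, 0.00, 0.00, 6.90, 59.00],
    "grpno.": [1, 2, 3, 4, 4, 2, 6, 5, 1, 4],
    "state code": ["CA", "CA", "CA", "FL", "MA", "FL", "NC", "NC", "FL", "MA"],
})
res = pd.DataFrame({
    "grpno.": [1, 2, 3, 4, 5, 6],
    "MaxOfcount percent": [14.78, 0.00, 8.80, 59.00, 0.00, 0.00],
})

# Keep only the geostat rows whose (count percent, grpno.) pair
# matches a (MaxOfcount percent, grpno.) pair in res.
merged = geostat.merge(
    res,
    left_on=["count percent", "grpno."],
    right_on=["MaxOfcount percent", "grpno."],
    how="inner",
)
print(merged)
```

Note that `how` takes 'inner', not 'inner join', and that `on=` only applies when the key columns carry the same name in both dataframes.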

【Discussion】:

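Thanks Stefan. Now I want to do a group by on the resulting dataframe on state code, and get the first of 'MaxOfcount percent' and 'grpno.', plus first(state code)?
I managed to group by grpno. and get the first count percent with the following code: geostat_query_query = geomerge.groupby('grpno.')['count percent'].first().reset_index() How do I get the first state code in the same line?
Stefan, how do I group by one column and get the first value of two columns? I'm not sure how to do this in Python. Can you help?
Do you mean df.groupby('grpno.')[['count percent', 'state code']].first().reset_index()?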
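Putting the merge and the group-by together, a sketch along these lines should reach the output asked for in the question. It assumes the two dataframes are rebuilt from the question's tables; the names `merged` and `result` are illustrative only. `GroupBy.first()` returns the first non-null value of each selected column per group, which is probably the closest pandas counterpart to Access's FIRST(), with the caveat that the result depends on row order.

```python
import pandas as pd

# Rebuilt by hand from the tables in the question.
geostat = pd.DataFrame({
    "count percent": [14.78, 0.00, 8.80, 9.60, 55.90, 0.00, 0.00, 0.00, 6.90, 59.00],
    "grpno.": [1, 2, 3, 4, 4, 2, 6, 5, 1, 4],
    "state code": ["CA", "CA", "CA", "FL", "MA", "FL", "NC", "NC", "FL", "MA"],
})
res = pd.DataFrame({
    "grpno.": [1, 2, 3, 4, 5, 6],
    "MaxOfcount percent": [14.78, 0.00, 8.80, 59.00, 0.00, 0.00],
})

# Step 1: inner join on the (value, group) pair.
merged = geostat.merge(
    res,
    left_on=["count percent", "grpno."],
    right_on=["MaxOfcount percent", "grpno."],
    how="inner",
)

# Step 2: one row per group, taking the first value of each column.
# The double brackets select a list of columns.
result = (
    merged.groupby("grpno.")[["MaxOfcount percent", "state code"]]
    .first()
    .reset_index()
)
print(result)
```

For group 2 both CA and FL match the maximum of 0.00; `first()` picks CA here because that row comes first in geostat.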