SQL Select and Group by in Pandas

【Question Title】: SQL Select and Group by in Pandas
【Posted】: 2021-07-09 02:47:57
【Question】:

Track   Actor                  Movie
1       Katherine Hepburn      Guess Who's Coming to Dinner
2       Katherine Hepburn      Guess Who's Coming to Dinner
3       Katherine Hepburn      On Golden Pond
4       Katherine Hepburn      The Lion in Winter
5       Bette Davis            What Ever Happened to Baby Jane?
6       Bette Davis            The Letter
7       Bette Davis            The Letter
...
100     Omar Shariff           Lawrence of Arabia

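I need to write Python code that selects every actor who has appeared in more than one movie and appends their names to a list.

In other words, I am looking for the Python equivalent of the following SQL query: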

SELECT Actor, count(DISTINCT Movie)
FROM table
GROUP BY Actor
HAVING count(DISTINCT Movie) > 1

【Comments】:

Tags: python-3.x pandas sqlite group-by pandas-groupby

【Solution 1】:

You can use the drop_duplicates() method to keep only the DISTINCT (Actor, Movie) pairs:

df=df.drop_duplicates(subset=['Actor','Movie'])

Then group and aggregate with the groupby() method, chaining agg() onto it:

result=df.groupby('Actor').agg(count=('Movie','count'))

Finally, apply a boolean mask to keep only the rows that meet your condition (count > 1):

result=result[result['count']>1]
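
The question also asks for the names in a list, which the steps above stop short of. Here is a minimal end-to-end sketch of one way to get there. It assumes the DataFrame holds only the eight rows visible in the question (the elided rows are left out), and the variable names `unique_pairs`, `actors`, `distinct_counts`, and `actors_alt` are illustrative. It also shows `nunique()`, which counts distinct movies per actor directly and so may be a closer match to `count(DISTINCT Movie)`.

```python
import pandas as pd

# Only the rows shown in the question; the rows hidden behind "..." are omitted.
df = pd.DataFrame({
    "Track": [1, 2, 3, 4, 5, 6, 7, 100],
    "Actor": ["Katherine Hepburn"] * 4 + ["Bette Davis"] * 3 + ["Omar Shariff"],
    "Movie": [
        "Guess Who's Coming to Dinner",
        "Guess Who's Coming to Dinner",
        "On Golden Pond",
        "The Lion in Winter",
        "What Ever Happened to Baby Jane?",
        "The Letter",
        "The Letter",
        "Lawrence of Arabia",
    ],
})

# Steps from the answer: drop duplicate (Actor, Movie) pairs, count, filter.
unique_pairs = df.drop_duplicates(subset=["Actor", "Movie"])
result = unique_pairs.groupby("Actor").agg(count=("Movie", "count"))
result = result[result["count"] > 1]

# Actor is the index after groupby, so the names come from the index.
actors = result.index.tolist()

# Alternative sketch: count distinct movies per actor without de-duplicating first.
distinct_counts = df.groupby("Actor")["Movie"].nunique()
actors_alt = distinct_counts[distinct_counts > 1].index.tolist()

print(actors)
```

Because groupby() sorts the group keys by default, the list comes back in alphabetical order by actor name rather than in the order the rows appear in the table.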

【Discussion】: