【Question title】: pandas python: drop last row of group
【Posted】: 2020-10-07 04:09:41
【Question】:

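I need to drop the last member of each group because it throws off later calculations. I'm not sure how to explain the problem better, so please ask if you need more detail.

My current code: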

 sampleDataUser = sampleData.groupby('user').filter(lambda x: x != sampleDataUser.tail(1))

It returns this error:

  ValueError: Can only compare identically-labeled DataFrame objects

Sample data:

df = [{ "user" : "seth", "var1" : "5"}, {"user": "seth", "var1" : "8"}, {"user" : "chris", "var1" : "2"}]

Expected output:

df = [{ "user" : "seth", "var1" : "5"}, {"user" : "chris", "var1" : "2"}]

【Comments】:

  • Maybe you just want to keep the first row for each user: df.drop_duplicates('user', keep='first')?
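
A quick sketch of what the suggestion in this comment does may help. On the three-row sample from the question it matches the expected output, but it keeps only the first row per user, so it is not the same as dropping the last row once a user has three or more rows. The variable names `sample`, `longer`, `first_only` and `longer_first` are illustrative and do not come from the question.

```python
import pandas as pd

# Three-row sample from the question.
sample = pd.DataFrame([
    {"user": "seth", "var1": "5"},
    {"user": "seth", "var1": "8"},
    {"user": "chris", "var1": "2"},
])

# Keep only the first row seen for each user.
first_only = sample.drop_duplicates(subset="user", keep="first")
print(first_only)

# With three rows for one user, the middle row is dropped as well,
# which is more than "drop the last row of the group".
longer = pd.DataFrame([
    {"user": "seth", "var1": "50"},
    {"user": "seth", "var1": "5"},
    {"user": "seth", "var1": "8"},
    {"user": "chris", "var1": "2"},
])
longer_first = longer.drop_duplicates(subset="user", keep="first")
print(longer_first)
```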

Tags: python pandas pandas-groupby


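【Solution 1】:

To drop the last row for each user only when that user appears more than once, build two masks with Series.duplicated, chain them with | (bitwise OR), and filter with boolean indexing: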

import pandas as pd

df = pd.DataFrame([{ "user" : "seth", "var1" : "50"},
                   { "user" : "seth", "var1" : "5"}, 
                   {"user": "seth", "var1" : "8"}, 
                   {"user" : "chris", "var1" : "2"}])
print (df)
    user var1
0   seth   50
1   seth    5
2   seth    8
3  chris    2

df = df[df['user'].duplicated(keep='last') | ~df['user'].duplicated(keep=False)]
print (df)
    user var1
0   seth   50
1   seth    5
3  chris    2
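
As an alternative, the same rule can probably be written with groupby directly, which may read closer to the wording of the question. This is a minimal sketch, not part of the original answer: it assumes the same four-row frame, and the names `from_end`, `size` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    {"user": "seth", "var1": "50"},
    {"user": "seth", "var1": "5"},
    {"user": "seth", "var1": "8"},
    {"user": "chris", "var1": "2"},
])

groups = df.groupby("user")

# Position counted from the end of each group: 0 marks the last row.
from_end = groups.cumcount(ascending=False)

# Number of rows in the group each row belongs to.
size = groups["user"].transform("size")

# Keep rows that are not last in their group, plus single-row groups.
result = df[(from_end > 0) | (size == 1)]
print(result)
```

The duplicated-based mask avoids building a groupby object, while this version makes the "last row" and "group size" conditions explicit.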

Details (the masks computed on the original df, before the filter above is applied):

print (df.assign(m1 = df['user'].duplicated(keep='last'),
                 m2 = ~df['user'].duplicated(keep=False),
                 both = df['user'].duplicated(keep='last') | 
                       ~df['user'].duplicated(keep=False)))
    user var1     m1     m2   both
0   seth   50   True  False   True
1   seth    5   True  False   True
2   seth    8  False  False  False
3  chris    2  False   True   True
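
Reading the table, m1 on its own is True for every row except the last one of each user, and m2 only rescues users with a single row. So if the last row should be dropped from every group, single-row groups included, m1 alone should be enough. The sketch below assumes the same four-row frame and compares it with a groupby-based equivalent; `drop_last_mask` and `drop_last_groupby` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([
    {"user": "seth", "var1": "50"},
    {"user": "seth", "var1": "5"},
    {"user": "seth", "var1": "8"},
    {"user": "chris", "var1": "2"},
])

# m1 only: keep rows that have a later row with the same user.
drop_last_mask = df[df["user"].duplicated(keep="last")]

# Same idea via groupby: find the last row of each group and drop it.
drop_last_groupby = df.drop(df.groupby("user").tail(1).index)

print(drop_last_mask)
print(drop_last_groupby)
```

Note that both variants remove chris entirely, which differs from the expected output in the question; that is why the answer adds the m2 mask.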

【Comments】:
