[Question Title]: Pandas drop_duplicates - TypeError: type object argument after * must be a sequence, not map
[Posted]: 2016-10-14 01:31:43
[Question]:

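I have updated my question to give a clearer example.

Is it possible to use the Pandas drop_duplicates method to remove duplicate rows based on a column whose values are lists? Consider column 'three', which holds a list of two items in some rows. Is there a way to drop the duplicate rows without doing it iteratively (which is my current workaround)?

The following example illustrates the problem: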

import pandas as pd

data = [
{'one': 50, 'two': '5:00', 'three': 'february'}, 
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 90, 'two': '9:00', 'three': 'january'}
]

df = pd.DataFrame(data)

print(df)

   one                three   two
0   50             february  5:00
1   25  [february, january]  6:00
2   25  [february, january]  6:00
3   25  [february, january]  6:00
4   90              january  9:00

df.drop_duplicates(['three'])

This raises the following error:

TypeError: type object argument after * must be a sequence, not map
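drop_duplicates has to hash the values of the column it deduplicates on, and Python lists are not hashable, so a column containing lists is a likely culprit here. The minimal check below uses a hypothetical helper, is_hashable, written only for illustration. It shows that a list fails to hash while a tuple with the same items, or a plain string, hashes fine. On more recent pandas/Python versions the same drop_duplicates call is more likely to report `TypeError: unhashable type: 'list'`; the "must be a sequence, not map" wording appears to be how older versions surfaced that failure.

```python
def is_hashable(value):
    """Hypothetical helper: return True if hash(value) succeeds."""
    try:
        hash(value)
    except TypeError:
        # Mutable containers such as list raise TypeError when hashed
        return False
    return True

print(is_hashable(['february', 'january']))  # False
print(is_hashable(('february', 'january')))  # True
print(is_hashable('february'))               # True
```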

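[Question Discussion]:

  • You want df_two = df_one.drop_duplicates('ID'), or more specifically df_two = df_one.drop_duplicates(subset=['ID'])
  • I'm afraid that hasn't solved the problem. I still see the same error.
  • Does df_two = df_one.drop_duplicates() work?
  • Unfortunately not, I get the same error.
  • You'll have to post the original data and the code that reproduces this error, because that doesn't seem to be the problem.

Tags: python pandas dataframe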


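[Solution 1]:

I think this happens because lists are not hashable, which breaks the duplicate-detection logic. As a workaround, you can convert the lists to tuples, like this: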

# Copy 'three' into a hashable helper column: lists become tuples, other values stay as-is
df['four'] = df['three'].apply(lambda x: tuple(x) if isinstance(x, list) else x)
df.drop_duplicates('four')

   one                three   two                 four
0   50             february  5:00             february
1   25  [february, january]  6:00  (february, january)
4   90              january  9:00              january
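If you would rather not add a 'four' column to the DataFrame, the same tuple conversion can be used as a temporary key. The sketch below is an assumption-based variant, not the answer's exact code: it builds the key with Series.map (to_hashable is a hypothetical helper name), marks repeats with Series.duplicated (which keeps the first occurrence by default), and filters the original rows with a boolean mask. The 'three' column keeps its original lists.

```python
import pandas as pd

data = [
    {'one': 50, 'two': '5:00', 'three': 'february'},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 90, 'two': '9:00', 'three': 'january'},
]
df = pd.DataFrame(data)

def to_hashable(value):
    # Hypothetical helper: lists are unhashable, tuples of the same items are hashable
    return tuple(value) if isinstance(value, list) else value

# Temporary hashable key; the DataFrame itself is not modified
key = df['three'].map(to_hashable)

# Keep only the first row for each distinct key
deduped = df[~key.duplicated()]
print(deduped)
```

A similar variant keys on df['three'].astype(str) instead. That also avoids hashing lists, but it treats two values as duplicates whenever their string forms match.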

[Discussion]:
