pandas drop_duplicates unhashable type: 'numpy.ndarray', 'set' and 'list'

【Question title】: pandas drop_duplicates unhashable type: 'numpy.ndarray', 'set' and 'list'
【Posted】: 2017-10-26 15:32:28
【Question description】:

I am trying to use drop_duplicates on a column of a dataframe:

A          len
['1', '2'] 2
['1', '2'] 2
['3']      1
['4', '5'] 2 
['4', '5'] 2

The resulting dataframe should look like this:

A          len
['1', '2'] 2
['3']      1
['4', '5'] 2

I tried df.drop_duplicates('A', inplace=True), but it raised an error:

unhashable type: 'numpy.ndarray'

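I also converted A to lists and to sets with df['A'].apply(list) and df['A'].apply(set) and then called drop_duplicates, but both attempts failed with unhashable type: 'list' and unhashable type: 'set' respectively. How can I solve this?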

【Question discussion】:

Tags: python-3.x pandas dataframe

【Solution 1】:

You need tuple, because tuples are hashable while lists, sets and numpy arrays are not:

df['A'].apply(tuple)

So use duplicated with boolean indexing, which keeps the original values in column A and only uses the tuples to detect duplicates:

df = df[~df['A'].apply(tuple).duplicated()]
print(df)
        A  len
0  [1, 2]    2
2     [3]    1
3  [4, 5]    2
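
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The frame construction is my own reconstruction of the question's data (numpy arrays of strings in column A), not the asker's actual code, and the names mask and result are just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's frame: column A holds numpy arrays.
df = pd.DataFrame({
    "A": [
        np.array(["1", "2"]),
        np.array(["1", "2"]),
        np.array(["3"]),
        np.array(["4", "5"]),
        np.array(["4", "5"]),
    ],
    "len": [2, 2, 1, 2, 2],
})

# Tuples are hashable, so duplicated() can compare them.
# The tuples are only used to build the mask; column A itself is not modified.
mask = df["A"].apply(tuple).duplicated()
result = df[~mask]

print(result.index.tolist())
```

The first occurrence of each value is kept, so the surviving rows are the ones at index 0, 2 and 3, matching the output above.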

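【Comments】:

It clearly works, but it's not clear to me why it needs to be converted to a tuple. Thanks in advance :)
@AndreaCiufo - you can check this for an explanation of why some types, such as list and array, are unhashable
Interestingly, the DataFrame duplicated method works differently from the Series duplicated method. I can now get working code with list and set in your example, e.g. df[~df['A'].apply(list).duplicated()] works, but the analogous approach on a DataFrame, e.g. df["A"].apply(list).to_frame("A").duplicated("A"), fails.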
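
On the question of why the tuple conversion is needed: duplicate detection relies on hashing the values, and Python's mutable containers (list, set, and likewise numpy.ndarray) do not support hash(). A small plain-Python sketch of the difference follows; is_hashable is a helper name I made up for illustration, not a pandas or standard-library function.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable(("1", "2")))             # tuple: immutable
print(is_hashable(["1", "2"]))             # list: mutable
print(is_hashable({"1", "2"}))             # set: mutable
print(is_hashable(frozenset({"1", "2"})))  # frozenset: immutable
```

This prints True, False, False, True. If element order should not matter when comparing rows, frozenset may be a reasonable alternative to tuple, though that changes what counts as a duplicate.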