List Comprehension of Pandas Columns Results in: unhashable type: 'dict' (Answer)

[Question title]: List Comprehension of Pandas Columns Results in: unhashable type: 'dict'
[Posted]: 2019-03-10 09:12:41
[Question]:

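I downloaded a Kaggle Kernel as a Jupyter Notebook file and am trying to run it on my local system. The kernel runs fine on Kaggle. However, when I run the following line (in cell 4) of the .ipynb file locally, it raises an error: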

cols_to_drop = [col for col in train_df.columns if train_df[col].nunique(dropna=False) == 1]

The error returned is:

TypeError: unhashable type: 'dict'

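Based on this Stack Overflow question, I understand that a dictionary cannot be used as a key in another dictionary. However, I cannot tell which part of my code actually involves a dictionary.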

I have tried several alternative versions of the code, following the list comprehension format from this article:

new_list = [expression(i) for i in old_list if filter(i)]

However, they all produce the same error.


Tags: python pandas numpy dictionary list-comprehension

[Solution 1]:

pd.Series.nunique calls pd.Series.unique under the hood:

def nunique(self, dropna=True):
    uniqs = self.unique()
    n = len(uniqs)
    if dropna and isna(uniqs).any():
        n -= 1
    return n

pd.Series.unique relies on hashing, much like Python's built-in set does under the hood. As its documentation notes:

Hash table-based unique, therefore does NOT sort.
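The same restriction can be seen in plain Python: hashing a dict directly, or putting one into a set, fails with the same error. A minimal sketch; the dict contents are arbitrary.

```python
# A dict, like the one hiding somewhere in train_df (contents are arbitrary).
d = {'some_dict': 3}

# Hashing a dict directly fails: dicts are mutable, so they define no hash.
try:
    hash(d)
    hash_error = None
except TypeError as exc:
    hash_error = str(exc)

# Building a set needs a hash for every element, so it fails the same way.
try:
    set([1, 'a', d])
    set_error = None
except TypeError as exc:
    set_error = str(exc)

print(hash_error)  # unhashable type: 'dict'
print(set_error)   # unhashable type: 'dict'
```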

At least one value in at least one of the series in train_df is a dictionary. Dictionaries are not hashable, so you get TypeError: unhashable type: 'dict'.
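A minimal reproduction with a small made-up DataFrame (not the Kaggle data): nunique works on a column of hashable values, while the column containing a single dict raises the error from the question.

```python
import pandas as pd

# Made-up data: column 'A' hides one dict among hashable values, 'B' is all ints.
df = pd.DataFrame({'A': [1, 'a', 'b', 4, {'some_dict': 3}], 'B': list(range(5))})

# Every value in 'B' is hashable, so counting unique entries works.
b_unique = df['B'].nunique(dropna=False)
print(b_unique)  # 5

# The dict in 'A' cannot be inserted into pandas' hash table, so nunique raises.
try:
    df['A'].nunique(dropna=False)
    a_error = None
except TypeError as exc:
    a_error = str(exc)
print(a_error)
```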

To see which types each series contains, you can use a dictionary comprehension:

type_dict = {col: set(map(type, train_df[col].values)) for col in train_df}

Here is a simple example:

import pandas as pd

df = pd.DataFrame({'A': [1, 'a', 'b', 4, {'some_dict': 3}], 'B': list(range(5))})
type_dict = {col: set(map(type, df[col].values)) for col in df}

print(type_dict)

{'A': {dict, int, str}, 'B': {numpy.int64}}
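The offending columns can be pulled out of type_dict directly. A sketch, assuming dict, list and set are the unhashable types worth checking for; dict is the one named in the error.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 'a', 'b', 4, {'some_dict': 3}], 'B': list(range(5))})

# Same dictionary comprehension as in the answer: column name -> set of value types.
type_dict = {col: set(map(type, df[col].values)) for col in df}

# Assumed list of unhashable container types to look for.
unhashable_types = {dict, list, set}

# Keep the columns whose value types overlap with the unhashable ones.
bad_cols = [col for col, types in type_dict.items() if types & unhashable_types]
print(bad_cols)  # ['A']
```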

To count unique items with nunique, you first need to clean your data so that the dataframe contains no unhashable values.
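One possible cleaning step is sketched below, assuming it is acceptable for your analysis to treat dicts and lists as equal whenever their contents are equal. `to_hashable` is a hypothetical helper, and the three-column `train_df` is made-up stand-in data, not the Kaggle dataset. Dicts become frozensets of (key, value) pairs and lists become tuples, both hashable, and then the question's list comprehension runs unchanged on the cleaned copy.

```python
import pandas as pd


def to_hashable(value):
    """Hypothetical helper: replace dicts/lists with hashable stand-ins, leave the rest as-is."""
    if isinstance(value, dict):
        # frozenset of (key, value) pairs: hashable and independent of key order
        return frozenset((key, to_hashable(val)) for key, val in value.items())
    if isinstance(value, list):
        # tuples are hashable as long as their elements are
        return tuple(to_hashable(item) for item in value)
    return value


# Made-up stand-in for train_df: column 'C' holds the same dict in every row.
train_df = pd.DataFrame({
    'A': [1, 'a', 'b', 4, {'some_dict': 3}],
    'B': list(range(5)),
    'C': [{'k': 1}] * 5,
})

# Only object-dtype columns can hold dicts; numeric columns are left untouched.
clean_df = train_df.copy()
for col in clean_df.columns:
    if clean_df[col].dtype == object:
        clean_df[col] = clean_df[col].map(to_hashable)

# The list comprehension from the question now works on the cleaned copy.
cols_to_drop = [col for col in clean_df.columns if clean_df[col].nunique(dropna=False) == 1]
print(cols_to_drop)  # ['C']
```

A blunter alternative is converting such columns to strings, but then two dicts with the same contents and a different key order compare as different values.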
