What happens when removing NaN values from a column?

【Question Title】: What happens when removing NaN values from a column?
【Posted】: 2020-07-11 04:00:31
【Question Description】:

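In Pandas, after removing NaN values from a column, what value is stored at the index where a NaN was removed? I was able to remove the NaN values from the column successfully, but the shape of df stays intact while the size of that particular column changes.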

1445    70.0
**1446     NaN**
1447    80.0
1448    70.0
1449    21.0
1450    60.0
1451    78.0
1452    35.0
1453    90.0
1454    62.0
1455    62.0
1456    85.0
1457    66.0
1458    68.0
1459    75.0
Name: LotFrontage, dtype: float64
Size of LotFrontage before removing NaN values: 1460

This is the result after removing the NaN values:

1444    63.0
1445    70.0
1447    80.0
1448    70.0
1449    21.0
1450    60.0
1451    78.0
1452    35.0
1453    90.0
1454    62.0
1455    62.0
1456    85.0
1457    66.0
1458    68.0
1459    75.0
Name: LotFrontage, dtype: float64
New size of LotFrontage after removing NaN values: 1201

Trying to access the value at index 1446 raises the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-70-7cb9d14fb3e0> in <module>()
      3 print("New size of LotFrontage after revoving NaN values: " + str(iowa['LotFrontage'].size))
      4 print(iowa['LotFrontage'][1445])
----> 5 print(iowa['LotFrontage'][1446])

1 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4403         k = self._convert_scalar_indexer(k, kind="getitem")
   4404         try:
-> 4405             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4406         except KeyError as e1:
   4407             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1446

【Question Comments】:

I think you mean "in Pandas" rather than "in Python", right? If so, please edit your question and tags accordingly.
Yes, I meant Pandas.

Tags: python nan

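【Solution 1】:

I assume you are using the `dropna` function to remove the NaN values. `dropna` can drop data in several ways. By default it works row by row and drops any row in which at least one column holds a NaN value. You can change this behavior through its parameters; see the `dropna` documentation.

When rows are dropped, the shape necessarily changes. In your case the shape did not change, most likely because you did not drop in place. Unless `inplace` is set to `True`, `dropna` returns a new DataFrame with the rows removed and leaves the original DataFrame unchanged.

If removing those index labels is the behavior you want, use `dropna` in either of these ways: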

df_final = df.dropna()
or
df.dropna(inplace=True)
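
To make the difference between the returned copy and the original concrete, here is a minimal sketch. The three-row frame and its labels are made up for illustration and only imitate the asker's `LotFrontage` column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the asker's data.
df = pd.DataFrame({"LotFrontage": [70.0, np.nan, 80.0]}, index=[1445, 1446, 1447])

# dropna() returns a new DataFrame; the original is left untouched.
df_final = df.dropna()
print(df.shape)        # (3, 1)
print(df_final.shape)  # (2, 1)

# Surviving rows keep their original labels, so label 1446 is simply gone.
print(list(df_final.index))    # [1445, 1447]
print(1446 in df_final.index)  # False
```

Nothing is "stored" at the removed label: the row no longer exists, and the remaining labels are not renumbered.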

If your DataFrame has several columns and you only want to drop a row when every column in it is NaN, use:

df_final = df.dropna(how='all')
or
df.dropna(how='all', inplace=True)
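
A small sketch of how the default (`how='any'`) differs from `how='all'`. The column names `a` and `b` and the data are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame: row 11 is partly NaN, row 12 is entirely NaN.
df = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan], "b": [4.0, 5.0, np.nan]},
    index=[10, 11, 12],
)

any_dropped = df.dropna()           # default how='any': drop rows with at least one NaN
all_dropped = df.dropna(how="all")  # drop only rows where every column is NaN

print(list(any_dropped.index))  # [10]
print(list(all_dropped.index))  # [10, 11]
```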

If you have only one column and want to preserve the index, you can replace the NaN values with a suitable value instead, for example:

df_final = df.fillna(0)
or
df.fillna(value=0, inplace=True)

For more information on `fillna`, see its documentation.
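
As a rough illustration of why filling keeps every label reachable, the sketch below uses a made-up three-value Series. Filling with 0 follows the answer; using the column median is only one possible alternative, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical Series imitating the asker's column.
s = pd.Series([70.0, np.nan, 80.0], index=[1445, 1446, 1447], name="LotFrontage")

# fillna() keeps every row, so the size and the labels are unchanged.
filled = s.fillna(0)
print(filled.size)       # 3
print(filled.loc[1446])  # 0.0

# Assumption: a statistic such as the median may be a more sensible filler than 0.
filled_median = s.fillna(s.median())
print(filled_median.loc[1446])  # 75.0
```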

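【Comments】:

【Solution 2】:

The first column is just the index. You should reset the index after removing some values. (Set drop=False if you want to keep the old index by adding it to the DataFrame as a column; otherwise the old index is discarded.)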

df2 = df2.reset_index(drop=True)

After some values are removed, your data contains only 1201 rows, and the row labeled 1446 no longer exists. That is why you get KeyError: 1446.
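
The sketch below reproduces the error on a made-up three-value Series and contrasts label-based access (`.loc`) with position-based access (`.iloc`). The data is hypothetical; only the behavior matters.

```python
import numpy as np
import pandas as pd

# Hypothetical Series imitating the asker's column.
s = pd.Series([70.0, np.nan, 80.0], index=[1445, 1446, 1447], name="LotFrontage")
dropped = s.dropna()

# Label-based lookup fails because label 1446 no longer exists.
try:
    dropped.loc[1446]
except KeyError as err:
    print("KeyError:", err)

# Position-based lookup still works: the second remaining row.
print(dropped.iloc[1])  # 80.0

# After reset_index(drop=True) the labels are renumbered from 0.
renumbered = dropped.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1]
print(renumbered.loc[1])       # 80.0
```

Note that after resetting, label 1446 still does not exist; the labels simply run from 0 to the number of remaining rows minus one.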

【Comments】: