Why is dropna() and replace() methods not working for missing data in dataframe? Answers

【Question title】: Why is dropna() and replace() methods not working for missing data in dataframe?
【Posted】: 2019-05-03 11:10:01
【Question】:

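I have started a data science course that asks me to handle missing data either by dropping the rows that contain NaN in the "price" subset or by replacing the NaN with a mean value. However, neither my dropna() nor my replace() seems to work. What could be the problem?

I went through many solutions on Stack Overflow, but none of them solved my problem. I also looked for a solution on pandas.pydata.org, where I learned about the different parameters of dropna(), such as thresh and how='any', but nothing helped.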

import pandas as pd
import numpy as np

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
df = pd.read_csv(url, header=None)

# Our data comes without any header or column names, so we assign each column a header name.
headers = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration", "num-of-doors", "body-style", "drive-wheels", "engnie-location", "wheel-base", "length", "width", "height", "curb-weight", "engine-type", "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke", "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg", "price"]
df.columns = headers

# Now we have to eliminate rows containing NaN or ? in the "price" column of our data.
df.dropna(subset=["price"], axis=0, inplace=True)
df.head(12)

# or
df.dropna(subset=["price"], how='any')
df.head(12)

# also to replace
mean = df["price"].mean()
df["price"].replace(np.nan, mean)
df.head(12)

I expected every row containing NaN or "?" in the "price" column to be dropped by dropna() or replaced by replace(). But the data does not seem to change.

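【Comments】:

Try: df['price'] = df['price'].fillna(df['price'].mean())? The replace method also does not modify the DataFrame in place, so its result should be assigned back: df['price'] = df["price"].replace(np.nan, mean). The same goes for dropna: assign the result back unless you use inplace.
Hi, could you print df.head(12) before processing, or print df.info() to get the dtype information?
Does dropna() accept an inplace argument? If so, you need to either pass that argument or reassign the result to df, as Mohit said above. By the way, good question! Next time, make sure you give us a sample of your data and the expected output.
@Datanovice The first df.dropna already has inplace=True.
Could you post sample data and the expected output?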
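
To illustrate the point the comments make about assigning results back, here is a minimal sketch on a small made-up frame. The name `toy` and its values are hypothetical and are not taken from the imports-85 data; the sketch only shows that `replace()`, `fillna()` and `dropna()` return new objects unless the result is assigned back or `inplace=True` is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real column comes from imports-85.data.
toy = pd.DataFrame({"price": [13495.0, np.nan, 16500.0]})

# Without assignment the result is discarded and `toy` is unchanged.
toy["price"].replace(np.nan, toy["price"].mean())
unchanged_missing = int(toy["price"].isna().sum())

# Assigning the result back is what actually updates the column.
filled = toy.copy()
filled["price"] = filled["price"].fillna(filled["price"].mean())

# dropna also returns a new frame unless inplace=True is passed.
dropped = toy.dropna(subset=["price"])

print(unchanged_missing, int(filled["price"].isna().sum()), len(dropped))
```

Note that this only helps when the column really contains NaN. If the missing entries are stored as some other marker, the column has to be converted first.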

Tags: python python-3.x pandas dataframe missing-data

【Solution 1】:

Use the following code to remove the ? values:

df['price'] = pd.to_numeric(df['price'], errors='coerce')
df = df.dropna()

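The to_numeric function converts its argument to a numeric type.

With errors='coerce', values that cannot be parsed as numbers are set to NaN.

After that, dropna can remove the records that contain NaN.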
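
The steps above can be sketched end to end on a small made-up frame. This assumes, as the answer implies, that missing prices are stored as the string "?", so the column is read as text and contains no NaN until it is converted. The name `toy` and its values are hypothetical, and the mean-fill variant is added here only because the question asks for it.

```python
import pandas as pd

# Hypothetical toy data standing in for the real "price" column,
# assuming missing prices are stored as the string "?".
toy = pd.DataFrame({"make": ["audi", "bmw", "isuzu"],
                    "price": ["13950", "?", "16430"]})

# "?" is an ordinary string, so there is nothing for dropna to remove yet.
before = len(toy.dropna(subset=["price"]))

# Convert to numbers; values that cannot be parsed become NaN.
toy["price"] = pd.to_numeric(toy["price"], errors="coerce")

# Option 1: drop rows whose price is missing (only that column is checked).
dropped = toy.dropna(subset=["price"])

# Option 2: fill missing prices with the column mean instead.
filled = toy.assign(price=toy["price"].fillna(toy["price"].mean()))

print(before, len(dropped), filled["price"].tolist())
```

Using `subset=["price"]` keeps rows that are missing values only in other columns, whereas a bare `df.dropna()` checks every column. Another option, if it suits the data, is to pass `na_values="?"` to `pd.read_csv` so the marker becomes NaN at load time.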

【Comments】: