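【Posted】: 2017-11-11 01:18:25
【Question】:
I am working through the Loan Prediction practice problem and trying to fill in the missing values in my data. I got the data from here. To work through the problem, I followed this tutorial.
You can find the entire code I am using (file name model.py) and the data here on GitHub.
The DataFrame looks like this: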
df[['Loan_ID', 'Self_Employed', 'Education', 'LoanAmount']].head(10)
Out:
    Loan_ID Self_Employed     Education  LoanAmount
0  LP001002            No      Graduate         NaN
1  LP001003            No      Graduate       128.0
2  LP001005           Yes      Graduate        66.0
3  LP001006            No  Not Graduate       120.0
4  LP001008            No      Graduate       141.0
5  LP001011           Yes      Graduate       267.0
6  LP001013            No  Not Graduate        95.0
7  LP001014            No      Graduate       158.0
8  LP001018            No      Graduate       168.0
9  LP001020            No      Graduate       349.0
After executing the last line of the following code (line 60 of model.py):
import numpy as np
import pandas as pd

url = 'https://raw.githubusercontent.com/Aniruddh-SK/Loan-Prediction-Problem/master/train.csv'
df = pd.read_csv(url)
df['LoanAmount'].fillna(df['LoanAmount'].mean(), inplace=True)
df['Self_Employed'].fillna('No', inplace=True)
table = df.pivot_table(values='LoanAmount', index='Self_Employed', columns='Education', aggfunc=np.median)
# Define a function that looks up a row's group median in the pivot table
def fage(x):
    return table.loc[x['Self_Employed'], x['Education']]
# Replace missing values
df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()].apply(fage, axis=1), inplace=True)
I get this error:
ValueError Traceback (most recent call last)
<ipython-input-40-5146e49c2460> in <module>()
----> 1 df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()].apply(fage, axis=1), inplace=True)
/usr/local/lib/python2.7/dist-packages/pandas/core/series.pyc in fillna(self, value, method, axis, inplace, limit, downcast, **kwargs)
2368 axis=axis, inplace=inplace,
2369 limit=limit, downcast=downcast,
-> 2370 **kwargs)
2371
2372 @Appender(generic._shared_docs['shift'] % _shared_doc_kwargs)
/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in fillna(self, value, method, axis, inplace, limit, downcast)
3264 else:
3265 raise ValueError("invalid fill value with a %s" %
-> 3266 type(value))
3267
3268 new_data = self._data.fillna(value=value, limit=limit,
ValueError: invalid fill value with a <class 'pandas.core.frame.DataFrame'>
How can I fill the missing values without getting this error?
【Comments】:
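-
df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()] does not make sense. You are looking for nulls and trying to fill the nulls with nulls?
-
@ayhan I did it the way the tutorial does; I think it is supposed to fill the missing values with true
-
Sorry, it is trying to fill with df[df['LoanAmount'].isnull()].apply(fage, axis=1). Can you include the definition of the function fage and a small reproducible dataset?
-
@ayhan I have already linked the entire code I am using, but just in case, here it is: def fage(x): ...: return table.loc[x['Self_Employed'], x['Education']]
-
@ayhan As for the dataset, it is also at the GitHub link in my question; the data is small and you can download it from there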
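For readers who want the small reproducible example requested in the comments, here is a minimal sketch. The toy rows are hypothetical (they only mimic the columns in the question), and the diagnosis is an assumption that nobody in this thread confirms: because `LoanAmount` is first filled with its mean, no NaN may be left by the time the pivot-table fill runs, so the filtered frame is empty and `apply` on it may hand back a DataFrame instead of a Series. The sketch skips the mean fill, builds the median table first, and assigns the looked-up values only to the rows that are actually missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same columns as in the question.
df = pd.DataFrame({
    'Self_Employed': ['No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes', None],
    'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate',
                  'Graduate', 'Not Graduate', 'Not Graduate', 'Graduate'],
    'LoanAmount': [np.nan, 128.0, 66.0, 120.0, 141.0, 267.0, np.nan, 95.0, 158.0],
})

# A mean fill would leave no NaN for the later step to find.
after_mean_fill = df['LoanAmount'].fillna(df['LoanAmount'].mean())
remaining_after_mean_fill = int(after_mean_fill.isnull().sum())

# Fill the grouping column by assignment rather than inplace=True on a column.
df['Self_Employed'] = df['Self_Employed'].fillna('No')

# Median LoanAmount per (Self_Employed, Education) group; NaN values are ignored.
table = df.pivot_table(values='LoanAmount', index='Self_Employed',
                       columns='Education', aggfunc='median')

def fage(row):
    # Look up the group median for this row.
    return table.loc[row['Self_Employed'], row['Education']]

# Only touch rows that are missing, and only if there are any.
missing = df['LoanAmount'].isnull()
if missing.any():
    df.loc[missing, 'LoanAmount'] = df[missing].apply(fage, axis=1)
```

Guarding with `missing.any()` avoids calling `apply` on an empty frame, and assigning through `df.loc[missing, ...]` aligns the looked-up medians with the missing rows by index.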
Tags: python python-2.7 pandas machine-learning