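【Posted】: 2016-04-12 15:40:35
【Question】:
I'm a beginner at data analysis in Python and am having trouble with this particular task. I've searched fairly extensively but can't pin down the problem.
I imported a file and loaded it into a DataFrame, then cleaned the data. However, when I try to fit my model to the data, I get:
Perfect separation detected, results not available
Here is the code: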
from scipy import stats
import numpy as np
import pandas as pd
import collections
import matplotlib.pyplot as plt
import statsmodels.api as sm
loansData = pd.read_csv('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv')
# to_csv returns None when given a path, so its result must not be assigned back to loansData
loansData.to_csv('loansData_clean.csv', header=True, index=False)
## cleaning the file
loansData['Interest.Rate'] = loansData['Interest.Rate'].map(lambda x: round(float(x.rstrip('%')) / 100, 4))
loanlength = loansData['Loan.Length'].map(lambda x: x.strip('months'))
loansData['FICO.Range'] = loansData['FICO.Range'].map(lambda x: x.split('-'))
loansData['FICO.Range'] = loansData['FICO.Range'].map(lambda x: int(x[0]))
loansData['FICO.Score'] = loansData['FICO.Range']
#add interest rate less than column and populate
## we only care about interest rates less than 12%
loansData['IR_TF'] = pd.Series('', index=loansData.index)
loansData['IR_TF'] = loansData['Interest.Rate'].map(lambda x: True if x < 12 else False)
#create intercept column
loansData['Intercept'] = pd.Series(1.0, index=loansData.index)
# create list of ind var col names
ind_vars = ['FICO.Score', 'Amount.Requested', 'Intercept']
#define logistic regression
logit = sm.Logit(loansData['IR_TF'], loansData[ind_vars])
#fit the model
result = logit.fit()
#get fitted coef
coeff = result.params
print(coeff)
Any help would be greatly appreciated!
Thanks, A
【Discussion】:
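One thing worth checking, offered as a guess from the code as posted rather than a confirmed diagnosis: the cleaning step divides Interest.Rate by 100, so the values are fractions such as 0.0890, yet IR_TF is built by comparing them against 12. If every row ends up True, the outcome has only one class, and a logistic fit can report perfect separation. The sketch below uses only the standard library; the helper name `describe_binary_outcome` and the sample rates are hypothetical and are not taken from loansData.csv.

```python
from collections import Counter


def describe_binary_outcome(values):
    """Count how often each class appears in a binary outcome column."""
    counts = Counter(bool(v) for v in values)
    return {
        "true": counts[True],
        "false": counts[False],
        # A single-class outcome cannot be fitted meaningfully by a logit model.
        "single_class": counts[True] == 0 or counts[False] == 0,
    }


# Hypothetical sample: rates stored as fractions (0.0890 means 8.90%).
sample_rates = [0.0890, 0.1212, 0.2198, 0.0999, 0.1171]

# Comparing fractions against 12 labels every row True.
flags_vs_12 = [rate < 12 for rate in sample_rates]

# Comparing on the same scale as the data yields both classes.
flags_vs_012 = [rate < 0.12 for rate in sample_rates]

print(describe_binary_outcome(flags_vs_12))
print(describe_binary_outcome(flags_vs_012))
```

The same helper accepts any iterable, so it could be called as `describe_binary_outcome(loansData['IR_TF'])` before fitting. If it reports a single class, the threshold and the scale of Interest.Rate probably do not match.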
Tags: python numpy pandas matplotlib logistic-regression