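[Posted]: 2020-10-25 23:22:26
[Question]:
I'm building my first pipeline, but I can't get it to work on the Titanic dataset. Can someone explain what I'm doing wrong and how to fix it?
I dropped some features from the DataFrame and used pd.get_dummies to convert the categorical features.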
import pandas as pd

# `titanic` is the raw Titanic DataFrame, loaded earlier
titanic_dummies = titanic.copy()
titanic_dummies = titanic_dummies.drop(['Name', 'Ticket', 'Cabin', 'Fare'], axis=1)
titanic_dummies = pd.get_dummies(titanic_dummies, drop_first=True)
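As a quick check on the preprocessing step above, the sketch below uses a tiny made-up frame (the column values are hypothetical, not taken from the real dataset) to show that `pd.get_dummies` only encodes the categorical columns. Missing values in a numeric column such as `Age` pass through unchanged, so they still have to be handled later in the pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the Titanic frame
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", None, "C"],
})

# Same call as in the question
dummies = pd.get_dummies(toy, drop_first=True)

# Count the missing values that remain in each column after encoding
nan_counts = dummies.isna().sum()
print(nan_counts)
```

Running `titanic_dummies.isna().sum()` on the real frame would show in the same way which columns still contain NaN before the pipeline is fitted.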
Then I tried to run this pipeline:
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = titanic_dummies.drop(['Survived'], axis=1)
y = titanic_dummies['Survived']
# Set up the pipeline steps
steps = [('scaler', StandardScaler()),
         ('imputation', SimpleImputer(missing_values='NaN', strategy='most_frequent')),
         ('logreg', LogisticRegression())]
# Create the pipeline
pipeline = Pipeline(steps)
# Define the hyperparameters and ranges for the grid search
parameters = {"logreg__C": np.logspace(-5, 8, 15),
              "logreg__penalty": ['l1', 'l2']}
# Create train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Run cross-validation
cv = GridSearchCV(pipeline, param_grid=parameters, cv=3)
# Fit the pipeline to the training set
cv.fit(X_train, y_train)
# Predict the labels of the test set
y_pred = cv.predict(X_test)
# Compute and print metrics
print("Accuracy: {}".format(cv.score(X_test, y_test)))
print(classification_report(y_test, y_pred))
print("Tuned Model Parameters: {}".format(cv.best_params_))
This is the error I get:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
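From the attached image you can probably tell that the magnitude of my values is not the problem. Maybe something is wrong with my imputation?
I would really like to hear your thoughts on how to fix this.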
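One likely cause, offered as a hedged sketch rather than a confirmed diagnosis: `SimpleImputer(missing_values='NaN', ...)` looks for the string `'NaN'`, not for `np.nan`, so the real missing values are never imputed and input validation rejects them. The example below passes `missing_values=np.nan` and, as a conventional choice, places the imputer before the scaler. It runs on synthetic data (the `toy` frame and its columns are assumptions, since the real dataset is not available here). The `penalty` grid is left out, because the default `lbfgs` solver only supports `'l2'`; searching over `'l1'` would additionally need a solver such as `liblinear` or `saga`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for titanic_dummies (hypothetical data, not the real dataset)
rng = np.random.default_rng(42)
n = 200
toy = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Age": rng.normal(30, 12, size=n),
    "Sex_male": rng.integers(0, 2, size=n),
})
# Noisy label loosely tied to Sex_male so both classes are present
flip = rng.random(n) < 0.2
toy["Survived"] = np.where(flip, toy["Sex_male"], 1 - toy["Sex_male"])
# Knock out roughly 30% of the Age values to mimic the missing data
toy.loc[rng.random(n) < 0.3, "Age"] = np.nan

X = toy.drop(["Survived"], axis=1)
y = toy["Survived"]

steps = [
    # np.nan (not the string 'NaN') marks the missing entries
    ("imputation", SimpleImputer(missing_values=np.nan, strategy="most_frequent")),
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(max_iter=1000)),
]
pipeline = Pipeline(steps)

# Only C is tuned here; see the note on `penalty` above
parameters = {"logreg__C": np.logspace(-5, 8, 15)}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

cv = GridSearchCV(pipeline, param_grid=parameters, cv=3)
cv.fit(X_train, y_train)
y_pred = cv.predict(X_test)

print("Accuracy: {}".format(cv.score(X_test, y_test)))
print("Tuned Model Parameters: {}".format(cv.best_params_))
```

With the real data, the same two changes (the `missing_values` argument and the step order) would be applied to the original `steps` list, keeping everything else as posted.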
[Comments]:
Tags: python pipeline valueerror