Why do I get a ValueError when building a linear regression model? Answers

【Question Title】: Why do I get a ValueError when building a Linear Regression model?
【Posted】: 2019-07-18 05:23:03
【Question】:

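I am trying to build a linear regression model for a dataset. After splitting the data into training and test sets, I get the following error: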

ValueError: could not convert string to float: '?'

Does this mean the dataset contains null values or float values?

Since I am new to Python, I don't understand how to fix this. Can anyone help me?

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import linear_model
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = ['ID Number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class'])
X = df.iloc[:, 0:9].values
y = df.iloc[:, 10].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 4)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
lr = linear_model.LinearRegression()
lr.fit(X_train, y_train)

【Discussion】:

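It looks like one of the columns has dtype object. Check X.dtype, and check the data type of each column in your data (for example with df.dtypes).
Yes, one column has dtype "object". After dropping that column, I got output. Thanks.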
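To follow the comment's suggestion, the sketch below inspects the dtypes. It does not download the UCI file; instead it parses a few made-up rows (hypothetical data, in the same comma-separated layout as the file, with one '?' in the Bare Nuclei field) so it runs offline. It shows that pandas reads the column containing '?' as non-numeric, and that `.values` then yields a single object-dtype array, which scikit-learn cannot convert to float.

```python
import io

import pandas as pd

# Column names as used in the question.
COLUMNS = ['ID Number', 'Clump Thickness', 'Uniformity of Cell Size',
           'Uniformity of Cell Shape', 'Marginal Adhesion',
           'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
           'Normal Nucleoli', 'Mitoses', 'Class']

# Hypothetical sample rows (not real records); '?' marks a missing Bare Nuclei value.
SAMPLE = """1000001,5,1,1,1,2,1,3,1,1,2
1000002,5,4,4,5,7,10,3,2,1,2
1000003,8,4,5,1,2,?,7,3,1,4
1000004,3,1,1,1,2,2,3,1,1,2
"""

df = pd.read_csv(io.StringIO(SAMPLE), names=COLUMNS)

# Per-column dtypes: every column is numeric except the one holding '?'.
print(df.dtypes)

# List the non-numeric columns and the rows that contain the '?' marker.
object_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
missing_rows = df.index[df['Bare Nuclei'] == '?'].tolist()
print(object_cols)   # ['Bare Nuclei']
print(missing_rows)  # [2]

# .values gives one NumPy array with a single dtype for all selected columns.
X = df.iloc[:, 0:9].values
print(X.dtype)  # object
```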

Tags: python-3.x linear-regression

【Solution 1】:

The breast-cancer-wisconsin.data dataset you are using has some rows with '?' as the value in column 7 (Bare Nuclei). So when you create X and y, leave out the rows that have '?' as a value.
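A minimal sketch of this suggestion, again on hypothetical sample rows rather than the real file: `na_values='?'` tells `pd.read_csv` to parse '?' as NaN, and `dropna()` then removes those rows before X and y are built. Unlike the question's `df.iloc[:, 0:9]` (which includes `ID Number` and leaves out `Mitoses`), this sketch builds X from the nine measurement columns and excludes `ID Number` (an identifier) and `Class` (the target); that column choice is an assumption of the sketch, not part of the answer.

```python
import io

import pandas as pd

COLUMNS = ['ID Number', 'Clump Thickness', 'Uniformity of Cell Size',
           'Uniformity of Cell Shape', 'Marginal Adhesion',
           'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
           'Normal Nucleoli', 'Mitoses', 'Class']

# Hypothetical sample rows (not real records); one row has '?' for Bare Nuclei.
SAMPLE = """1000001,5,1,1,1,2,1,3,1,1,2
1000002,5,4,4,5,7,10,3,2,1,2
1000003,8,4,5,1,2,?,7,3,1,4
1000004,3,1,1,1,2,2,3,1,1,2
"""

# Parse '?' as a missing value (NaN) instead of as a string.
df = pd.read_csv(io.StringIO(SAMPLE), names=COLUMNS, na_values='?')

# Leave out every row that had a '?' (now NaN).
df = df.dropna()

# Features: the nine measurement columns; target: Class.
X = df.drop(columns=['ID Number', 'Class']).to_numpy(dtype=float)
y = df['Class'].to_numpy(dtype=float)
print(X.shape, y.shape)  # (3, 9) (3,)
```

With the real data, the same `read_csv(..., na_values='?')` call can take the UCI URL from the question, and the resulting X and y can be passed to `train_test_split` and `lr.fit` unchanged.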

I hope this helps.

【Discussion】:

Yes. I dropped that column, ran the analysis again, and got output. Thanks.
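The asker resolved it by dropping the offending column rather than the rows. A sketch of that alternative on the same hypothetical sample rows: every row is kept, but the model no longer sees the Bare Nuclei feature (this sketch also excludes `ID Number`, as an assumption). Which trade-off is preferable depends on how many rows contain '?' and how informative that feature is; the sketch does not settle that.

```python
import io

import pandas as pd

COLUMNS = ['ID Number', 'Clump Thickness', 'Uniformity of Cell Size',
           'Uniformity of Cell Shape', 'Marginal Adhesion',
           'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
           'Normal Nucleoli', 'Mitoses', 'Class']

# Hypothetical sample rows (not real records); one row has '?' for Bare Nuclei.
SAMPLE = """1000001,5,1,1,1,2,1,3,1,1,2
1000002,5,4,4,5,7,10,3,2,1,2
1000003,8,4,5,1,2,?,7,3,1,4
1000004,3,1,1,1,2,2,3,1,1,2
"""

df = pd.read_csv(io.StringIO(SAMPLE), names=COLUMNS)

# Drop the non-numeric column instead of the rows: all rows survive,
# but Bare Nuclei is no longer available as a feature.
features = df.drop(columns=['ID Number', 'Bare Nuclei', 'Class'])
X = features.to_numpy(dtype=float)
y = df['Class'].to_numpy(dtype=float)
print(X.shape, y.shape)  # (4, 8) (4,)
```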