【发布时间】:2022-09-27 15:43:55
【问题描述】:
我可以使用下面的代码轻松训练和测试分类器。
import pandas as pd
import numpy as np
# Load Library
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier,AdaBoostClassifier,GradientBoostingClassifier# Step1: Create data set
# Define the headers since the data does not have any
headers = [\"symboling\", \"normalized_losses\", \"make\", \"fuel_type\", \"aspiration\",
\"num_doors\", \"body_style\", \"drive_wheels\", \"engine_location\",
\"wheel_base\", \"length\", \"width\", \"height\", \"curb_weight\",
\"engine_type\", \"num_cylinders\", \"engine_size\", \"fuel_system\",
\"bore\", \"stroke\", \"compression_ratio\", \"horsepower\", \"peak_rpm\",
\"city_mpg\", \"highway_mpg\", \"price\"]
# Read in the CSV file and convert \"?\" to NaN
df = pd.read_csv(\"https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data\",
header=None, names=headers, na_values=\"?\" )
df.head()
df.columns
df_fin = pd.DataFrame({col: df[col].astype(\'category\').cat.codes for col in df}, index=df.index)
df_fin
X = df_fin[[\'symboling\', \'normalized_losses\', \'make\', \'fuel_type\', \'aspiration\',
\'num_doors\', \'body_style\', \'drive_wheels\', \'engine_location\',
\'wheel_base\', \'length\', \'width\', \'height\', \'curb_weight\', \'engine_type\',
\'num_cylinders\', \'engine_size\', \'fuel_system\', \'bore\', \'stroke\',
\'compression_ratio\', \'horsepower\', \'peak_rpm\']]
y = df_fin[\'city_mpg\']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit a Decision Tree model
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy_score(y_test, y_pred)
现在,如何根据自变量对目标变量(因变量)进行预测?
像这样的东西应该可以工作,我认为,但它不...
clf.predict([[2,164,\'audi\',\'gas\',\'std\',\'four\',\'sedan\',\'fwd\',\'front\',99.8,176.6,66.2,54.3,2337,\'ohc\',\'four\',109,\'mpfi\',3.19,3.4,10,102,5500,24,30,13950,]])
如果我们将数字保留为数字,并在标签周围加上引号,我想预测因变量,但我不能,因为标签数据。如果数据都是数字,这是一个回归问题,它会工作!!我的问题是......我们如何输入数字和标签,就像一个真实的人会理解的那样,而不是使用标签转换成的数字。我必须相信,在训练和测试完成之前,标签会被转换成数字(一种热编码、分类代码或其他),对吧。
这是我收到的错误消息。
clf.predict([[2,164,\'audi\',\'gas\',\'std\',\'four\',\'sedan\',\'fwd\',\'front\',99.8,176.6,66.2,54.3,2337,\'ohc\',\'four\',109,\'mpfi\',3.19,3.4,10,102,5500,24,30,13950,]])
C:\\Users\\ryans\\anaconda3\\lib\\site-packages\\sklearn\\base.py:450: UserWarning: X does not have valid feature names, but DecisionTreeClassifier was fitted with feature names
warnings.warn(
Traceback (most recent call last):
Input In [20] in <cell line: 1>
clf.predict([[2,164,\'audi\',\'gas\',\'std\',\'four\',\'sedan\',\'fwd\',\'front\',99.8,176.6,66.2,54.3,2337,\'ohc\',\'four\',109,\'mpfi\',3.19,3.4,10,102,5500,24,30,13950,]])
File ~\\anaconda3\\lib\\site-packages\\sklearn\\tree\\_classes.py:505 in predict
X = self._validate_X_predict(X, check_input)
File ~\\anaconda3\\lib\\site-packages\\sklearn\\tree\\_classes.py:471 in _validate_X_predict
X = self._validate_data(X, dtype=DTYPE, accept_sparse=\"csr\", reset=False)
File ~\\anaconda3\\lib\\site-packages\\sklearn\\base.py:577 in _validate_data
X = check_array(X, input_name=\"X\", **check_params)
File ~\\anaconda3\\lib\\site-packages\\sklearn\\utils\\validation.py:856 in check_array
array = np.asarray(array, order=order, dtype=dtype)
ValueError: could not convert string to float: \'audi\'
-
请发布回溯
标签: python machine-learning scikit-learn data-science classification