【Posted】: 2021-01-16 12:59:20
【Question】:
I'm trying to work through the machine learning example script "Common pitfalls in interpretation of coefficients of linear models", but I can't follow some of the steps. The script starts like this:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_openml
survey = fetch_openml(data_id=534, as_frame=True)
# We identify features `X` and targets `y`: the column WAGE is our
# target variable (i.e., the variable which we want to predict).
X = survey.data[survey.feature_names]
X.describe(include="all")
X.head()
# Our target for prediction is the wage.
y = survey.target.values.ravel()
survey.target.head()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
train_dataset = X_train.copy()
train_dataset.insert(0, "WAGE", y_train)
_ = sns.pairplot(train_dataset, kind='reg', diag_kind='kde')
My question is about these lines:
y = survey.target.values.ravel()
survey.target.head()
If we check survey.target.head() right after these lines, the output is:
Out[36]:
0 5.10
1 4.95
2 6.67
3 4.00
4 7.50
Name: WAGE, dtype: float64
How does the model know that WAGE is the target variable? Doesn't it have to be declared explicitly?
【Comments】:
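A minimal pandas sketch of what survey.target.values.ravel() does on its own: .values converts the target to a NumPy array and .ravel() flattens it to 1-D, so neither step selects a column. The wage values are the five rows printed above; the one-column DataFrame case is an assumption added only to show when ravel() actually changes the shape.

```python
import pandas as pd

# First five WAGE values from the survey.target.head() output.
wages = [5.10, 4.95, 6.67, 4.00, 7.50]

# survey.target is a Series, so .values is already 1-D and ravel() is a no-op.
as_series = pd.Series(wages, name="WAGE")
print(as_series.values.shape)          # (5,)
print(as_series.values.ravel().shape)  # (5,)

# Assumed case: if the target came back as a one-column DataFrame,
# .values would be 2-D and ravel() would flatten it to the 1-D shape
# that scikit-learn regressors expect for a single target.
as_frame = pd.DataFrame({"WAGE": wages})
print(as_frame.values.shape)           # (5, 1)
print(as_frame.values.ravel().shape)   # (5,)
```

In other words, the name WAGE is attached to survey.target before these lines run; y just receives its values.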
-
Take a look at the structure of survey. It's either a pandas DataFrame or an object that holds the data as attributes.
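A sketch of the structure that comment points at, using a types.SimpleNamespace as a hypothetical stand-in for the Bunch object fetch_openml returns, with only two feature columns and made-up feature values. The target is already stored apart from the features: fetch_openml's target_column parameter defaults to "default-target", meaning it uses the column the OpenML dataset declares as its default target, which for data_id=534 is WAGE. Passing target_column="WAGE" would make that choice explicit.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical stand-in for the Bunch returned by fetch_openml.
# Feature values are illustrative placeholders; WAGE values match the output shown.
features = pd.DataFrame({"EDUCATION": [8, 9, 12, 12, 12], "AGE": [35, 57, 19, 22, 35]})
wage = pd.Series([5.10, 4.95, 6.67, 4.00, 7.50], name="WAGE")

survey = SimpleNamespace(
    data=features,
    target=wage,
    feature_names=list(features.columns),
)

# The script does not choose the target here; it only reads attributes
# that the loader has already split apart.
X = survey.data[survey.feature_names]
y = survey.target.values.ravel()

print("WAGE" in X.columns)  # False: the target is not among the features
print(survey.target.name)   # WAGE
```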
-
Yes, it is in fact declared explicitly with
y = survey.target.values.ravel(). The variable y is the conventional name for the target.
-
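Yes, I understand the variable y, but where does it specify WAGE as the target variable? I think I'm misunderstanding the survey.target.values.ravel() part. When I apply this approach to my own data, how do I specify which variable is the target?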
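For your own data, a minimal sketch of the usual pattern, assuming scikit-learn's LinearRegression as the model: select the target column yourself, drop it from the features, and pass it as the second argument to fit. The table, its column names (size, rooms, price) and the TARGET constant are hypothetical; price is built as an exact linear function of the other columns so the fitted coefficients are easy to check.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical table: "price" is the column we want to predict.
# price = 2 * size + 10 * rooms + 5 exactly, so the fit is easy to verify.
df = pd.DataFrame(
    {
        "size": [50.0, 60.0, 80.0, 100.0],
        "rooms": [1.0, 2.0, 3.0, 3.0],
        "price": [115.0, 145.0, 195.0, 235.0],
    }
)
TARGET = "price"

# "Declaring" the target is just selecting it; every other column is a feature.
y = df[TARGET].to_numpy()
X = df.drop(columns=[TARGET])

# The estimator only learns which values are the target from fit's second argument.
model = LinearRegression().fit(X, y)
print(list(X.columns), model.coef_, model.intercept_)
```

With this toy data the coefficients should come out close to 2 for size and 10 for rooms, with an intercept near 5.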
Tags: python machine-learning scikit-learn regression prediction