【Posted】: 2020-06-20 12:18:52
【Question】:
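To fit a linear regression model to some given training data X and labels y, I want to augment my training data X with nonlinear transformations of the given features. Suppose we have the features x1, x2 and x3. We want to use the following additional transformed features: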
$x_4 = x_1^2$, $x_5 = x_2^2$ and $x_6 = x_3^2$
$x_7 = \exp(x_1)$, $x_8 = \exp(x_2)$ and $x_9 = \exp(x_3)$
$x_{10} = \cos(x_1)$, $x_{11} = \cos(x_2)$ and $x_{12} = \cos(x_3)$
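Because every added feature is a fixed function of the inputs, the model is still linear in its weights $w$, so ordinary linear regression applies unchanged. With the numbering above, the model being fitted is:

```latex
\hat{y} = w_0 + \sum_{j=1}^{3} \left( w_j\, x_j + w_{j+3}\, x_j^2 + w_{j+6}\, e^{x_j} + w_{j+9} \cos x_j \right)
```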
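I tried the following, but it makes the model perform very poorly in terms of root mean squared error (RMSE), which is the evaluation metric: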
```python
import pandas as pd
import numpy as np
from sklearn import linear_model

# Import the training data and extract the features and labels from it
DATAPATH = 'train.csv'
data = pd.read_csv(DATAPATH)
features = data.drop(['Id', 'y'], axis=1)
labels = data[['y']]

# Squared features (x4 to x6 in the description above)
features['x6'] = features['x1']**2
features['x7'] = features['x2']**2
features['x8'] = features['x3']**2
# Exponential features (x7 to x9 in the description above)
features['x9'] = np.exp(features['x1'])
features['x10'] = np.exp(features['x2'])
features['x11'] = np.exp(features['x3'])
# Cosine features (x10 to x12 in the description above)
features['x12'] = np.cos(features['x1'])
features['x13'] = np.cos(features['x2'])
features['x14'] = np.cos(features['x3'])

# Ordinary least-squares fit on the original plus the transformed columns
regr = linear_model.LinearRegression()
regr.fit(features, labels)
```
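To tell whether the extra features actually help, it is worth measuring RMSE on rows the model was not fitted on. A minimal sketch, assuming scikit-learn; `add_nonlinear_features` and `holdout_rmse` are hypothetical helper names, the suffixed column names (`x1_sq`, `x1_exp`, ...) are illustrative, and a small synthetic DataFrame stands in for `train.csv`, which is not available here:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def add_nonlinear_features(features, cols=("x1", "x2", "x3")):
    """Return a copy of `features` with x**2, exp(x) and cos(x) columns appended for each col."""
    out = features.copy()
    for c in cols:
        out[f"{c}_sq"] = out[c] ** 2
        out[f"{c}_exp"] = np.exp(out[c])
        out[f"{c}_cos"] = np.cos(out[c])
    return out


def holdout_rmse(features, labels, seed=0):
    """Fit LinearRegression on 80% of the rows and return the RMSE on the remaining 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))


# Hypothetical toy data standing in for train.csv: y is an exact,
# noise-free combination of the transformed features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(-1, 1, size=(200, 3)), columns=["x1", "x2", "x3"])
y = 2 * df["x1"] ** 2 + np.cos(df["x2"]) + 0.5 * np.exp(df["x3"])

rmse_raw = holdout_rmse(df, y)
rmse_aug = holdout_rmse(add_nonlinear_features(df), y)
print(f"raw features: {rmse_raw:.4f}, augmented: {rmse_aug:.2e}")
```

Both calls use the same `random_state`, so they are scored on the same held-out rows. On real data, the same comparison can be run with the `features` and `labels` from the question.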
I'm new to ML, and there must be better options for doing these nonlinear feature transformations. I'd be very glad for your help.
Cheers, Lucas
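One tidier option, sketched here under the assumption that scikit-learn is used as in the question: wrap the transformation in `FunctionTransformer` and chain it with the regressor via `make_pipeline`. `nonlinear_basis` is a hypothetical helper name; its output column order matches the numbering x1 to x12 in the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def nonlinear_basis(X):
    """Map [x1, x2, x3] to [x1..x3, x1^2..x3^2, exp(x1)..exp(x3), cos(x1)..cos(x3)]."""
    X = np.asarray(X, dtype=float)  # also accepts a pandas DataFrame
    return np.hstack([X, X ** 2, np.exp(X), np.cos(X)])


# The transformation is the first pipeline step, so fit() and predict()
# apply exactly the same feature mapping to training and test data.
model = make_pipeline(FunctionTransformer(nonlinear_basis), LinearRegression())

# Usage on hypothetical toy inputs with 3 columns (like x1, x2, x3):
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 3))
y = X[:, 0] ** 2 + np.cos(X[:, 1])
model.fit(X, y)
expanded = model.named_steps["functiontransformer"].transform(X)
print(expanded.shape)  # (50, 12)
```

Because the mapping lives inside the pipeline, `model.predict` on new data applies the same transformation automatically, which rules out accidentally feeding untransformed test data to the model.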
【Discussion】:
-
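My intuition is that the np.exp terms are much larger than the other terms in the dataset, so your regression is essentially fitting only them. You can avoid this by normalizing the data before training the model. Check out this post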
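A minimal sketch of the normalization suggested in the comment, assuming scikit-learn. One caveat: for plain `LinearRegression` (ordinary least squares with an intercept), standardizing the columns does not change the fitted predictions in exact arithmetic, because rescaling a feature only rescales its coefficient and the mean shift is absorbed by the intercept; it mainly improves numerical conditioning. Scaling does change the result once a regularized model is used, so this sketch swaps in `Ridge`, which is an assumption and not something from the thread. `nonlinear_basis` is a hypothetical helper and `alpha=1.0` is an arbitrary placeholder.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def nonlinear_basis(X):
    """Hypothetical helper: append x^2, exp(x) and cos(x) for every input column."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, X ** 2, np.exp(X), np.cos(X)])


model = make_pipeline(
    FunctionTransformer(nonlinear_basis),
    StandardScaler(),  # learns per-column mean and std on the training data only
    Ridge(alpha=1.0),  # placeholder strength; tune it, e.g. with RidgeCV
)

# Hypothetical toy inputs on a wider range, where the exp columns grow large.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 3))
y = X[:, 0] + np.exp(X[:, 2])
model.fit(X, y)

raw = nonlinear_basis(X)
scaled = model[:-1].transform(X)  # every pipeline step except the final regressor
print(raw.std(axis=0).round(2))     # column spreads differ widely
print(scaled.std(axis=0).round(2))  # ~1.0 for every column after StandardScaler
```

Keeping the scaler inside the pipeline means its mean and standard deviation are estimated from the training rows only and then reused unchanged at prediction time.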
Tags: python pandas numpy machine-learning regression