hy1231
  • Step 3: Building features and labels

  1. Extract feature column names by data type

numerical_cols = Train_data.select_dtypes(exclude='object').columns
print(numerical_cols)

categorical_cols = Train_data.select_dtypes(include='object').columns
print(categorical_cols)
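
To make the split concrete, here is a minimal sketch of the same select_dtypes call on a small hand-made frame. The frame `toy` and its values are made up for illustration; only the column names imitate the style of the competition data. The text column is created with an explicit object dtype so the result does not depend on which string dtype the installed pandas version infers by default.

```python
import pandas as pd

# Hypothetical toy frame, not the real Train_data.
toy = pd.DataFrame({
    "SaleID": [1, 2, 3],
    "power": [60.0, 110.0, 75.0],
    # Explicit object dtype so this column counts as 'object' on any pandas version.
    "notRepairedDamage": pd.Series(["0.0", "-", "1.0"], dtype=object),
})

# Everything that is not object dtype is treated as numerical.
numerical = toy.select_dtypes(exclude="object").columns
# Object-dtype columns are treated as categorical.
categorical = toy.select_dtypes(include="object").columns

print(list(numerical))
print(list(categorical))
```

Note that a column holding numbers stored as strings, like the toy `notRepairedDamage` here, lands in the categorical group until it is converted.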

  2. Build the training and test samples

## Select feature columns: drop IDs, dates, the label, and coded categorical fields
feature_cols = [col for col in numerical_cols if col not in ['SaleID', 'name', 'regDate', 'creatDate', 'price', 'model', 'brand', 'regionCode', 'seller']]
## Also drop every column whose name contains 'Type'
feature_cols = [col for col in feature_cols if 'Type' not in col]

## Extract the feature columns and the label column to build the training and test samples
X_data = Train_data[feature_cols]
Y_data = Train_data['price']

X_test = TestA_data[feature_cols]

print('X train shape:', X_data.shape)
print('X test shape:', X_test.shape)
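
The two filtering passes above can also be folded into one. The sketch below runs that combined filter on a plain list of column names so the effect is easy to check by eye. The input list is a hypothetical sample, not the real output of select_dtypes, and `excluded` is just a name chosen here for the same exclusion list.

```python
# Hypothetical sample of numerical column names (assumed for illustration).
numerical_cols = ["SaleID", "price", "bodyType", "fuelType", "power", "kilometer", "v_0"]

# Same exclusion list as in the step above, stored as a set for fast lookup.
excluded = {"SaleID", "name", "regDate", "creatDate", "price",
            "model", "brand", "regionCode", "seller"}

# Keep a column only if it is not excluded and its name does not contain 'Type'.
feature_cols = [col for col in numerical_cols
                if col not in excluded and "Type" not in col]

print(feature_cols)
```

The substring test is case-sensitive, so it removes `bodyType` and `fuelType` but would leave a column named `type_x` in place.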

## Define a helper that prints summary statistics, for reuse in later steps
def Sta_inf(data):
    print('_min', np.min(data))
    print('_max', np.max(data))
    print('_mean', np.mean(data))
    print('_ptp', np.ptp(data))   # peak to peak: max - min
    print('_std', np.std(data))
    print('_var', np.var(data))
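
As a quick sanity check of what these six numbers mean, here is a small sketch that computes them on a NumPy array with a known answer. The helper `summary_stats` is a hypothetical variant of Sta_inf that returns the values instead of printing them, and the sample data is invented. On a NumPy array, np.std and np.var use ddof=0 by default, which is the population form rather than the sample form.

```python
import numpy as np

def summary_stats(data):
    """Hypothetical helper: the same six statistics as Sta_inf, returned as a dict."""
    arr = np.asarray(data, dtype=float)
    return {
        "min": float(np.min(arr)),
        "max": float(np.max(arr)),
        "mean": float(np.mean(arr)),
        "ptp": float(np.ptp(arr)),   # peak to peak, i.e. max - min
        "std": float(np.std(arr)),   # population standard deviation (ddof=0)
        "var": float(np.var(arr)),   # population variance (ddof=0)
    }

# Invented sample with mean 5 and population variance 4.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
stats = summary_stats(sample)
print(stats)
```

For this sample the range is 9 - 2 = 7, the mean is 5, the variance is 4 and the standard deviation is 2.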

  3. Summarize the basic distribution of the label

print('Sta of label:')
Sta_inf(Y_data)

## Plot a histogram of the label to inspect its distribution
plt.hist(Y_data)
plt.show()
plt.close()
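
If a plot window is not available, the same binning can be read off as numbers. The sketch below uses np.histogram with 10 equal-width bins, which matches the default bin count of plt.hist. The `prices` array is invented to look right-skewed; it is not taken from the real label.

```python
import numpy as np

# Invented, right-skewed sample standing in for the price label.
prices = np.array([500, 800, 1200, 1500, 2000, 3000, 4500, 9000, 20000, 60000])

# 10 equal-width bins between the minimum and the maximum.
counts, edges = np.histogram(prices, bins=10)

# Print each bin's range next to the number of samples that fall into it.
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:>8.0f}, {right:>8.0f}]  {n}")
```

With data shaped like this, most samples pile up in the first bin and a few large values stretch the axis, which is the kind of pattern the histogram is meant to reveal.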

  4. Fill missing values with -1

X_data = X_data.fillna(-1)
X_test = X_test.fillna(-1)
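
A short sketch of what this fill does, on an invented two-column frame. The frame `toy` and its values are assumptions for illustration. fillna returns a new frame and leaves the original untouched, which is why the step above assigns the result back to X_data and X_test.

```python
import numpy as np
import pandas as pd

# Invented frame with one missing value per column.
toy = pd.DataFrame({
    "power": [60.0, np.nan, 75.0],
    "kilometer": [12.5, 15.0, np.nan],
})

# Replace every missing value with the sentinel -1.
filled = toy.fillna(-1)

print(filled)
print("missing before:", int(toy.isna().sum().sum()))
print("missing after:", int(filled.isna().sum().sum()))
```

The sentinel only works as a "missing" marker if -1 cannot occur as a genuine value in those columns, so that is worth checking on the real features.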
