How can I select features and train with support vector machine algorithms? Answers

【Question title】: How can I select the features and train with support vector machine algorithms?
【Posted】: 2021-04-20 16:17:35
【Question】:

I have a heart dataset with the features age, sex, cp, trestbps, chol, fbs, retecg, thalach, exang, oldpeak, slope, ca, thal, plus the target variable target. All values are numeric.

I want to train a support vector machine (SVM) on this data.

#read data
library(e1071)  # provides svm()
setwd("C:/Users/sevvalayse.yurtekin/Desktop/SevvalAyse_Yurtekin")
data_heart = read.csv("heart_disease_dataset.csv", header = TRUE, sep = ",")
data_heart

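#split randomly into train and test data: roughly 75% train, 25% test
#(each row is assigned independently, so the proportions are only approximate)
set.seed(42)  # make the random split reproducible
ind<- sample(2, nrow(data_heart), replace = TRUE, prob = c(0.75,0.25))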
train<-data_heart[ind==1, ]
test<-data_heart[ind==2, ]

classifier = svm(formula = age ~.,
                 data = train,
                 type = 'C-classification',
                 kernel = 'linear')
classifier

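This is my code. I have split the data, but how do I train the model? How do I decide which features to use, or can I just use all of them? Can you help me?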

【Discussion】:

Tags: r machine-learning classification svm feature-selection

【Answer 1】:

In the code you provided, you are using all the other variables (including target) to predict age; that is what formula = age ~ . means.
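A quick way to check what the dot expands to is base R's terms() function. The sketch below uses a tiny made-up data frame (toy is a hypothetical name, and it holds only a few of the question's column names with arbitrary values) and shows that age ~ . treats target as a predictor, while target ~ . uses every other column to predict target:

```r
# Hypothetical toy data frame: column names from the question, values arbitrary
toy <- data.frame(age = c(50, 60, 70), sex = c(1, 0, 1),
                  chol = c(200, 240, 180), target = c(0, 1, 1))

# In a formula, "." means "every column of data except the response"
age_terms <- attr(terms(age ~ ., data = toy), "term.labels")
print(age_terms)     # "sex" "chol" "target": target is used to predict age

target_terms <- attr(terms(target ~ ., data = toy), "term.labels")
print(target_terms)  # "age" "sex" "chol": all other columns predict target
```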

There are two ways to specify the features and the target variable. You can use the formula argument, for example:

classifier <- svm(formula = target ~ age + sex + cp + trestbps + chol + fbs,
                 data = train,
                 type = 'C-classification',
                 kernel = 'linear')

Or you can pass the features and the target separately as the x and y arguments of svm (see the documentation), for example:

classifier <- svm(x = train[,c('age', 'sex', 'cp', 'trestbps', 'chol', 'fbs')],
                 y = train[,'target'],
                 type = 'C-classification',
                 kernel = 'linear')
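Both forms can be driven by the same character vector of feature names, which makes it easy to try different feature subsets. A minimal sketch, assuming the hypothetical name features for that vector; reformulate() is a base R function that builds the formula from it:

```r
# Hypothetical feature subset; edit this vector to try other combinations
features <- c("age", "sex", "cp", "trestbps", "chol", "fbs")

# reformulate() builds target ~ age + sex + ... from the character vector
f <- reformulate(features, response = "target")
print(f)  # target ~ age + sex + cp + trestbps + chol + fbs

# The same vector then feeds either interface (requires library(e1071) and train):
# svm(f, data = train, type = 'C-classification', kernel = 'linear')
# svm(x = train[, features], y = train[, 'target'],
#     type = 'C-classification', kernel = 'linear')
```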

If you want to use all features, you can use the shortcut target ~ . as before; in your version, however, the target variable was age.

The model is trained when you call the svm function. After that, you will probably want to predict the target variable on the test set:

predicted <- predict(classifier, test)

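Note that here I passed the complete test data frame because it is more convenient, but the target column is of course not needed for prediction.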
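To see how well the model does, you can compare predicted with the true labels in test. A minimal base R sketch; evaluate_predictions is a hypothetical helper name, and the four labels in the demo call are made up only so the snippet runs on its own:

```r
# Hypothetical helper: compare predicted class labels with the true labels
evaluate_predictions <- function(predicted, actual) {
  # Confusion matrix: rows are predicted classes, columns are true classes
  confusion <- table(predicted = predicted, actual = actual)
  # Accuracy: share of rows whose label was predicted correctly
  # (compare as character because predict() returns a factor)
  accuracy <- mean(as.character(predicted) == as.character(actual))
  list(confusion = confusion, accuracy = accuracy)
}

# With the objects from the answer this would be:
# result <- evaluate_predictions(predicted, test$target)

# Made-up labels so the helper can be run on its own
demo <- evaluate_predictions(factor(c(1, 0, 1, 1)), c(1, 0, 0, 1))
print(demo$confusion)
print(demo$accuracy)  # 0.75
```

Comparing this accuracy for different feature vectors is one simple way to judge whether dropping a feature helps or hurts.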

【Discussion】: