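【Posted】: 2021-07-30 22:28:27
【Question】:
To make the question concrete, I use several datasets that illustrate different kinds of 2D data.
The datasets are available at: https://drive.google.com/file/d/14-VivVlGSlaJo6BXlYMqn-1leorSU6ET/view?usp=sharing
I also use this helper function: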
scatterplot_check <- function(data, dependent_col, x_column, y_column, legend_pos = "topright") {
  # Open a new graphics window (dev.new() is the portable form of x11())
  dev.new()
  group <- data[[dependent_col]]
  # Treat character labels the same way as factor labels
  if (is.character(group)) {
    group <- as.factor(group)
  }
  if (is.factor(group)) {
    # Colour each point by the integer code of its factor level
    legend_labels <- levels(group)
    legend_cols <- seq_along(legend_labels)
    point_cols <- as.numeric(group)
  } else if (is.integer(group)) {
    # Colour index 0 is the background colour, so shift 0-based labels by one
    offset <- if (min(group, na.rm = TRUE) == 0) 1L else 0L
    point_cols <- group + offset
    legend_cols <- sort(unique(point_cols))
    legend_labels <- legend_cols - offset
  } else {
    stop("dependent_col must be a factor, character, or integer column")
  }
  plot(data[[x_column]], data[[y_column]],
       col = point_cols, pch = 18,
       xlab = x_column, ylab = y_column)
  legend(legend_pos, legend = legend_labels, col = legend_cols, pch = 18)
}
Suppose all the datasets have been read into the environment:
dataset1 <- read.csv("dataset1.csv")
dataset2 <- read.csv("dataset2.csv")
dataset3 <- read.csv("dataset3.csv")
Here are several scatter plot variants:
scatterplot_check(dataset1, "y","x.1","x.2")
scatterplot_check(dataset2, "Purchased","Age","EstimatedSalary")
scatterplot_check(dataset3, "grades","english","math")
scatterplot_check(dataset3, "grades","read","math", legend_pos="topleft")
Is there a good way to quantify how likely it is that the data in a 2D scatter plot can be modeled well with an SVM? Thanks in advance.
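One way to put a number on this, sketched here as an assumption rather than an established answer, is to fit an SVM on the two plotted features and compare its cross-validated accuracy with a majority-class baseline. The sketch assumes the e1071 package is installed; `svm_cv_accuracy` is a hypothetical helper name, and the toy data is synthetic, not one of the datasets above.

```r
library(e1071)  # assumed to be installed; provides svm()

# Hypothetical helper: k-fold cross-validated accuracy (in percent) of an SVM
# trained on two feature columns, alongside a majority-class baseline.
svm_cv_accuracy <- function(data, dependent_col, x_column, y_column,
                            kernel = "radial", folds = 10) {
  df <- data.frame(
    class = as.factor(data[[dependent_col]]),
    x = data[[x_column]],
    y = data[[y_column]]
  )
  df <- df[complete.cases(df), ]
  df$class <- droplevels(df$class)
  # cross = folds asks svm() to run k-fold cross-validation on the training data
  fit <- svm(class ~ x + y, data = df, kernel = kernel, cross = folds)
  # Baseline: accuracy of always predicting the most frequent class
  baseline <- 100 * max(table(df$class)) / nrow(df)
  c(cv_accuracy = fit$tot.accuracy, baseline = baseline)
}

# Synthetic example: two well-separated Gaussian clusters
set.seed(1)
n <- 100
toy <- data.frame(
  x.1 = c(rnorm(n, 0), rnorm(n, 4)),
  x.2 = c(rnorm(n, 0), rnorm(n, 4)),
  y = rep(c(0L, 1L), each = n)
)
result <- svm_cv_accuracy(toy, "y", "x.1", "x.2")
print(result)
```

The same call shape as the plots would apply, for example `svm_cv_accuracy(dataset1, "y", "x.1", "x.2")`. A cross-validated accuracy close to the baseline would suggest that the two features carry little class information for that kernel.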
【Comments】:
-
What makes you guess that the first two examples are likely to classify well and the other two are not?
-
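@Kota Mori Well, my guess comes from looking at the distributions: in the last two examples, one class almost entirely overlaps (is contained within) the other classes. I'm not saying it is impossible to build an SVM for those two; I just think it is harder, because that situation is less common.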
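The overlap described in this comment can be roughly quantified without fitting any model. The base-R sketch below is a nearest-neighbour proxy, not an SVM-specific measure: for each class it reports the share of points whose closest neighbour carries a different label. `nn_overlap` is a hypothetical helper name, and the example data is synthetic.

```r
# Hypothetical helper: per-class share of points whose nearest neighbour
# (in standardised feature space) belongs to a different class.
# Builds a full distance matrix, so it is only meant for small datasets.
nn_overlap <- function(data, dependent_col, x_column, y_column) {
  keep <- complete.cases(data[, c(dependent_col, x_column, y_column)])
  labels <- as.character(data[[dependent_col]][keep])
  # Standardise both axes so neither dominates the distances
  xy <- scale(cbind(data[[x_column]][keep], data[[y_column]][keep]))
  d <- as.matrix(dist(xy))
  diag(d) <- Inf  # a point must not count as its own neighbour
  nearest <- apply(d, 1, which.min)
  mismatch <- labels != labels[nearest]
  tapply(mismatch, labels, mean)
}

set.seed(42)
n <- 50
# Two clusters far apart: overlap should be zero
separated <- data.frame(
  a = c(rnorm(n, 0), rnorm(n, 10)),
  b = c(rnorm(n, 0), rnorm(n, 10)),
  label = rep(c("low", "high"), each = n)
)
# Two classes drawn from the same distribution: heavy overlap
mixed <- data.frame(
  a = rnorm(2 * n),
  b = rnorm(2 * n),
  label = rep(c("low", "high"), each = n)
)
sep_score <- nn_overlap(separated, "label", "a", "b")
mix_score <- nn_overlap(mixed, "label", "a", "b")
print(sep_score)
print(mix_score)
```

Values near 0 indicate classes that occupy distinct regions, while values approaching 0.5 or higher for a class suggest it is largely buried inside another, which is the situation described for the last two plots.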
Tags: r svm scatter-plot