【Question title】: Plotting subsets with different shapes
【Posted】: 2018-02-14 14:58:09
【Question description】:

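Quick question: I have a dataset that I randomly split into a training subset and a test subset. I then did some statistical analysis and want to plot the results together in the same figure, but with a different point shape for each of the two subsets.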

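I'm new to ggplot, and my problem is that ggplot has to be given the full dataset up front. Because I split the data into two groups using random indices, I can't find the right attribute to put in aes() to tell the two groups apart.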

library(ggplot2)
data = read.csv("...", sep=" ")
data$class = as.factor(data$class - 1)
colnames(data) = c("y", "x1", "x2")
n = dim(data)[1]
order = sample(n)
# first half of the shuffled indices -> test, the rest -> train (no row in both)
test = data[order[1:(n %/% 2)], ]
train = data[order[(n %/% 2 + 1):n], ]
#...
ggplot(train) + geom_point(aes(x = x1, y = x2, color = y))
# this should be done for the whole dataset, kinda like this
# ggplot(data) + geom_point(aes(x=x1, y=x2, color=y, shape=(index is in test and not train)))
# which is obviously not valid

Thanks for your time, Niklas

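【Question comments】:

  • Can we see some sample data? This should be simple, but I'd like to look at some data to make sure we get it right. If you can, run this in R: dput(head(df,10)). That gives us a reproducible example. You may want to do this for both the test and train dfs.
  • It's actually a subset of this: structure(list(y = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"), x1 = c(15.6, 11.2, 18.6, 16.8, 21, 15.2, 14.6, 17.6, 14, 16), x2 = c(5.64, 4.38, 5.68, 7.8, 4.32, 6.75, 5.25, 5.05, 5.2, 7.22)), .Names = c("y", "x1", "x2"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
  • Where is the shape variable? Is the shape variable just 1/0 in the training set?
  • There is no shape variable; it should be a ggplot argument. The shape should be determined by whether a point belongs to the test set or the training set.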
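One way to do what the pseudo-code in the question asks for, sketched here under the assumption that the sample rows posted above stand in for the CSV: keep the full data frame, add a column (called `subset` here, a name chosen only for illustration) recording whether each row's index is among the test indices, and map `shape` to that column.

```r
library(ggplot2)

# Sample rows posted in the comments (stand-in for read.csv("..."))
data <- data.frame(
  y  = factor(rep("1", 10), levels = c("0", "1")),
  x1 = c(15.6, 11.2, 18.6, 16.8, 21, 15.2, 14.6, 17.6, 14, 16),
  x2 = c(5.64, 4.38, 5.68, 7.8, 4.32, 6.75, 5.25, 5.05, 5.2, 7.22)
)
n <- nrow(data)
set.seed(1)                        # only to make the split reproducible
order <- sample(n)
test_idx <- order[1:(n %/% 2)]     # first half of the shuffled indices

# Label every row of the full dataset with the subset it was drawn into
# ("subset" is a hypothetical column name)
data$subset <- factor(ifelse(seq_len(n) %in% test_idx, "test", "train"))

# One data frame, one layer: colour by class, shape by subset
p <- ggplot(data, aes(x = x1, y = x2, color = y, shape = subset)) +
  geom_point()
p
```

Because the split lives in a column rather than in two separate data frames, ggplot2 builds the shape legend automatically.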

Tags: r ggplot2


【Solution 1】:

If you have 2 datasets, as in the example, here is another approach:

library(ggplot2)
data = structure(list(y = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"), x1 = c(15.6, 11.2, 18.6, 16.8, 21, 15.2, 14.6, 17.6, 14, 16), x2 = c(5.64, 4.38, 5.68, 7.8, 4.32, 6.75, 5.25, 5.05, 5.2, 7.22)), .Names = c("y", "x1", "x2"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
data = data.frame(data)
n = dim(data)[1]
order = sample(n)
# first half of the shuffled indices -> test, the rest -> train (no row in both)
test = data[order[1:(n %/% 2)], ]
train = data[order[(n %/% 2 + 1):n], ]

# test points use the default shape; train points are drawn with shape 10 in dark green
ggplot(test) + geom_point(aes(y = x1, x = x2)) +
  geom_point(data = train, aes(y = x1, x = x2), pch = 10, col = "darkgreen")
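Because `pch` and `col` are set as fixed parameters in the second layer above, ggplot2 draws no legend that identifies the two subsets. A minimal sketch of a common workaround, assuming the same sample data and split: map a constant label to `shape` inside each layer's `aes()`, then choose the actual symbols with `scale_shape_manual()`. The labels `"test"`/`"train"`, the legend title, and the shape codes 16 and 10 are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# Sample rows posted in the question's comments
data <- data.frame(
  x1 = c(15.6, 11.2, 18.6, 16.8, 21, 15.2, 14.6, 17.6, 14, 16),
  x2 = c(5.64, 4.38, 5.68, 7.8, 4.32, 6.75, 5.25, 5.05, 5.2, 7.22)
)
n <- nrow(data)
order <- sample(n)
test  <- data[order[1:(n %/% 2)], ]
train <- data[order[(n %/% 2 + 1):n], ]

# Mapping a constant string to shape inside aes() makes each layer a legend
# entry; scale_shape_manual() then assigns the point symbol for each label.
p <- ggplot() +
  geom_point(data = test,  aes(x = x2, y = x1, shape = "test")) +
  geom_point(data = train, aes(x = x2, y = x1, shape = "train"),
             colour = "darkgreen") +
  scale_shape_manual(name = "Subset", values = c(test = 16, train = 10))
p
```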

【Discussion】:

【Solution 2】:

Assuming the datasets have the same columns:

library(ggplot2)

# Loading data
data <- structure(list(y = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"), x1 = c(15.6, 11.2, 18.6, 16.8, 21, 15.2, 14.6, 17.6, 14, 16), x2 = c(5.64, 4.38, 5.68, 7.8, 4.32, 6.75, 5.25, 5.05, 5.2, 7.22)), .Names = c("y", "x1", "x2"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
n = dim(data)[1]
order = sample(n)
# first half of the shuffled indices -> test, the rest -> train (no row in both)
test = data[order[1:(n %/% 2)], ]
train = data[order[(n %/% 2 + 1):n], ]

test$identifier <- as.factor(1) # mark the test with 1
train$identifier <- as.factor(0) # mark the train with 0

df_out <- rbind(test, train) # combine the dataframes

ggplot(df_out, aes(x = x1, y = x2, color = y, shape = identifier)) + geom_point() # plot the new df
    

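This produces a scatter plot of x2 against x1, with points colored by class y and shaped by subset (test vs. train).

【Discussion】: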
