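【Posted】: 2018-05-05 01:05:06
【Question】:
I am working on clustering mixed data. To test my algorithm, I need to run some simulations on generated data. I know I can use rnorm to generate numeric attributes and sample over letters to generate categorical ones. The problem is creating a relationship between one column and another (between the numeric and categorical attributes). I can't just generate random values and attributes with no relationship between them; the relationship has to make sense. For example, suppose I have a product variable and a price variable. If I generate both purely at random, I could end up with: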
product price
pen $500
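That makes no sense; the relationship between the columns would be meaningless. Any suggestions?
I wrote this code, but it doesn't seem good enough: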
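n <- 500      # base block size; every column ends up with 3 * n rows
prb <- 0.90   # probability of the dominant level in each block of x1
# Base means for the numeric attributes, drawn from non-overlapping ranges
c1 <- sample(2:5, 1)
c2 <- sample(7:10, 1)
c3 <- sample(12:15, 1)
# x1: two-level factor, two blocks of 1.5 * n rows with flipped probabilities
x1 <- sample(c("A","B"), 1.5*n, replace = TRUE, prob = c(prb, 1-prb))
x1 <- c(x1, sample(c("A","B"), 1.5*n, replace = TRUE, prob = c(1-prb, prb)))
x1 <- as.factor(x1)
# x2: three-level factor, three blocks of n rows, one dominant level per block
x2 <- sample(c("C","D","E"), n, replace = TRUE, prob = c(0.90, 0.05, 0.05))
x2 <- c(x2, sample(c("C","D","E"), n, replace = TRUE, prob = c(0.05, 0.9, 0.05)))
x2 <- c(x2, sample(c("C","D","E"), n, replace = TRUE, prob = c(0.05, 0.05, 0.9)))
x2 <- as.factor(x2)
# x3: two-level factor, two blocks of 1.5 * n rows, only weakly separated
x3 <- sample(c("X","Y"), 1.5*n, replace = TRUE, prob = c(0.6, 0.4))
x3 <- c(x3, sample(c("X","Y"), 1.5*n, replace = TRUE, prob = c(0.4, 0.6)))
x3 <- as.factor(x3)
# x4, x5: numeric attributes, three blocks of n rows with block-specific means (sd = 1)
x4 <- c(rnorm(n, mean = c1), rnorm(n, mean = c2), rnorm(n, mean = c3))
x5 <- c(rnorm(n, mean = c1+20), rnorm(n, mean = c2+30), rnorm(n, mean = c3+40))
x <- data.frame(x1, x2, x3, x4, x5)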
【Discussion】:
Tags: r random generated-code
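One possible way to get meaningful relationships between columns is to draw a hidden cluster label first and then generate every attribute conditional on that label, so the categorical and numeric columns are linked through the cluster. The sketch below is only an illustration of that idea, not a definitive method: the names `cluster`, `p_cat`, `mu`, `sigma` and `sim`, and all the probabilities and means, are assumptions chosen for the example.

```r
set.seed(1)        # assumption: fixed seed so the simulation is reproducible

n_per <- 500       # observations per cluster
k     <- 3         # number of clusters
cluster <- rep(seq_len(k), each = n_per)   # ground-truth cluster labels

# Row i holds P(category | cluster i); the values are illustrative only
levels_cat <- c("C", "D", "E")
p_cat <- rbind(c(0.90, 0.05, 0.05),
               c(0.05, 0.90, 0.05),
               c(0.05, 0.05, 0.90))

# Categorical attribute: one draw per row, using that row's cluster probabilities
x_cat <- vapply(cluster,
                function(cl) sample(levels_cat, 1, prob = p_cat[cl, ]),
                character(1))

# Numeric attribute: mean and sd depend on the cluster (rnorm is vectorised)
mu    <- c(3, 8, 13)
sigma <- c(1, 1, 1)
x_num <- rnorm(length(cluster), mean = mu[cluster], sd = sigma[cluster])

sim <- data.frame(cluster = factor(cluster),
                  x_cat   = factor(x_cat, levels = levels_cat),
                  x_num   = x_num)

# Sanity checks: the cross-tab and per-cluster means should mirror p_cat and mu
table(sim$cluster, sim$x_cat)
aggregate(x_num ~ cluster, data = sim, FUN = mean)
```

Keeping the `cluster` column gives a ground truth to compare the clustering result against; it can be dropped before the data are passed to the algorithm. Making the rows of `p_cat` more alike, or moving the values of `mu` closer together, should make the simulated clusters harder to separate.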