[Posted]: 2016-03-25 10:34:20
[Question]:
I am trying to sample with replacement in Scala/Spark, specifying a probability for each class.
This is how I would do it in R.
# Vector to sample from
x <- c("User1","User2","User3","User4","User5")
# Occurrences from which to obtain sampling probabilities
y <- c(2,4,4,3,2)
# Calculate sampling probabilities
p <- y / sum(y)
# Draw sample with replacement of size 10
s <- sample(x, 10, replace = TRUE, prob = p)
# Which yields (for example):
[1] "User5" "User1" "User1" "User5" "User2" "User4" "User4" "User2" "User1" "User3"
How can I do the same thing in Scala/Spark?
[Comments]:
-
Please check the documentation before asking.
-
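I have searched the documentation extensively but could not find an answer. I would appreciate it if you could point me to the relevant section, or to guidance on how to use the documentation more effectively.
Tags: r scala apache-spark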
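For reference, here is a minimal sketch of how the R snippet above could be mirrored in plain Scala, assuming the candidates and their counts fit in local memory on the driver. It is not a Spark API and not necessarily the best approach for a distributed dataset. The object name `WeightedSample` and the method `sampleWithReplacement` are hypothetical, introduced only for illustration. The idea is inverse-CDF sampling: build running totals of the weights, draw a uniform number, and pick the first item whose running total exceeds it.

```scala
import scala.util.Random

// Hypothetical helper, not part of Scala or Spark.
object WeightedSample {
  // Draw n items from xs with replacement; item i is chosen with
  // probability weights(i) / weights.sum (like R's sample(..., prob = p)).
  def sampleWithReplacement[A](
      xs: IndexedSeq[A],
      weights: IndexedSeq[Double],
      n: Int,
      rng: Random): IndexedSeq[A] = {
    require(xs.nonEmpty && xs.length == weights.length,
      "xs and weights must be non-empty and have the same length")
    require(weights.forall(_ >= 0) && weights.sum > 0,
      "weights must be non-negative with a positive sum")

    // Running totals: cumulative(i) = weights(0) + ... + weights(i)
    val cumulative = weights.scanLeft(0.0)(_ + _).tail
    val total = cumulative.last
    // Fallback index for rare floating-point edge cases
    val lastPositive = weights.lastIndexWhere(_ > 0)

    IndexedSeq.fill(n) {
      val u = rng.nextDouble() * total          // uniform in [0, total)
      val idx = cumulative.indexWhere(u < _)    // first running total above u
      xs(if (idx >= 0) idx else lastPositive)
    }
  }
}

object Demo extends App {
  val x = Vector("User1", "User2", "User3", "User4", "User5")
  val y = Vector(2.0, 4.0, 4.0, 3.0, 2.0)     // occurrences, used as weights
  val s = WeightedSample.sampleWithReplacement(x, y, 10, new Random())
  println(s.mkString(" "))
}
```

The weights do not need to be normalized first, because the uniform draw is scaled by their sum. In Spark, the resulting local sample could be turned into an RDD afterwards if needed. Sampling by weight directly on a large RDD would need a different design.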