Remaining variables of the data set [closed]: answers

【Question title】：remaining variables of the data set [closed]
【Posted】：2012-10-31 10:06:17
【Question】：

I have a data set of 150 numbers, from which I drew a sample of 100. How can I identify the remaining 50 and put them in a new matrix?

X <- runif(150)
Combined <- sample(X, 100)

【Comments on the question】：

It would help if you posted your code.

Tags: r random matrix

【Solution 1】：

Create your sample as a separate vector of indices:

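# All.Entries is the full data set of 150 values
using <- sample(1:150, 100)         # sample positions rather than values

Entries <- All.Entries[using]       # the 100 sampled values
Non.Entries <- All.Entries[-using]  # the remaining 50 values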
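
For a self-contained run of the index-based approach, here is a minimal sketch. It assumes the data set is the vector X from the question; the name Remaining and the set.seed call are illustrative additions, not part of the original answer.

```r
set.seed(1)                      # assumption: fixed seed, only for reproducibility
X <- runif(150)                  # the full data set, as in the question

using <- sample(length(X), 100)  # sample 100 positions, not values
Combined  <- X[using]            # the 100 sampled values
Remaining <- X[-using]           # the 50 values that were not sampled

length(Combined)                 # 100
length(Remaining)                # 50
```

Because the split is done on positions, it behaves the same whether or not X contains duplicate values.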

【Comments】：

【Solution 2】：

All numbers:

x <- sample(10, 150, TRUE) # as an example

The random sample:

Combined <- sample(x,100)

The remaining numbers:

xs <- sort(x) # sort the values of x
tab <- table(match(Combined, xs))
Remaining <- xs[-unlist(mapply(function(x, y) seq(y, length = x),
                               tab, as.numeric(names(tab))))]

Note: this solution also works if x contains duplicate values.
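
One way to convince yourself of this is to check that the sample and the remainder together reproduce x as a multiset. The sketch below repeats the steps above with a fixed seed; the names drop, n and start are illustrative, and SIMPLIFY = FALSE is an added safeguard so that mapply always returns a list.

```r
set.seed(42)                          # assumption: fixed seed for reproducibility
x <- sample(10, 150, TRUE)            # 150 values with many duplicates
Combined <- sample(x, 100)

xs  <- sort(x)
tab <- table(match(Combined, xs))     # sampled copies per first matching position in xs

# For each matched position, drop as many consecutive entries as were sampled
drop <- unlist(mapply(function(n, start) seq(start, length.out = n),
                      tab, as.numeric(names(tab)), SIMPLIFY = FALSE))
Remaining <- xs[-drop]

identical(sort(c(Combined, Remaining)), xs)  # TRUE
length(Remaining)                            # 50
```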

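【Comments】：

What is the variable y here? Thanks a lot.
@user1816762 The variable y is the third argument of the mapply call, i.e. as.numeric(names(tab)). This numeric vector gives the positions in xs at which the values of Combined were found.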

【Solution 3】：

Updated based on your comment.

If Combined is a subset of X, you can find the elements of X that are not in Combined with:

    X[ !(X %in% Combined) ]

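(X %in% Combined) gives you a logical vector of the same length as X, which is TRUE where the element is in Combined and FALSE otherwise.

By way of explanation: this logical vector can be used as an index. X[ X %in% Combined ] gives you all elements of X that are in Combined.

Since you are looking for the opposite, negate the logical vector: X[ !(X %in% Combined) ] gives you all elements of X that are not in Combined.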
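
A small worked example may make the indexing concrete. The values below are hypothetical and chosen so that one value occurs twice; in_sample is an illustrative name.

```r
X        <- c(5, 3, 8, 3, 1)   # hypothetical data; the value 3 occurs twice
Combined <- c(3, 8)            # pretend these two values were sampled

in_sample <- X %in% Combined
in_sample        # FALSE  TRUE  TRUE  TRUE FALSE
X[in_sample]     # 3 8 3
X[!in_sample]    # 5 1
```

Note that both copies of 3 are excluded from the remainder even though only one was sampled, because %in% compares values rather than positions.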

If X contains duplicates, you can filter by names instead (assuming, of course, that the names are unique):

X[ !(names(X) %in% names(Combined)) ] 

# or, if sampling by rows (the trailing comma selects rows)
X[ !(rownames(X) %in% rownames(Combined)), ]

You can easily assign names to X (do this before sampling, so that Combined carries the names along):

names(X) <- 1:length(X)

# or for multi-dimensional
rownames(X)  <- 1:nrow(X)
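
Put together, the name-based filter might look like the sketch below. It assumes X is a plain vector with duplicate values and that the names are assigned before sampling; the seed and the name Remaining are illustrative.

```r
set.seed(7)                            # assumption: fixed seed for reproducibility
X <- sample(10, 150, replace = TRUE)   # data with many duplicate values
names(X) <- seq_along(X)               # unique names, assigned BEFORE sampling

Combined  <- sample(X, 100)            # subsetting inside sample() keeps the names
Remaining <- X[!(names(X) %in% names(Combined))]

length(Remaining)                      # 50
```

Since the names are unique, each element is matched exactly once, so duplicate values no longer cause extra elements to be dropped.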

See also the help pages:

?"%in%"  # note the quotes
?which
?match

Alternatively, you can sample the indices and use negative indexing, as in mat[-indices, ]. Example:

    # Create a sample matrix of 150 rows, 3 columns
    mat <- matrix(rnorm(450), ncol=3)

    # Take a sampling of indices to the rows
    indices <- sample(nrow(mat), 100, replace=FALSE)

    # Split the matrix
    mat.included <- mat[indices,]
    mat.leftover <- mat[-indices,]

    # Confirm everything is of proper size
    dim(mat)
    # [1] 150   3
    dim(mat.included)
    # [1] 100   3
    dim(mat.leftover)
    # [1] 50  3

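【Comments】：

I have a data set of 150 numbers, so: Combined=sample(x,100) # now that I have drawn 100 random numbers from data set x, how do I get the remaining 50 from x? (I don't have much code to paste, because I have been using the "duplicated" function, but it removes all the duplicated numbers from data set x.)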