【Posted】: 2017-04-22 16:29:36
【Question】:
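I have two datasets of lobster egg sizes collected by different samplers, which will be used to assess measurement variability. Each sampler measures about 50 eggs per lobster from numerous lobsters. However, some lobsters are occasionally processed by sampler one but not sampler two, and vice versa. I want to combine the data from both samplers into a new dataset, but remove all data from lobsters that were processed by only one sampler. I have played around with semi_join and intersect in dplyr, but I need this for datasets 1 -> 2 and 2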
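Here is a simplified version of my data, in which several egg area measurements are taken from several lobsters, but the sampling does not always overlap (i.e., eggs from some individuals were measured by one sampler but not the other):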
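install.packages("dplyr")  # package name must be quoted; only needed once
library(dplyr)

# Egg area measurements taken by sampler 1 (Lobster1, Lobster2, Lobster3)
sampler1 <- data.frame(LobsterID=c("Lobster1","Lobster1","Lobster2",
                                   "Lobster2","Lobster2","Lobster2",
                                   "Lobster2","Lobster3","Lobster3","Lobster3"),
                       Area=c(.4,.35,1.1,1.04,1.14,1.1,1.05,1.7,1.63,1.8),
                       Sampler=c(rep("Sampler1", 10)))

# Egg area measurements taken by sampler 2 (Lobster1, Lobster2, Lobster4)
sampler2 <- data.frame(LobsterID=c("Lobster1","Lobster1","Lobster1",
                                   "Lobster1","Lobster1","Lobster2",
                                   "Lobster2","Lobster2","Lobster4","Lobster4"),
                       Area=c(.41,.44,.47,.43,.38,1.14,1.11,1.09,1.41,1.4),
                       Sampler=c(rep("Sampler2", 10)))

# Stack both datasets, then drop by hand the rows for Lobster3 and Lobster4,
# which were each measured by only one sampler
combined <- bind_rows(sampler1, sampler2)
desiredresult <- combined[-c(8, 9, 10, 19, 20), ]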
The last line of the script is the expected result for the mock data. I would prefer to stick to base R or dplyr.
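For illustration, here is one possible sketch of the filtering described above, not a definitive solution. It assumes that "processed by both samplers" means the LobsterID appears in both data frames. The names shared_ids, result and result_grouped are made up for this example, and stringsAsFactors = FALSE is added so the IDs stay character vectors on older versions of R.

```r
library(dplyr)

# Mock data from the question, rebuilt so the sketch runs on its own
sampler1 <- data.frame(
  LobsterID = c("Lobster1", "Lobster1", "Lobster2", "Lobster2", "Lobster2",
                "Lobster2", "Lobster2", "Lobster3", "Lobster3", "Lobster3"),
  Area = c(.4, .35, 1.1, 1.04, 1.14, 1.1, 1.05, 1.7, 1.63, 1.8),
  Sampler = "Sampler1",
  stringsAsFactors = FALSE
)
sampler2 <- data.frame(
  LobsterID = c("Lobster1", "Lobster1", "Lobster1", "Lobster1", "Lobster1",
                "Lobster2", "Lobster2", "Lobster2", "Lobster4", "Lobster4"),
  Area = c(.41, .44, .47, .43, .38, 1.14, 1.11, 1.09, 1.41, 1.4),
  Sampler = "Sampler2",
  stringsAsFactors = FALSE
)

# Option 1: find the IDs present in both datasets, then keep only those rows
shared_ids <- intersect(sampler1$LobsterID, sampler2$LobsterID)
result <- bind_rows(sampler1, sampler2) %>%
  filter(LobsterID %in% shared_ids)

# Option 2: stack first, then keep lobsters that have rows from two samplers
result_grouped <- bind_rows(sampler1, sampler2) %>%
  group_by(LobsterID) %>%
  filter(n_distinct(Sampler) == 2) %>%
  ungroup()
```

On the mock data both options keep only the rows for Lobster1 and Lobster2. Option 2 may be easier to extend if more than two samplers are added later, since the `== 2` condition can be swapped for the number of samplers.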
【Discussion】: