[Posted]: 2019-09-18 04:02:05
[Question]:
This is what I have:
df <- structure(list(Sample = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L,
4L), .Label = c("19-0001", "19-0002", "19-0003", "19-0004"), class = "factor"),
Replicate = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), X24854000 = structure(c(1L,
2L, 2L, 1L, 2L, 2L, 1L, 1L), .Label = c("", "CC"), class = "factor"),
X24854056 = structure(c(3L, 3L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"AA", "GG"), class = "factor"), X24854764 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "TA", class = "factor"),
X24854903 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("",
"CT"), class = "factor"), X24855066 = structure(c(1L, 1L,
3L, 3L, 2L, 2L, 2L, 2L), .Label = c("", "CA", "CC"), class = "factor"),
X24855114 = structure(c(2L, 1L, 3L, 3L, 2L, 2L, 2L, 2L), .Label = c("",
"GA", "GG"), class = "factor"), X24855316 = structure(c(2L,
2L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("", "TC"), class = "factor"),
X24855449 = structure(c(1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("CC",
"GG"), class = "factor"), X24855925 = structure(c(2L, 1L,
1L, 3L, 2L, 2L, 1L, 1L), .Label = c("", "GA", "GG"), class = "factor"),
X24856070 = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("CC",
"CT"), class = "factor"), X24856086 = structure(c(2L, 1L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC", "CT"), class = "factor"),
X24856329 = structure(c(2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"AG"), class = "factor"), X24856389 = structure(c(2L, 1L,
1L, 1L, 2L, 2L, 2L, 2L), .Label = c("", "GG"), class = "factor"),
X24857235 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("",
"CT"), class = "factor"), X24857350 = structure(c(3L, 3L,
1L, 1L, 2L, 2L, 1L, 1L), .Label = c("", "GA", "GG"), class = "factor"),
X24857404 = structure(c(1L, 3L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("",
"AT", "TT"), class = "factor")), class = "data.frame", row.names = c(NA,
-8L))
This produces the following table (blank cells are empty strings):
Sample   Replicate  X24854000  X24854056  X24854764  X24854903  X24855066  X24855114  X24855316  X24855449  X24855925  X24856070  X24856086  X24856329  X24856389  X24857235  X24857350  X24857404
19-0001  1                     GG         TA                               GA         TC         CC         GA         CT         CT         AG         GG                    GG
19-0001  2          CC         GG         TA                                          TC         GG                    CC         CC                                          GG         TT
19-0002  1          CC         AA         TA                    CC         GG                    GG                    CC         CT         AG
19-0002  2                                TA                    CC         GG                    GG         GG         CC         CT         AG
19-0003  1          CC                    TA         CT         CA         GA         TC         CC         GA         CC         CT         AG         GG         CT         GA         AT
19-0003  2          CC                    TA         CT         CA         GA         TC         CC         GA         CC         CT         AG         GG         CT         GA         AT
19-0004  1                                TA                    CA         GA         TC         CC                    CC         CT         AG         GG         CT
19-0004  2                                TA                    CA         GA                    CC                    CC         CT         AG         GG
This is what I want:
Sample   Replicate  X24854000  X24854056  X24854764  X24854903  X24855066  X24855114  X24855316  X24855449  X24855925  X24856070  X24856086  X24856329  X24856389  X24857235  X24857350  X24857404
19-0001  1          CC         GG         TA                               GA         TC         99         GA         99         99         AG         GG                    GG         TT
19-0002  1          CC         AA         TA                    CC         GG                    GG         GG         CC         CT         AG
19-0003  1          CC                    TA         CT         CA         GA         TC         CC         GA         CC         CT         AG         GG         CT         GA         AT
19-0004  1                                TA                    CA         GA         TC         CC                    CC         CT         AG         GG         CT
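Replicates 1 and 2 should be merged under the same sample name. If a call is missing in one replicate, or both replicates agree, the other replicate's value can be used; any mismatch between the two replicates should be replaced with "99" so that it can be removed later.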
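To make the rule concrete, here is a minimal sketch of it applied to a single pair of replicate vectors. The vectors r1 and r2 are made-up illustrations (not columns from the data above), and the sketch assumes exactly two replicates per sample and that an empty string means "no call".

```r
# Hypothetical calls for four positions in replicate 1 and replicate 2.
r1 <- c("",   "GG", "CC", "")
r2 <- c("CC", "GG", "GG", "")

# If replicate 1 is missing, take replicate 2.
# Otherwise, if replicate 2 is missing or both agree, keep replicate 1.
# Otherwise the replicates disagree, so flag the position with "99".
merged <- ifelse(r1 == "", r2,
                 ifelse(r2 == "" | r1 == r2, r1, "99"))

print(merged)
# "CC" "GG" "99" ""
```

A position that is missing in both replicates stays empty, which matches the blank cells in the desired table.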
I tried:
data_merge <- data %>%
group_by(Sample) %>%
summarise_all(ifelse(statement), (if_true), (if_false))
This is only a subset of the data; the real data has 44 X columns.
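One possible way to fill in the attempt above, offered as a sketch rather than an accepted answer: replace the ifelse placeholder with a small helper that collapses all replicate calls of one sample into a single value. The helper name merge_calls and the data frame df_small (the first four rows and three columns of the data above) are introduced here for illustration only, and the code assumes dplyr 1.0.0 or later for across() and the .groups argument.

```r
library(dplyr)

# Illustrative subset of the question's data: two samples, two replicates each.
df_small <- data.frame(
  Sample    = c("19-0001", "19-0001", "19-0002", "19-0002"),
  Replicate = c(1L, 2L, 1L, 2L),
  X24854000 = c("", "CC", "CC", ""),
  X24855449 = c("CC", "GG", "GG", "GG"),
  X24854903 = c("", "", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the replicate calls of one sample at one position.
# Assumption: "" (or NA) means "no call".
merge_calls <- function(x) {
  x <- as.character(x)                      # also handles factor columns
  calls <- unique(x[!is.na(x) & x != ""])   # distinct non-missing calls
  if (length(calls) == 0) {
    ""        # missing in every replicate: stays missing
  } else if (length(calls) == 1) {
    calls     # replicates agree, or only one has a call
  } else {
    "99"      # replicates disagree: flag for later removal
  }
}

data_merge <- df_small %>%
  group_by(Sample) %>%
  summarise(
    Replicate = first(Replicate),           # keep replicate 1 as the label
    across(starts_with("X"), merge_calls),  # merge every X column
    .groups = "drop"
  )

print(data_merge)
```

Because the helper looks at all distinct non-empty calls in a group, it does not depend on there being exactly two replicates, and selecting columns with starts_with("X") should carry over to the full set of 44 X columns without listing them by name.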
[Comments]:
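- Please provide sample data in a reproducible format, e.g. using dput.
- I am not familiar with dput. I tried dput(out, file = "test.txt", control = c("keepNA", "keepInteger")), but the output file does not look the same as the input.
- The use of dput is explained in a post on how to provide a minimal reproducible example. In short, run dput(df) (where df is your data.frame), then include (i.e. copy and paste) the output of dput in your main post (not as a comment).
- Thanks. Compared with the package's own documentation, that link was actually very helpful. I will use it the next time I have an R question.
- Glad it helped, @RSun. Please consider closing the question by setting the green check mark next to the answer. That way you help keep SO tidy and make it easier for future SO readers to identify relevant questions. Thanks.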
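For readers unfamiliar with the dput workflow mentioned in the comments, a minimal sketch follows. The data frame toy is a made-up two-row example; the point is only that dput() prints R code that rebuilds the object, which is why its output looks different from the original input file.

```r
# Hypothetical two-row data frame, used only to show the dput() round trip.
toy <- data.frame(
  Sample    = c("19-0001", "19-0001"),
  Replicate = 1:2,
  X24854000 = c("", "CC"),
  stringsAsFactors = FALSE
)

# Prints a structure(...) call; this printed text is what gets pasted into a question.
dput(toy)

# Round trip: capture the printed code, evaluate it, and compare with the original.
code    <- paste(capture.output(dput(toy)), collapse = "\n")
rebuilt <- eval(parse(text = code))
print(all.equal(rebuilt, toy))
```

Anyone who copies the printed structure(...) call into their own session gets the same object back, including column types, which a plain text table does not preserve.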