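【Posted】: 2023-01-20 01:42:05
【Question】:
I have two datasets, Data and Data1. I want to merge them so that every row unique to either dataset is kept, while the numeric values of rows common to both are summed in the new table. Is there a simple tool for this?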
head(Data)
contig position variantID refAllele altAllele refCount altCount totalCount lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs
1 chr1 905373 . T C 2 4 6 0 0 6 0 0
2 chr1 911428 . C T 1 2 3 0 0 3 0 0
3 chr1 953279 . T C 146 126 272 0 0 273 1 0
4 chr1 962184 . T C 14 15 29 0 0 29 0 0
5 chr1 1024129 . T G 1 0 1 0 0 1 0 0
6 chr1 1039514 . C T 1 1 2 0 0 2 0 0
head(Data1)
contig position variantID refAllele altAllele refCount altCount totalCount lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs
1 chr1 905373 . T C 2 3 5 0 0 5 0 0
2 chr1 933024 . C T 1 0 1 0 0 1 0 0
3 chr1 953279 . T C 122 124 246 0 0 248 2 0
4 chr1 962184 . T C 17 21 38 0 0 38 0 0
5 chr1 1022518 . G T 0 1 1 0 0 1 0 0
6 chr1 1024129 . T G 1 2 3 0 0 3 0 0
Example of the desired output:
contig position variantID refAllele altAllele refCount altCount totalCount lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs
1 chr1 905373 . T C 4 7 11 0 0 11 0 0
2 chr1 911428 . C T 1 2 3 0 0 3 0 0
3 chr1 933024 . C T 1 0 1 0 0 1 0 0
4 chr1 953279 . T C 268 250 518 0 0 521 3 0
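As we can see at position 905373, the counts of rows common to both datasets are added together, from the refCount column onward. Sites 911428 and 933024 are each unique to one dataset but are still included in the new dataset. Is there an easy way to create this output table?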
Data <- structure(list(contig = c("chr1", "chr1", "chr1", "chr1", "chr1",
"chr1"), position = c(905373L, 911428L, 953279L, 962184L, 1024129L,
1039514L), variantID = c(".", ".", ".", ".", ".", "."), refAllele = c("T",
"C", "T", "T", "T", "C"), altAllele = c("C", "T", "C", "C", "G",
"T"), refCount = c(2L, 1L, 146L, 14L, 1L, 1L), altCount = c(4L,
2L, 126L, 15L, 0L, 1L), totalCount = c(6L, 3L, 272L, 29L, 1L,
2L), lowMAPQDepth = c(0L, 0L, 0L, 0L, 0L, 0L), lowBaseQDepth = c(0L,
0L, 0L, 0L, 0L, 0L), rawDepth = c(6L, 3L, 273L, 29L, 1L, 2L),
otherBases = c(0L, 0L, 1L, 0L, 0L, 0L), improperPairs = c(0L,
0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 6L), class = "data.frame")
Data1 <- structure(list(contig = c("chr1", "chr1", "chr1", "chr1", "chr1",
"chr1"), position = c(905373L, 933024L, 953279L, 962184L, 1022518L,
1024129L), variantID = c(".", ".", ".", ".", ".", "."), refAllele = c("T",
"C", "T", "T", "G", "T"), altAllele = c("C", "T", "C", "C", "T",
"G"), refCount = c(2L, 1L, 122L, 17L, 0L, 1L), altCount = c(3L,
0L, 124L, 21L, 1L, 2L), totalCount = c(5L, 1L, 246L, 38L, 1L,
3L), lowMAPQDepth = c(0L, 0L, 0L, 0L, 0L, 0L), lowBaseQDepth = c(0L,
0L, 0L, 0L, 0L, 0L), rawDepth = c(5L, 1L, 248L, 38L, 1L, 3L),
otherBases = c(0L, 0L, 2L, 0L, 0L, 0L), improperPairs = c(0L,
0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 6L), class = "data.frame")
【Comments】:
-
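rbind the two datasets, then aggregate by position and sum. I assume the alleles do not differ between the datasets, right? PS: could you share the data with @987654325@? That would make it easier to work with. -
@RicVillalba Added.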
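
The rbind-then-aggregate idea from the first comment can be sketched in base R as below. This is only a minimal sketch, not a confirmed answer from the thread. It assumes a site counts as "common" when contig, position, variantID, refAllele and altAllele all match, and that every remaining column is a count that can be summed. The helper name sum_common_sites and the data frames d and d1 are hypothetical; d and d1 hold the question's rows reduced to eight columns to keep the example short.

```r
# Reduced copies of the question's data (8 of the 13 columns), for illustration only.
d <- data.frame(
  contig     = rep("chr1", 6),
  position   = c(905373L, 911428L, 953279L, 962184L, 1024129L, 1039514L),
  variantID  = rep(".", 6),
  refAllele  = c("T", "C", "T", "T", "T", "C"),
  altAllele  = c("C", "T", "C", "C", "G", "T"),
  refCount   = c(2L, 1L, 146L, 14L, 1L, 1L),
  altCount   = c(4L, 2L, 126L, 15L, 0L, 1L),
  totalCount = c(6L, 3L, 272L, 29L, 1L, 2L),
  stringsAsFactors = FALSE
)
d1 <- data.frame(
  contig     = rep("chr1", 6),
  position   = c(905373L, 933024L, 953279L, 962184L, 1022518L, 1024129L),
  variantID  = rep(".", 6),
  refAllele  = c("T", "C", "T", "T", "G", "T"),
  altAllele  = c("C", "T", "C", "C", "T", "G"),
  refCount   = c(2L, 1L, 122L, 17L, 0L, 1L),
  altCount   = c(3L, 0L, 124L, 21L, 1L, 2L),
  totalCount = c(5L, 1L, 246L, 38L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack both tables, then sum the count columns per site.
# Assumption: key_cols together identify a site; all other columns are numeric counts.
sum_common_sites <- function(a, b,
                             key_cols = c("contig", "position", "variantID",
                                          "refAllele", "altAllele")) {
  combined   <- rbind(a, b)                          # requires identical column names
  count_cols <- setdiff(names(combined), key_cols)   # everything that is not a key
  out <- aggregate(combined[count_cols], by = combined[key_cols], FUN = sum)
  out <- out[order(out$contig, out$position), ]      # contig sorts as text (chr10 before chr2)
  rownames(out) <- NULL
  out[names(combined)]                               # restore the original column order
}

merged <- sum_common_sites(d, d1)
print(merged)
# Site 905373 appears in both inputs, so its counts are summed:
# refCount 4, altCount 7, totalCount 11.
```

With the full Data and Data1 from the question, the same call should work unchanged, since the extra depth columns are simply treated as further count columns. If the alleles could differ between datasets at the same position, keeping refAllele and altAllele in key_cols leaves those rows separate instead of summing them.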