【Posted】: 2016-10-11 11:48:12
【Question】:
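I have a data.frame with several columns. One column (sequence) contains unique sequences. I want to compare it with the next release of this data.frame: for each sequence, check how many peptides it has and whether that number has increased or decreased.
I get this data.frame from a database, but the database inserts new sequences at random row positions in each release (see the 2nd release), so the same sequence does not keep the same ID or row number from one release to the next.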
1st release
ID | sequence | ... | Peptides | nºproject
1 | atggggg | ... | 65 | project
2 | tgatgat | ... | 3 | project
3 | actgat | ... | 32 | project
4 | atgtagtt | ... | 25 | project
5 | ttttaaat | ... | 32 | project
2nd release
ID | sequence | ... | Peptides | nºproject
1 | atggggg | ... | 66 | project
2 | tgatgat | ... | 5 | project
3 | actgat | ... | 36 | project
4 | ATTTGGGG | ... | 26 | project *** New one ***
5 | ATTGATGA | ... | 32 | project *** New one ***
6 | atgtagtt | ... | 47 | project
7 | ttttaaat | ... | 38 | project
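If each release appended the new sequences at the end of the column, I could use duplicated() without any problem, but unfortunately they are inserted at random positions.
Here is an example: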
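A minimal sketch of a position-independent comparison, using only the sequence and peptide columns transcribed from the two release tables above (the names rel1, rel2, idx, peptides_old, change and status are illustrative, not from the original data). Instead of relying on row order, match() looks up each 2nd-release sequence in the 1st release; sequences with no match come back as NA and are flagged as new.

```r
# Toy data transcribed from the two release tables (only the columns needed)
rel1 <- data.frame(
  sequence = c("atggggg", "tgatgat", "actgat", "atgtagtt", "ttttaaat"),
  peptides = c(65, 3, 32, 25, 32),
  stringsAsFactors = FALSE
)
rel2 <- data.frame(
  sequence = c("atggggg", "tgatgat", "actgat", "ATTTGGGG", "ATTGATGA",
               "atgtagtt", "ttttaaat"),
  peptides = c(66, 5, 36, 26, 32, 47, 38),
  stringsAsFactors = FALSE
)

# For each release-2 sequence, its row in release 1 (NA if absent),
# so the row order of either release no longer matters
idx <- match(rel2$sequence, rel1$sequence)

rel2$peptides_old <- rel1$peptides[idx]           # NA for sequences new in release 2
rel2$change <- rel2$peptides - rel2$peptides_old  # > 0 increase, < 0 decrease
rel2$status <- ifelse(is.na(idx), "new",
               ifelse(rel2$change > 0, "increased",
               ifelse(rel2$change < 0, "decreased", "unchanged")))
rel2
```

match() compares strings exactly, so this only works if the keys are written identically in both releases (same case, no stray spaces).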
1st release:
df <- structure(list(
  ID = structure(c(1L, 2L, 3L, 4L, 5L),
    .Label = c("1", "2", "3", "4", "5"), class = "factor"),
  sequence = structure(c(1L, 2L, 3L, 4L, 5L),
    .Label = c(" actgat ", " atagattg ", " atatagag ", " atggggg ", " atgtagtt "),
    class = "factor"),
  peptides = structure(c(1L, 2L, 3L, 4L, 5L),
    .Label = c(" 54 ", " 84 ", " 32 ", " 36 ", "12"), class = "factor"),
  n_project = structure(c(1L, 1L, 1L, 1L, 1L),
    .Label = " project ", class = "factor")),
  .Names = c("ID", "sequence", "peptides", "n_project"),
  class = "data.frame", row.names = c(NA, -5L))
2nd release:
df2 <- structure(list(
  ID = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L),
    .Label = c("1", "2", "3", "4", "5", "6", "7"), class = "factor"),
  sequence = structure(c(1L, 2L, 7L, 8L, 3L, 4L, 5L),
    .Label = c(" actgat ", " atagattg ", " atatagag ", " atggggg ", " atgtagtt ",
               " gggatgac ", " TATATCC ", " TTTTAAAT "), class = "factor"),
  # factor levels must be unique, so the duplicated trailing "76" level is dropped
  peptides = structure(c(1L, 2L, 7L, 8L, 3L, 4L, 5L),
    .Label = c(" 56 ", " 85 ", " 31 ", " 36 ", "15", "10", "76", "98", "34"),
    class = "factor"),
  n_project = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L),
    .Label = " project ", class = "factor")),
  .Names = c("ID", "sequence", "peptides", "n_project"),
  class = "data.frame", row.names = c(NA, -7L))
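A sketch for the reproducible data above, under two assumptions: sequences should match regardless of surrounding spaces and case (e.g. " TTTTAAAT " vs "ttttaaat"), and the peptide counts, which are stored as factor labels, must go through as.character() before as.numeric(), because as.numeric() on a factor returns the internal level codes. To stay self-contained, the block rebuilds df and df2 with the same values as plain factor columns; clean_release(), r1, r2 and cmp are hypothetical names. merge(all = TRUE) is a full outer join on the cleaned sequence, so sequences present only in the 2nd release are flagged "new" and those present only in the 1st are flagged "removed".

```r
# Stand-in copies of the two releases from the structure() calls above:
# padded strings stored as factors, like the original data
df <- data.frame(
  sequence = c(" actgat ", " atagattg ", " atatagag ", " atggggg ", " atgtagtt "),
  peptides = c(" 54 ", " 84 ", " 32 ", " 36 ", "12"),
  stringsAsFactors = TRUE
)
df2 <- data.frame(
  sequence = c(" actgat ", " atagattg ", " TATATCC ", " TTTTAAAT ",
               " atatagag ", " atggggg ", " atgtagtt "),
  peptides = c(" 56 ", " 85 ", "76", "98", " 31 ", " 36 ", "15"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: turn one release into a clean character/numeric table
clean_release <- function(d) {
  data.frame(
    # drop the padding spaces and ignore case so equal sequences compare equal
    sequence = tolower(trimws(as.character(d$sequence))),
    # convert via as.character(): as.numeric() on a factor gives level codes
    peptides = as.numeric(trimws(as.character(d$peptides))),
    stringsAsFactors = FALSE
  )
}

r1 <- clean_release(df)
r2 <- clean_release(df2)

# Full outer join on the sequence key; row order in either release is irrelevant
cmp <- merge(r1, r2, by = "sequence", all = TRUE, suffixes = c("_r1", "_r2"))
cmp$change <- cmp$peptides_r2 - cmp$peptides_r1
cmp$status <- ifelse(is.na(cmp$peptides_r1), "new",
              ifelse(is.na(cmp$peptides_r2), "removed",
              ifelse(cmp$change > 0, "increased",
              ifelse(cmp$change < 0, "decreased", "unchanged"))))
cmp
```

If case actually matters for these sequences, drop the tolower() call and keep only trimws().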
【Discussion】:
Tags: r