【Posted】: 2019-12-07 22:48:00
【Question】:
I have a data table of marker coordinates that are aligned between two groups (A and B). For example:
library(data.table)
dt_long <- data.table(LABEL_A = c(rep("A", 20), rep("A", 15), rep("A", 10), rep("A", 15), rep("A", 10)),
SEQ_A = c(11:30, 61:75, 76:85, 86:100, 110:119),
LABEL_B = c(rep("C", 20), rep("D", 15), rep("F", 10), rep("G", 15), rep("D", 10)),
SEQ_B = c(1:20, 25:11, 16:25, 15:1, 1:5, 8:12))
How can I condense this information into a short format that gives the start and end coordinates of each aligned sequence? For example:
dt_short <- data.table(LABEL_A = c("A", "A", "A", "A", "A", "A"),
Start_A = c(11, 61, 76, 86, 110, 115),
End_A = c(30, 75, 85, 100, 114, 119),
LABEL_B= c("C", "D", "F", "G", "D", "D"),
Start_B = c(1, 25, 16, 15, 1, 8),
End_B = c(20, 11, 25, 1, 5, 12))
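Each aligned sequence should have the same length in both groups. For example, the following check should return TRUE: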
identical(abs(dt_short$End_A - dt_short$Start_A), abs(dt_short$End_B - dt_short$Start_B))
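One way the collapsing described above might be approached, offered as a minimal sketch rather than a confirmed answer: treat a row as the start of a new aligned segment whenever either label changes, SEQ_A does not advance by exactly 1, or SEQ_B does not move by exactly 1 in either direction. The helper columns new_run and run_id are hypothetical names introduced here. The sketch assumes the rows are already in alignment order and that SEQ_B never reverses direction inside a segment without one of those breaks also occurring.

```r
library(data.table)

# Example data from the question.
dt_long <- data.table(
  LABEL_A = rep("A", 70),
  SEQ_A   = c(11:30, 61:75, 76:85, 86:100, 110:119),
  LABEL_B = c(rep("C", 20), rep("D", 15), rep("F", 10), rep("G", 15), rep("D", 10)),
  SEQ_B   = c(1:20, 25:11, 16:25, 15:1, 1:5, 8:12)
)

# Flag rows that start a new segment (new_run is a hypothetical helper column).
# The first row has no predecessor, so shift() yields NA there and is.na()
# marks it as a start.
dt_long[, new_run := is.na(shift(SEQ_A)) |
          LABEL_A != shift(LABEL_A) |
          LABEL_B != shift(LABEL_B) |
          SEQ_A - shift(SEQ_A) != 1L |
          abs(SEQ_B - shift(SEQ_B)) != 1L]

# Number the segments; every start flag increments the id.
dt_long[, run_id := cumsum(new_run)]

# Keep the first and last coordinate of each segment.
dt_short <- dt_long[, .(
  LABEL_A = LABEL_A[1], Start_A = SEQ_A[1], End_A = SEQ_A[.N],
  LABEL_B = LABEL_B[1], Start_B = SEQ_B[1], End_B = SEQ_B[.N]
), by = run_id][, run_id := NULL]

# Drop the helper columns from the input again.
dt_long[, c("new_run", "run_id") := NULL]
```

Note that the coordinate columns produced this way are integer, whereas the hand-written dt_short in the question uses doubles, so a direct identical() comparison between the two tables would need a type conversion first. The length check from the question can be applied to the result unchanged.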
【Comments】:
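Is there any relationship between the LABEL columns and the lengths?
No, the label columns only distinguish the sequences within each group.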
Tags: r data.table melt