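【Posted】:2021-06-21 22:02:51
【Question】:
I am working with a data set (30000 x 500) in which I need to replace some values in columns based on the data in other columns. The catch is that the reference values change from row to row. Here is a small sample of the data set: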
#Create a data frame
df <- data.frame(SNP = c("SNP1","SNP2","SNP3","SNP4","SNP5","SNP6","SNP7","SNP8","SNP9","SNP10"),
A_allele = c("C","G","C","G","C","C","A","T","G","C"),
B_allele = c("G","A","T","A","A","G","T","A","C","A"),
alleles = c("C/G","G/A","C/T","G/A","C/A","C/G","A/T","T/A","G/C","C/A"),
line_1 = sample(c("A","B"),10, replace = TRUE),
line_2 = sample(c("A","B"),10, replace = TRUE),
line_3 = sample(c("A","B"),10, replace = TRUE),
line_4 = sample(c("A","B"),10, replace = TRUE),
line_5 = sample(c("A","B"),10, replace = TRUE),
line_6 = sample(c("A","B"),10, replace = TRUE),
line_7 = sample(c("A","B"),10, replace = TRUE),
line_8 = sample(c("A","B"),10, replace = TRUE),
line_9 = sample(c("A","B"),10, replace = TRUE),
line_10 = sample(c("A","B"),10, replace = TRUE)
)
df
head(df)
SNP A_allele B_allele alleles line_1 line_2 line_3 line_4 line_5 line_6 line_7 line_8 line_9 line_10
1 SNP1 C G C/G B A B A B B B B B A
2 SNP2 G A G/A A B A A A B B A B A
3 SNP3 C T C/T B B A B B B A A A A
4 SNP4 G A G/A A B B A B A B B B A
5 SNP5 C A C/A B A B B B A B A B B
6 SNP6 C G C/G B A B A B A B B B B
7 SNP7 A T A/T B A A B A A B A B A
8 SNP8 T A T/A A B A B A A B B A B
9 SNP9 G C G/C B A B B B B A B A B
10 SNP10 C A C/A B B B B B A A A A A
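For each row, the A_allele and B_allele columns are the reference values used to replace the "A" or "B" codes in the 10 line columns (line_1 to line_10). Where a cell contains "A", use the value from column A_allele; where it contains "B", use the value from column B_allele.
In the example, it should work as follows:
- Row 1: lines coded A become C / lines coded B become G
- Row 2: lines coded A become G / lines coded B become A
- Row 3: lines coded A become C / lines coded B become T
- Row 10: same idea.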
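The rule above can be sketched in base R on a small table with fixed values. The data frame `toy` and its contents are made up for illustration (the first two line columns of rows 1 to 3 from the sample), and the sketch assumes the allele columns are character vectors, not factors:

```r
# Toy data with fixed values (hypothetical, so the result is reproducible)
toy <- data.frame(
  SNP      = c("SNP1", "SNP2", "SNP3"),
  A_allele = c("C", "G", "C"),
  B_allele = c("G", "A", "T"),
  line_1   = c("B", "A", "B"),
  line_2   = c("A", "B", "B"),
  stringsAsFactors = FALSE
)

# Select every column whose name starts with "line_"
line_cols <- grep("^line_", names(toy), value = TRUE)

# For each line column: take A_allele where the code is "A",
# otherwise take B_allele (any non-"A" code is treated as "B" here)
toy[line_cols] <- lapply(toy[line_cols], function(x) {
  ifelse(x == "A", toy$A_allele, toy$B_allele)
})

toy
```

Because `ifelse()` works on whole columns, the row-specific reference values are matched by position without looping over rows.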
The output should look like this:
SNP A_allele B_allele alleles line_1 line_2 line_3 line_4 line_5 line_6 line_7 line_8 line_9 line_10
1 SNP1 C G C/G G C G C G G G G G C
2 SNP2 G A G/A G A G G G A A G A G
3 SNP3 C T C/T T T C T T T C C C C
4 SNP4 G A G/A G A A G A G A A A G
5 SNP5 C A C/A A C A A A C A C A A
6 SNP6 C G C/G G C G C G C G G G G
7 SNP7 A T A/T T A A T A A T A T A
8 SNP8 T A T/A T A T A T T A A T A
9 SNP9 G C G/C C G C C C C G C G C
10 SNP10 C A C/A A A A A A C C C C C
Since there are about 30000 rows, I would like code that runs efficiently, if possible.
Any suggestions?
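For the full-size table, one option that might scale well is to treat the line columns as a single character matrix and replace the cells by logical indexing. This is only a sketch: `recode_lines` is a hypothetical helper name, the simulated table `big` stands in for the real data, and no timings have been measured here.

```r
# Hypothetical helper: recode "A"/"B" calls to each row's reference alleles
recode_lines <- function(df, line_cols) {
  m   <- as.matrix(df[line_cols])   # character matrix of "A"/"B" calls
  out <- m
  # Compute both masks from the ORIGINAL codes, so an allele that happens
  # to be "A" is never recoded a second time
  is_a <- !is.na(m) & m == "A"
  is_b <- !is.na(m) & m == "B"
  rows <- row(m)                    # row index of every cell
  out[is_a] <- as.character(df$A_allele)[rows[is_a]]
  out[is_b] <- as.character(df$B_allele)[rows[is_b]]
  df[line_cols] <- as.data.frame(out, stringsAsFactors = FALSE)
  df
}

# Simulated stand-in data (assumed sizes; increase n and p to test scaling)
set.seed(1)
n <- 1000
p <- 20
line_names <- paste0("line_", seq_len(p))
big <- data.frame(
  A_allele = sample(c("A", "C", "G", "T"), n, replace = TRUE),
  B_allele = sample(c("A", "C", "G", "T"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
calls <- matrix(sample(c("A", "B"), n * p, replace = TRUE),
                nrow = n, dimnames = list(NULL, line_names))
big <- cbind(big, as.data.frame(calls, stringsAsFactors = FALSE))

res <- recode_lines(big, line_names)
```

On the data frame from the question, the call would be `df <- recode_lines(df, grep("^line_", names(df), value = TRUE))`. Cells that are neither "A" nor "B" (including `NA`) are left unchanged in this sketch.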
【Discussion】:
Tags: r dataframe if-statement conditional-statements