【Question title】: Add value to a Row by matching column name in one table to column value in another in R
【Posted】: 2021-12-28 15:11:50
【Question】:

df1:

variant ID1 ID2 ID3 ID4 .... ID80000
123     0    1   2   1         0
321     1    2   1   1         1
543     1    1   2   1         1
6542    1    0   0   1         0  
243     1    0   2   1         1
654     0    1   1   2         1 
342     1    2   1   2         1
present 0    1   0   1         0

df2:

ID  sex    yob         disease
ID1  M    10/10/1910    cancer
ID2  F     05/02/2000   CML
ID3  F     01/01/1983   gout

I want to add the columns of df2 to df1 as rows, matching on ID, with the df2 column names going into the variant column of df1.

Desired result:

variant ID1            ID2       ID3     ID4 .... ID80000
123     0               1         2       1         0
321     1               2         1       1         1
543     1               1         2       1         1
6542    1               0         0       1         0  
243     1               0         2       1         1
654     0               1         1       2         1 
342     1               2         1       2         1
present 0               1         0       1         0
sex     M               F         F       NA        NA
yob     10/10/1910  05/02/2000 01/01/1983 NA        NA
disease cancer         CML       gout     NA        NA

I tried:

df1["sex",] <- df2$sex[match(df2$ID, colnames(df1),]

This does not work.

I did get this to work:

df1["sex",] <- ifelse(colnames(df1) %in% df2$ID, df2$sex, NA)

I don't even know how to handle multiple columns at once.

Any help would be greatly appreciated.
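
As a side note on the two attempts above, here is a minimal base R sketch of how `match()` and `ifelse()` behave on a toy stand-in for the tables. `df1_cols` is an illustrative name, not part of the original post. `match(x, table)` returns, for each element of `x`, its position in `table`, so the column names of df1 go first. `ifelse()` recycles its `yes` argument by position, so the values can end up shifted relative to the IDs.

```r
# Toy stand-ins for the question's tables (df1_cols is a hypothetical name).
df1_cols <- c("variant", "ID1", "ID2", "ID3", "ID4")
df2 <- data.frame(
  ID  = c("ID1", "ID2", "ID3"),
  sex = c("M", "F", "F"),
  stringsAsFactors = FALSE
)

# For every df1 column name, find its row in df2 (NA when there is none).
idx <- match(df1_cols, df2$ID)
sex_by_column <- df2$sex[idx]

# ifelse() recycles df2$sex to the length of the test and picks by position,
# so the values are not aligned with the matching IDs.
sex_ifelse <- ifelse(df1_cols %in% df2$ID, df2$sex, NA)

print(idx)            # NA 1 2 3 NA
print(sex_by_column)  # NA "M" "F" "F" NA
print(sex_ifelse)     # NA "F" "F" "M" NA
```

On this toy data the `ifelse()` version assigns "F" to ID1 and "M" to ID3, so it seems worth checking that approach against a few known IDs before relying on it.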

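【Comments】:

Could you run dput(df1[1:7, 1:5]) and dput(df2[1:3, 1:4]) and share the output to make your example reproducible?
I'm afraid I work in an air-gapped HPC environment, so I cannot export any data and therefore cannot export the tables. Happy to edit the tables to make them more readable.

Tags: r dataframe merge data.table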

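【Solution 1】:

Using data.table:

Although this works for this example, you cannot use it as-is for just any other dataset. It requires some knowledge of the data, but it is easy to adapt by following the preparation steps (see Explanation).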

library(data.table)

rbindlist(list(df1, cbind( variant=names(df2)[2:ncol(df2)],
  setnames( data.frame( t(df2[,2:ncol(df2)]) ), df2[,1] ))), fill=TRUE)

    variant        ID1        ID2        ID3 ID4
 1:     123          0          1          2   1
 2:     321          1          2          1   1
 3:     543          1          1          2   1
 4:    6542          1          0          0   1
 5:     243          1          0          2   1
 6:     654          0          1          1   2
 7:     342          1          2          1   2
 8: present          0          1          0   1
 9:     sex          M          F          F  NA
10:     yob 10/10/1910 05/02/2000 01/01/1983  NA
11: disease     cancer        CML       gout  NA

Explanation

df1 is fine as it is, but df2 needs some attention because it has no variant column.

# first part of df2, all "ID" columns [2->end]
setnames( data.frame( t(df2[,2:ncol(df2)]) ), df2[,1] )
#               ID1        ID2        ID3
#sex              M          F          F
#yob     10/10/1910 05/02/2000 01/01/1983
#disease     cancer        CML       gout

# second part of df2, prepare first column
names(df2)[2:ncol(df2)]
#[1] "sex"     "yob"     "disease"

# put together with name variant
cbind( variant=names(df2)[2:ncol(df2)], 
  setnames( data.frame( t(df2[,2:ncol(df2)]) ), df2[,1] ))
#        variant        ID1        ID2        ID3
#sex         sex          M          F          F
#yob         yob 10/10/1910 05/02/2000 01/01/1983
#disease disease     cancer        CML       gout

# now df2 is ready to be matched with df1's column names using rbindlist like above
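
The preparation steps above can also be sketched as one small base R function that handles all metadata columns at once. This is an illustration under assumptions, not the answer's code: `append_metadata_rows`, `wide`, `meta`, `id_col`, and `label_col` are hypothetical names, the data below is a shortened copy of the example, and base `rbind()` is swapped in for `rbindlist()` so the sketch needs no packages.

```r
# Shortened copies of the example tables.
df1 <- data.frame(
  variant = c("123", "321", "present"),
  ID1 = c(0L, 1L, 0L), ID2 = c(1L, 2L, 1L),
  ID3 = c(2L, 1L, 0L), ID4 = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c("ID1", "ID2", "ID3"),
  sex = c("M", "F", "F"),
  yob = c("10/10/1910", "05/02/2000", "01/01/1983"),
  disease = c("cancer", "CML", "gout"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append every non-ID column of `meta` as a row of `wide`.
append_metadata_rows <- function(wide, meta, id_col = "ID", label_col = "variant") {
  meta_cols   <- setdiff(names(meta), id_col)     # columns that become rows
  sample_cols <- setdiff(names(wide), label_col)  # ID1, ID2, ...
  ids <- as.character(unlist(meta[[id_col]]))     # also copes with a list column
  idx <- match(sample_cols, ids)                  # NA for IDs missing from meta

  # One character vector per metadata column, aligned to the columns of `wide`.
  rows <- lapply(meta_cols, function(col) as.character(meta[[col]])[idx])
  new_rows <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(new_rows) <- sample_cols
  new_rows[[label_col]] <- meta_cols
  new_rows <- new_rows[names(wide)]               # same column order as `wide`

  # Mixed rows force character columns, so convert explicitly before binding.
  wide[] <- lapply(wide, as.character)
  rbind(wide, new_rows)
}

result <- append_metadata_rows(df1, df2)
print(result)
```

As with the `rbindlist()` version, every column of the result is character, because numeric and text values now share the same columns.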

Data

df1 <- structure(list(variant = c("123", "321", "543", "6542", "243", 
"654", "342", "present"), ID1 = c(0L, 1L, 1L, 1L, 1L, 0L, 1L, 
0L), ID2 = c(1L, 2L, 1L, 0L, 0L, 1L, 2L, 1L), ID3 = c(2L, 1L, 
2L, 0L, 2L, 1L, 1L, 0L), ID4 = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
1L)), class = "data.frame", row.names = c(NA, -8L))

df2 <- structure(list(ID = c("ID1", "ID2", "ID3"), sex = c("M", "F", 
"F"), yob = c("10/10/1910", "05/02/2000", "01/01/1983"), disease = c("cancer", 
"CML", "gout")), class = "data.frame", row.names = c(NA, -3L))

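【Comments】:

Hi @Andre Wildberg. I came up with almost the same solution as you, so I won't post an answer. That said, just a brief note: I suggest you use setnames() rather than setNames(), so that you use the function dedicated to the data.table library. Cheers.
@lovalery That's a good suggestion, thanks! I put it in!
Sorry, I ran this again and got an error on the first line: "Error in setnames(data.frame(t(df2[,2:ncol(df2)])), df2[,1]) : Passed a vector of type 'list'. Needs to be type 'character'."
@tacrolimus I copied df2 from your example. I assume your df2$ID is a list. Try using unlist(df2[,1]) in the command instead of plain df2[,1].
@tacrolimus ... or use setNames with a capital N. It is less picky, and both work fine.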

【Solution 2】:

Another way: use dplyr to reshape df2, magrittr for the pipe operator, and data.table to bind the two data frames.

library(dplyr)
library(magrittr)

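# Transpose the metadata columns, name the new columns after the IDs,
# and move the old column names into a leading `variant` column.
# Note: this overwrites df2.
df2 <- as_tibble(t(df2[, -1])) %>% 
  `colnames<-` (df2[["ID"]]) %>% 
  mutate(variant = rownames(t(df2[, -1]))) %>% 
  relocate(variant)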

library(data.table)
rbindlist(list(df1, df2), fill = TRUE)

【Comments】: