Add a new column to a dataframe using matching values of another dataframe [duplicate]: answers

【Question title】: Add a new column to a dataframe using matching values of another dataframe [duplicate]
【Posted】: 2016-08-30 06:37:22
【Question description】:

I am trying to populate table1 with the matching val2 values from table2

table1$New_val2 = table2[table2$pid==table1$pid,]$val2

But I get the warning

longer object length is not a multiple of shorter object length

That is fair enough, since the tables do not have the same length.

Please guide me to the correct approach.

【Comments】:

merge(table1, table2, by="pid"); add the all.x=TRUE argument if needed.
Hi, what if table2 has other columns but I only want to add col2?
merge(table1, table2[, c("pid", "col2")], by="pid")

Tags: r dataframe match

【Solution 1】:

I'm not sure if this is what you mean, but you could use:

newtable <- merge(table1, table2, by = "pid")

This creates a new table named newtable with 3 columns, containing the rows whose values in the id column (in this case "pid") match in both tables.
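
As a minimal sketch of the call above: the contents of table1 and table2 below (including the val1 column) are made up for illustration and are not the asker's data.

```r
# Hypothetical example data: the values and the val1 column are invented
table1 <- data.frame(pid = c(1, 2, 3, 4), val1 = c("a", "b", "c", "d"))
table2 <- data.frame(pid = c(2, 4, 5), val2 = c(20, 40, 50))

# merge() defaults to an inner join: only pids present in both tables are kept
newtable <- merge(table1, table2, by = "pid")
print(newtable)
```

With this data, only pid 2 and pid 4 survive, because pids 1 and 3 have no match in table2 and pid 5 has no match in table1.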

【Discussion】:

【Solution 2】:

merge(table1, table2[, c("pid", "val2")], by="pid")

Add the all.x=TRUE argument to keep all pids in table1 that have no match in table2...
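
A small sketch of what all.x=TRUE changes, on invented data (the val1 and other columns and all values are assumptions for illustration):

```r
# Hypothetical example data; table2 carries an extra column we do not want
table1 <- data.frame(pid = c(1, 2, 3, 4), val1 = c("a", "b", "c", "d"))
table2 <- data.frame(pid = c(2, 4, 5), val2 = c(20, 40, 50),
                     other = c("x", "y", "z"))

# Keep only the key and the column to add, then keep every row of table1
joined <- merge(table1, table2[, c("pid", "val2")], by = "pid", all.x = TRUE)
print(joined)
```

Here all four rows of table1 are kept, and val2 is NA for the pids that table2 does not contain.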

You were on the right track. Here is a way to do it using match...

table1$val2 <- table2$val2[match(table1$pid, table2$pid)]
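
To see why this avoids the length warning, here is a sketch on invented data (the values and the val1 column are assumptions; idx is a helper name introduced only for this example):

```r
# Hypothetical example data, deliberately in a non-sorted order
table1 <- data.frame(pid = c(3, 1, 4, 2), val1 = c("c", "a", "d", "b"))
table2 <- data.frame(pid = c(2, 4, 5), val2 = c(20, 40, 50))

# For each table1$pid, match() gives the position of its first match
# in table2$pid, or NA when there is none
idx <- match(table1$pid, table2$pid)

# Indexing with idx yields exactly nrow(table1) values, so nothing is recycled
table1$val2 <- table2$val2[idx]
print(table1)
```

The row order of table1 is left untouched, and unmatched pids get NA. If table2 contained a pid more than once, only its first occurrence would be used.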

【Discussion】:

【Solution 3】:

I'm late to the party, but in case someone else asks the same question:
This is exactly what dplyr's inner_join does.

table1.df <- dplyr::inner_join(table1, table2, by = "pid")

The by argument specifies which column should be used to match the rows.
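
A short sketch of the join on invented data, assuming the dplyr package is installed (the values and the val1 column are made up for illustration):

```r
library(dplyr)  # assumption: dplyr is installed

# Hypothetical example data, deliberately in a non-sorted order
table1 <- data.frame(pid = c(3, 1, 4, 2), val1 = c("c", "a", "d", "b"))
table2 <- data.frame(pid = c(2, 4, 5), val2 = c(20, 40, 50))

# by takes the key column name as a string
joined <- inner_join(table1, table2, by = "pid")
print(joined)
```

With this data the result keeps the matching rows in table1's order (pid 4, then pid 2), whereas merge() with its default sort = TRUE would return them sorted by pid.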

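Edit: I used to have a hard time remembering that it is [join], not [merge].

【Discussion】:

I prefer this over merge() because the table does not get shuffled in the process, although this function is now called dplyr::inner_join()
pid now also needs to be in "" - i.e. table1.df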