【发布时间】:2019-06-25 08:57:44
【问题描述】:
问题:
如何通过使用 R tidyverse 和 dplyr 交叉检查 2x2 列来有效地连接两个表?我对 R 比较陌生,但我在之前的任何问题或讨论中都找不到这个问题。
我有两张表,它们的行数和列数不同。每个表都包含 A 列和 B 列。这些列包含可以相同或唯一的字符串,它们也可能重叠或从一列或另一列中丢失。基本上,我需要对照 A2 和 B2 检查 A1 列,然后对照 A2 和 B2 检查 B1。
概念解释示例:
df1
ID pISSN eISSN Level
437097 1530-9932 1
489309 2366-004X 2366-0058 1
437103 0025-5858 1
437109 1042-9670 1545-7230 1
449363 1093-1139 0
437127 0949-1775 1
437124 0361-3682 1873-6289 2
481203 0103-846X 0103-846X 1
479825 2153-2184 2153-2192 0
437136 0734-2071 1557-7333 2
df2
ID pISSN eISSN Format
41120 2364-9534 E OA S C
12249 1530-9932 E OF S
261 2366-0058 E OF S
12188 0025-5858 1865-8784 PE OF S
40596 1042-9670 1545-7230 PE OF S
12129 0895-4852 1936-4709 PE OF
769 0949-1775 1432-0517 PE OF S
result
ID pISSN eISSN Level Format
437097 1530-9932 1 E OF S
489309 2366-004X 2366-0058 1 E OF S
437103 0025-5858 1865-8784 1 PE OF S
437109 1042-9670 1545-7230 1 PE OF S
437127 0949-1775 1 PE OF S
输入示例表:
dput(df1, file = "")
structure(list(ID = c(437097, 489309, 437103, 437109, 449363, 437127, 437124, 481203, 479825, 437136), pISSN = c(NA, "2366-004X", "0025-5858", "1042-9670", "1093-1139", NA, "0361-3682", "0103-846X", "2153-2184", "0734-2071"), eISSN = c("1530-9932", "2366-0058", NA, "1545-7230", NA, "0949-1775", "1873-6289", "0103-846X", "2153-2192", "1557-7333"), Level = c(1, 1, 1, 1, 0, 1, 2, 1, 0, 2)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
dput(df2, file = "")
structure(list(ID = c(41120, 12249, 261, 12188, 40596, 12129, 769), pISSN = c(NA, NA, NA, "0025-5858", "1042-9670", "0895-4852", "0949-1775"), eISSN = c("2364-9534", "1530-9932", "2366-0058", "1865-8784", "1545-7230", "1936-4709", "1432-0517"), Format = c("E OA S C", "E OF S", "E OF S", "PE OF S", "PE OF S", "PE OF", "PE OF S")), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
【问题讨论】:
-
有趣。你能提供一个可重现的例子吗? stackoverflow.com/questions/5963269/…
-
您好,感谢您的回复。我对 R、编码和 Stack Overflow 还是很陌生。所以感谢信息链接,我尝试使用 dput 设置示例。我已将其添加到帖子中。希望这行得通。
标签: r join merge dplyr tidyverse