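【Posted】: 2021-10-03 16:25:27
【Question】:
I'm trying to figure out how to use joins in dplyr. When I join a and b with full_join, I get 4 rows where statefips is missing.
- Is there a better way to join that avoids this problem entirely without losing any data?
- Can statefips be added after joining a and b (the real a and b contain 4000+ rows)?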
library(tidyverse)
# create df's
a <- tibble::tribble(
~statename, ~statefips, ~date, ~emp,
"Alabama", 1, "2020-01-14", 2,
"California", 6, "2020-01-14", 2,
"Alabama", 1, "2020-01-15", 2,
"California", 6, "2020-01-15", 2,
"Alabama", 1, "2020-01-16", 3,
"California", 6, "2020-01-16", 3,
"Alabama", 1, "2020-01-17", 3,
"California", 6, "2020-01-17", 3,
"Alabama", 1, "2020-01-18", 4,
"California", 6, "2020-01-18", 4,
"Alabama", 1, "2020-01-19", 4,
"California", 6, "2020-01-19", 4,
"Alabama", 1, "2020-01-20", 4,
"California", 6, "2020-01-20", 5,
"Alabama", 1, "2020-01-21", 5,
"California", 6, "2020-01-21", 5,
"Alabama", 1, "2020-01-22", 5,
"California", 6, "2020-01-22", 5,
"Alabama", 1, "2020-01-21", 5,
"California", 6, "2020-01-21", 4,
"Alabama", 1, "2020-01-22", 4,
"California", 6, "2020-01-22", 4,
"Alabama", 1, "2020-01-23", 4,
"California", 6, "2020-01-23", 4,
"Alabama", 1, "2020-01-24", 4,
"California", 6, "2020-01-24", 4
)
b <- tibble::tribble(
~statename, ~date, ~ui_claims,
"Alabama", "2020-01-04", "0.5",
"California", "2020-01-04", "0.5",
"Alabama", "2020-01-11", "0.5",
"California", "2020-01-11", "2.5",
"Alabama", "2020-01-18", "2.5",
"California", "2020-01-18", "1.5"
)
# Join a and b
full_join <- full_join(a, b, by = c("statename", "date")) %>% arrange(date)
# my attempt to fill the missing statefips values (doesn't work)
state_id <- tibble::tribble(
~statename, ~statefips,
"Alabama", 1,
"California", 6
)
full_join_fix <- full_join(full_join, state_id, by = "statename") %>% arrange(date)
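A note on why the attempt above leaves the gaps in place: both tables passed to the second full_join have a statefips column that is not listed in `by`, so dplyr keeps both copies with suffixes (statefips.x and statefips.y) instead of filling the NAs. One possible way to add statefips after the join is sketched below. It uses small stand-in versions of a, b and state_id (not the full data) and assumes each statename maps to exactly one statefips in the lookup table.

```r
library(dplyr)

# Small stand-ins for the a, b and state_id tables (illustrative subset only)
a <- tibble::tribble(
  ~statename,   ~statefips, ~date,        ~emp,
  "Alabama",    1,          "2020-01-18", 4,
  "California", 6,          "2020-01-18", 4
)
b <- tibble::tribble(
  ~statename,   ~date,        ~ui_claims,
  "Alabama",    "2020-01-11", "0.5",
  "California", "2020-01-11", "2.5",
  "Alabama",    "2020-01-18", "2.5",
  "California", "2020-01-18", "1.5"
)
state_id <- tibble::tribble(
  ~statename,   ~statefips,
  "Alabama",    1,
  "California", 6
)

joined <- full_join(a, b, by = c("statename", "date")) %>%
  select(-statefips) %>%                      # drop the partly missing column
  left_join(state_id, by = "statename") %>%   # re-attach one code per state
  arrange(date, statename)
```

In this sketch every row keeps its statename, so the lookup can always supply a statefips; emp stays NA on dates that exist only in b, which is the expected result of a full join.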
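【Discussion】:
-
If something is missing, it won't appear in the joined table. Maybe you want left_join or inner_join.
-
As I understand it, full_join doesn't drop any data? It only adds NA where needed..
-
Yes. A left join keeps all rows from the left table; an inner join keeps only the rows present in both tables.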
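The difference between the join types mentioned in these comments can be checked on a tiny example. The tables `left` and `right` below are hypothetical and are not the asker's data; the sketch only counts the rows each join returns.

```r
library(dplyr)

# Hypothetical two-row tables sharing only the key "y"
left  <- tibble::tibble(key = c("x", "y"), l = c(1, 2))
right <- tibble::tibble(key = c("y", "z"), r = c(10, 20))

n_left  <- nrow(left_join(left, right, by = "key"))   # keeps x and y
n_inner <- nrow(inner_join(left, right, by = "key"))  # keeps y only
full    <- full_join(left, right, by = "key")         # keeps x, y and z
n_full  <- nrow(full)

# In the full join, unmatched rows are padded with NA rather than dropped
na_in_l <- sum(is.na(full$l))  # the row for "z" has no value from left
na_in_r <- sum(is.na(full$r))  # the row for "x" has no value from right
```

So full_join does not lose rows; the NAs appear in columns that exist in only one table for the keys that table does not contain.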