【Posted】: 2021-11-19 14:11:08
【Question】:
I am trying to merge two datasets that I have.
df1:
| day | month | year | lon | lat | month-year |
|---|---|---|---|---|---|
| 3 | 5 | 2009 | 5.7 | 53.9 | May 2009 |
| 8 | 9 | 2004 | 6.9 | 52.6 | Sep 2004 |
| 15 | 9 | 2004 | 3.8 | 50.4 | Sep 2004 |
| 5 | 5 | 2009 | 2.7 | 51.2 | May 2009 |
| 28 | 7 | 2005 | 14.8 | 62.4 | Jul 2005 |
| 18 | 9 | 2004 | 5.1 | 52.5 | Sep 2004 |
df2:
| nao-value | sign | month-year |
|---|---|---|
| -2.1 | Negative | Sep 2004 |
| 1.3 | Positive | Jul 2005 |
| -1.1 | Negative | May 2009 |
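I want to merge them so that the NAO value for each month and year is added to the occurrence data. In other words, the NAO value for a given month should be repeated for every record from that month in the occurrence data.
The problem is that I cannot get the NAO values to line up with the occurrence data: they are either just repeated, or they do not match the date they belong to (as with month-year.x and month-year.y), or they come back as NA.
I have tried several different approaches: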
df3 <- merge(df1, df2, by="month-year")
df3 <- merge(cbind(df1, X=rownames(df1)), cbind(df2, variable=rownames(df2)))
df3 <- merge(df1,df2, by ="month-year", all.x = TRUE,all.y=TRUE, sort = FALSE)
df3 <- merge(df1, df2, by=intersect(df1$month-year(df1), df2$month-year(df2)))
But none of them gives the result I want.
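For reference, the result described above corresponds to a left join on the shared key. The following is only a minimal sketch built from the two example tables, assuming the key strings match exactly in both; `check.names = FALSE` is used so that the hyphenated column names survive.

```r
# Toy copies of the example tables shown in the question.
# check.names = FALSE keeps non-syntactic names such as "month-year".
df1 <- data.frame(
  day   = c(3, 8, 15, 5, 28, 18),
  month = c(5, 9, 9, 5, 7, 9),
  year  = c(2009, 2004, 2004, 2009, 2005, 2004),
  lon   = c(5.7, 6.9, 3.8, 2.7, 14.8, 5.1),
  lat   = c(53.9, 52.6, 50.4, 51.2, 62.4, 52.5),
  `month-year` = c("May 2009", "Sep 2004", "Sep 2004",
                   "May 2009", "Jul 2005", "Sep 2004"),
  check.names = FALSE
)
df2 <- data.frame(
  `nao-value`  = c(-2.1, 1.3, -1.1),
  sign         = c("Negative", "Positive", "Negative"),
  `month-year` = c("Sep 2004", "Jul 2005", "May 2009"),
  check.names = FALSE
)

# Left join: keep every occurrence row and repeat the monthly
# NAO value on each row that shares the same "month-year" key.
df3 <- merge(df1, df2, by = "month-year", all.x = TRUE)
```

With `all.x = TRUE`, any occurrence row whose key has no exact match in df2 is kept with NA in the NAO columns, so NA values in the result usually point to keys that differ between the two tables.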
Edited to include the dput output:
dput(head(df1, 10)) :
structure(list(Day = c(29, 2, 14, 31, 16, 7, 25, 12, 21, 22),
Month = c(7, 7, 7, 8, 8, 7, 8, 6, 6, 9), Year = c(2010, 2015,
2010, 2018, 2016, 2018, 2019, 2004, 2015, 2019), Lon = c(-6.155014,
-5.820868, -5.509842, -5.495277, -5.469389, -5.469389, -5.469389,
-5.466995, -5.461942, -5.457127), Lat = c(59.09478, 59.125228,
57.959196, 57.96022, 57.986825, 57.986825, 57.986825, 57.874527,
57.95972, 58.07697), Date = c("Jul 2010", "Jul 2015", "Jul 2010",
"Aug 2018", "Aug 2016", "Jul 2018", "Aug 2019", "Jun 2004",
"Jun 2015", "Sep 2019")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
dput(head(df2, 10)) :
structure(list(NAO = c(1.04, 1.41, 1.46, 2, -1.53, -0.02, 0.53,
0.97, 1.06, 0.23), Sign = c("Positive", "Positive", "Positive",
"Positive", "Negative", "Negative", "Positive", "Positive",
"Positive",
"Positive"), Date = c("jan 1990", "feb 1990", "mar 1990", "apr 1990",
"mai 1990", "jun 1990", "jul 1990", "aug 1990", "sep 1990", "okt 1990"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
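One thing the dput output above suggests: the `Date` keys are not spelled the same way in the two tables. df1 has capitalised English abbreviations ("Jul 2010"), while df2 has lowercase ones, some of which appear to be non-English ("mai", "okt"). `merge()` matches strings exactly, so this alone could explain the NA values. The sketch below is one possible way to normalise both keys before merging. `month_lookup` and `normalize_month_year` are hypothetical helper names, and every spelling other than those visible in the dput output (for example "des") is an assumption.

```r
# Hypothetical lookup from lowercase month abbreviation to month number.
# Only the spellings seen in the dput output are confirmed; "des" and the
# English variants "may", "oct", "dec" are assumptions added for completeness.
month_lookup <- c(jan = 1, feb = 2, mar = 3, apr = 4, mai = 5, may = 5,
                  jun = 6, jul = 7, aug = 8, sep = 9, okt = 10, oct = 10,
                  nov = 11, des = 12, dec = 12)

# Convert keys such as "okt 1990" or "Jul 2010" to one canonical form
# ("Oct 1990", "Jul 2010") using R's built-in month.abb vector.
normalize_month_year <- function(x) {
  parts <- strsplit(trimws(tolower(x)), "\\s+")
  mon <- vapply(parts, `[`, character(1), 1)
  yr  <- vapply(parts, `[`, character(1), 2)
  paste(month.abb[unname(month_lookup[mon])], yr)
}

keys <- normalize_month_year(c("mai 1990", "okt 1990", "Jul 2010"))

# On the real tables, the same helper would be applied to both keys first:
# df1$Date <- normalize_month_year(df1$Date)
# df2$Date <- normalize_month_year(df2$Date)
# df3 <- merge(df1, df2, by = "Date", all.x = TRUE)
```

A lookup table is used instead of date parsing so that the result does not depend on the locale of the R session. An abbreviation missing from the lookup produces a key starting with "NA", which makes unmatched spellings easy to spot.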
【Comments】:
-
df3 <- merge(df1, df2, by="month-year") should work, provided the column name is exactly the same in both datasets. Could you provide dput(df1) and dput(df2)?
dput(df1) gives a whole lot of values plus this: row.names = c(NA, -6223L), class = c("tbl_df", "tbl", "data.frame")). dput(df2) also gives a whole lot of values plus this: row.names = c(NA, -380L), class = c("tbl_df", "tbl", "data.frame"))
-
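Yes, we need the code starting from structure(..). Since your dataset is large, share only the first 10 rows, i.e. dput(head(df1, 10)) and dput(head(df2, 10)). If there are many columns, subset and select only the relevant ones.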
It is now in the answer below :)
-
df3