[Posted]: 2019-04-05 20:50:57
[Question]:
I am trying to join two datasets together.
df1 looks like:
ID date_f
1 4281 2019-02-21
2 1108827 2004-03-15
3 6201 2012-02-27
4 310158 2010-03-01
5 711065 2016-02-25
6 314808 2003-03-11
7 45012 2004-05-12
8 745732 2014-11-21
9 1458891 2013-10-28
10 316206 2007-05-30
And df2 looks like:
ID date year
1 6201 1999-12-31 1999
2 6201 2000-12-31 2000
3 6201 2001-12-31 2001
4 6201 2002-12-31 2002
5 6201 2003-12-31 2003
6 6201 2004-12-31 2004
7 6201 2017-12-31 2017
8 6201 2005-12-31 2005
9 6201 2006-12-31 2006
10 6201 2007-12-31 2007
11 6201 2008-12-31 2008
12 6201 2009-12-31 2009
13 6201 2010-12-31 2010
14 6201 2011-12-31 2011
15 6201 2012-12-31 2012
16 6201 2013-12-31 2013
17 6201 2014-12-31 2014
18 6201 2015-12-31 2015
19 6201 2016-12-31 2016
20 6201 2018-12-31 2018
I am trying to join them (the dates do not match exactly):
Method:
join on ID, and on date < date_f
Expected output (using the first 5 observations from df1):
ID date_f date year
1 4281 2019-02-21 2018-12-31 2018
2 1108827 2004-03-15 2003-12-31 2003
3 6201 2012-02-27 2011-12-31 2011
4 310158 2010-03-01 2009-12-31 2009
5 711065 2016-02-25 2015-03-31 2014
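For example, row 2 above has a date_f of 2004-03-15 in df1. One approach would be to merge on year(), but then it would be matched with 2004-12-31, and that date falls after the date in df1. So I am trying to merge it with the previous date instead, i.e. 2003-12-31.
Likewise, row 5 would be joined with 2016-03-31, but date_f is earlier than that date (2016-02-25 in date_f vs. 2016-03-31 in date), so it should be matched with 2015-03-31.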
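To make the intended rule concrete, here is a minimal base R sketch of "for each ID, take the latest date that is strictly before date_f". It uses a small hand-picked subset of the data (the names df1_small and df2_small are only for illustration) and assumes the date columns have been converted with as.Date. It is one possible way to get the expected rows, not necessarily the most efficient one.

```r
# Small subset of the question's data; dates parsed as Date.
df1_small <- data.frame(
  ID     = c(6201L, 711065L),
  date_f = as.Date(c("2012-02-27", "2016-02-25"))
)
df2_small <- data.frame(
  ID   = c(6201L, 6201L, 6201L, 711065L, 711065L, 711065L),
  date = as.Date(c("2010-12-31", "2011-12-31", "2012-12-31",
                   "2014-03-31", "2015-03-31", "2016-03-31")),
  year = c(2010L, 2011L, 2012L, 2013L, 2014L, 2015L)
)

# 1. Pair every df1 row with every df2 row that shares the same ID.
candidates <- merge(df1_small, df2_small, by = "ID")

# 2. Keep only df2 dates strictly before the report date.
candidates <- candidates[candidates$date < candidates$date_f, ]

# 3. Within each (ID, date_f) pair, keep the latest remaining date.
candidates <- candidates[order(candidates$ID, candidates$date_f, candidates$date), ]
key <- paste(candidates$ID, candidates$date_f)
result <- candidates[!duplicated(key, fromLast = TRUE), ]
rownames(result) <- NULL

print(result)
```

On this subset the result pairs 6201 with 2011-12-31 (year 2011) and 711065 with 2015-03-31 (year 2014), matching rows 3 and 5 of the expected output. Note that any df1 row with no earlier date in df2 is dropped by step 2, and that the intermediate merge can get large when df2 has many rows per ID.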
Data
df1 <- structure(list(ID = c(4281L, 1108827L, 6201L, 310158L, 711065L,
314808L, 45012L, 745732L, 1458891L, 316206L), date_f = c("2019-02-21",
"2004-03-15", "2012-02-27", "2010-03-01", "2016-02-25", "2003-03-11",
"2004-05-12", "2014-11-21", "2013-10-28", "2007-05-30")), row.names = c(NA,
-10L), class = "data.frame")
Data 2
df2 <- structure(list(ID = c(6201L, 6201L, 6201L, 6201L, 6201L, 6201L,
6201L, 6201L, 6201L, 6201L, 6201L, 6201L, 6201L, 6201L, 6201L,
6201L, 6201L, 6201L, 6201L, 6201L, 314808L, 314808L, 314808L,
314808L, 314808L, 314808L, 314808L, 314808L, 314808L, 314808L,
314808L, 314808L, 314808L, 314808L, 314808L, 314808L, 314808L,
314808L, 314808L, 314808L, 45012L, 45012L, 45012L, 45012L, 45012L,
45012L, 45012L, 45012L, 45012L, 45012L, 45012L, 45012L, 45012L,
45012L, 45012L, 45012L, 45012L, 45012L, 45012L, 45012L, 316206L,
316206L, 316206L, 316206L, 316206L, 316206L, 316206L, 316206L,
316206L, 316206L, 316206L, 316206L, 316206L, 316206L, 316206L,
316206L, 316206L, 310158L, 310158L, 310158L, 310158L, 310158L,
310158L, 310158L, 310158L, 310158L, 310158L, 310158L, 310158L,
310158L, 310158L, 310158L, 310158L, 310158L, 310158L, 310158L,
310158L, 745732L, 745732L, 745732L, 745732L, 745732L, 745732L,
745732L, 745732L, 745732L, 745732L, 745732L, 745732L, 745732L,
745732L, 745732L, 745732L, 745732L, 745732L, 745732L, 745732L,
745732L, 1458891L, 1458891L, 1458891L, 1458891L, 1458891L, 1458891L,
1458891L, 1458891L, 1458891L, 1458891L, 1458891L, 1458891L, 1458891L,
1458891L, 1458891L, 1458891L, 1458891L, 1458891L, 1458891L, 1458891L,
4281L, 4281L, 4281L, 4281L, 4281L, 4281L, 4281L, 711065L, 711065L,
711065L, 711065L, 711065L, 711065L, 711065L, 711065L, 711065L,
711065L, 711065L, 711065L, 711065L, 711065L, 711065L, 711065L,
711065L, 711065L, 1108827L, 1108827L, 1108827L, 1108827L, 1108827L,
1108827L, 1108827L, 1108827L, 1108827L, 1108827L, 1108827L, 1108827L,
1108827L, 1108827L, 1108827L, 1108827L, 1108827L, 1108827L),
date = c("1999-12-31", "2000-12-31", "2001-12-31", "2002-12-31",
"2003-12-31", "2004-12-31", "2017-12-31", "2005-12-31", "2006-12-31",
"2007-12-31", "2008-12-31", "2009-12-31", "2010-12-31", "2011-12-31",
"2012-12-31", "2013-12-31", "2014-12-31", "2015-12-31", "2016-12-31",
"2018-12-31", "1999-12-31", "2000-12-31", "2001-12-31", "2002-12-31",
"2003-12-31", "2004-12-31", "2005-12-31", "2006-12-31", "2007-12-31",
"2008-12-31", "2009-12-31", "2010-12-31", "2011-12-31", "2012-12-31",
"2013-12-31", "2014-12-31", "2015-12-31", "2016-12-31", "2017-12-31",
"2018-12-31", "1999-12-31", "2000-12-31", "2001-12-31", "2002-12-31",
"2003-12-31", "2004-12-31", "2005-12-31", "2006-12-31", "2007-12-31",
"2008-12-31", "2009-12-31", "2010-12-31", "2011-12-31", "2012-12-31",
"2013-12-31", "2014-12-31", "2015-12-31", "2016-12-31", "2017-12-31",
"2018-12-31", "1999-12-31", "2000-12-31", "2001-12-31", "2002-12-31",
"2003-12-31", "2004-12-31", "2005-12-31", "2006-12-31", "2007-12-31",
"2008-12-31", "2009-12-31", "2010-12-31", "2011-12-31", "2012-12-31",
"2013-12-31", "2014-12-31", "2015-12-31", "1999-12-31", "2000-12-31",
"2001-12-31", "2002-12-31", "2003-12-31", "2004-12-31", "2005-12-31",
"2006-12-31", "2007-12-31", "2008-12-31", "2009-12-31", "2010-12-31",
"2011-12-31", "2012-12-31", "2013-12-31", "2014-12-31", "2015-12-31",
"2016-12-31", "2017-12-31", "2018-12-31", "1999-01-31", "2000-01-31",
"2001-01-31", "2002-01-31", "2003-01-31", "2004-01-31", "2005-01-31",
"2006-01-31", "2007-01-31", "2008-01-31", "2009-01-31", "2010-01-31",
"2011-01-31", "2012-01-31", "2013-01-31", "2014-01-31", "2015-01-31",
"2016-01-31", "2017-01-31", "2018-01-31", "2019-01-31", "1999-12-31",
"2000-12-31", "2001-12-31", "2002-12-31", "2003-12-31", "2004-12-31",
"2005-12-31", "2006-12-31", "2007-12-31", "2008-12-31", "2009-12-31",
"2010-12-31", "2011-12-31", "2012-12-31", "2013-12-31", "2014-12-31",
"2015-12-31", "2016-12-31", "2017-12-31", "2018-12-31", "2012-12-31",
"2013-12-31", "2014-12-31", "2015-12-31", "2016-12-31", "2017-12-31",
"2018-12-31", "1999-03-31", "2000-03-31", "2001-03-31", "2002-03-31",
"2003-03-31", "2004-03-31", "2005-03-31", "2006-03-31", "2007-03-31",
"2008-03-31", "2009-03-31", "2010-03-31", "2011-03-31", "2012-03-31",
"2013-03-31", "2014-03-31", "2015-03-31", "2016-03-31", "2001-12-31",
"2002-12-31", "2003-12-31", "2004-12-31", "2005-12-31", "2006-12-31",
"2007-12-31", "2008-12-31", "2009-12-31", "2010-12-31", "2011-12-31",
"2012-12-31", "2013-12-31", "2014-12-31", "2015-12-31", "2016-12-31",
"2017-12-31", "2018-12-31"), year = c(1999L, 2000L, 2001L,
2002L, 2003L, 2004L, 2017L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2018L, 1999L,
2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L,
2018L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L,
2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2015L,
2016L, 2017L, 2018L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L,
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L,
2014L, 2015L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L,
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L,
2015L, 2016L, 2017L, 2018L, 1998L, 1999L, 2000L, 2001L, 2002L,
2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L,
2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L, 1998L, 1999L,
2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2013L, 2014L, 2015L, 2001L, 2002L,
2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L)), row.names = c(NA,
-181L), class = "data.frame")
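[Comments]:
-
Can you show the expected output?
-
I'll write something up right now!
-
I have added an expected output and changed the data, since the original was poorly prepared on my part.
-
You need
library(data.table);setDT(df2)[df1, on = .(ID, date < date_f)]
-
The date_f column in df1 corresponds to a report date, while the date column in df2 corresponds to financial data, some of which did not yet exist when the report was released. That is why I want to join the data using the last available financial data, which must precede the report date.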
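Following up on the data.table suggestion: as far as I can tell, a plain non-equi join with date < date_f returns every earlier df2 row for each ID, not just the most recent one. A rolling join is one way to keep only the last available record before the report date. The sketch below is an assumption-laden illustration rather than a verified answer: it uses a small subset of the data, converts the dates with as.IDate, and introduces a helper column named lookup (not part of the original data) set to the day before date_f so that an exact match on date_f is excluded.

```r
library(data.table)

# Small subset of the question's data; dates parsed as IDate.
dt1 <- data.table(
  ID     = c(6201L, 711065L),
  date_f = as.IDate(c("2012-02-27", "2016-02-25"))
)
dt2 <- data.table(
  ID   = c(6201L, 6201L, 6201L, 711065L, 711065L, 711065L),
  date = as.IDate(c("2010-12-31", "2011-12-31", "2012-12-31",
                    "2014-03-31", "2015-03-31", "2016-03-31")),
  year = c(2010L, 2011L, 2012L, 2013L, 2014L, 2015L)
)

# Hypothetical helper column: roll on the day before the report date,
# so that "date <= lookup" is equivalent to "date < date_f".
dt1[, lookup := date_f - 1L]
dt2[, lookup := date]

# roll = TRUE carries the last earlier observation forward within each ID.
res <- dt2[dt1, on = .(ID, lookup), roll = TRUE]
res[, lookup := NULL]

print(res)
```

With roll = TRUE, each df1 row is kept even when no earlier df2 record exists (date and year are then NA), which may or may not be what you want for the report data.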
Tags: r