【Question title】:Issues with tidyverse
【Posted】:2021-03-02 07:12:16
【Question】:

A few months ago I ran the following code and it worked fine:

ceo1_nochange <- ceo1 %>% 
  group_by(ISIN, year) %>% 
  nest(.key = "OTHER_DATA") %>% 
  group_by(ISIN) %>% 
  mutate(OTHER_DATA_LAG = lag(OTHER_DATA, 1), 
         OTHER_DATA_LEAD = lead(OTHER_DATA, 1), 
         KEEP = pmap(list(OTHER_DATA_LAG, OTHER_DATA, OTHER_DATA_LEAD), function(x, y, z) {
           isTRUE(all_equal(x["DirectorID"], y["DirectorID"])) ||
             isTRUE(all_equal(y["DirectorID"], z["DirectorID"]))
         })) %>% 
  filter(unlist(KEEP)) %>% 
  select(-OTHER_DATA_LAG, -OTHER_DATA_LEAD, -KEEP) %>% 
  unnest() %>% 
  ungroup()

My goal is to find the observations whose DirectorID did not change from one year to the next.

But now I get the following error:

Error: Problem with `mutate()` input `KEEP`.
x argument is of length zero
i Input `KEEP` is `pmap(...)`.
i The error occurred in group 1: ISIN = "AN8068571086".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
 Error: Problem with `mutate()` input `KEEP`.
x argument is of length zero
i Input `KEEP` is `pmap(...)`.
i The error occurred in group 1: ISIN = "AN8068571086".
Run `rlang::last_error()` to see where the error occurred.

Can anyone explain this?

Here is a sample dataset:

"ROW,ISIN,YEAR,DIRECTOR_NAME,DIRECTOR_ID
1,US9898171015,2006,Thomas (Tom) E Davin,2247441792
2,US9898171015,2006,Matthew (Matt) L Hyde,4842568996
3,US9898171015,2007,James (Jim) M Weber,3581636766
4,US9898171015,2007,Matthew (Matt) L Hyde,4842568996
5,US9898171015,2007,David (Dave) M DeMattei,759047198
6,US9898171015,2008,James (Jim) M Weber,3581636766
7,US9898171015,2008,Matthew (Matt) L Hyde,4842568996
8,US9898171015,2008,David (Dave) M DeMattei,759047198
9,US9898171015,2009,William (Bill) Milroy Barnum Jr,20462211719
10,US9898171015,2009,James (Jim) M Weber,3581636766
11,US9898171015,2009,Matthew (Matt) L Hyde,4842568996
12,US9898171015,2009,David (Dave) M DeMattei,759047198
13,US9898171015,2010,William (Bill) Milroy Barnum Jr,20462211719
14,US9898171015,2010,James (Jim) M Weber,3581636766
15,US9898171015,2010,Matthew (Matt) L Hyde,4842568996
16,US9898171015,2011,Sarah (Sally) Gaines McCoy,11434863691
17,US9898171015,2011,William (Bill) Milroy Barnum Jr,20462211719
18,US9898171015,2011,James (Jim) M Weber,3581636766
19,US9898171015,2011,Matthew (Matt) L Hyde,4842568996
20,US9898171015,2012,Sarah (Sally) Gaines McCoy,11434863691
21,US9898171015,2012,Ernest R Johnson,40425210975
22,US9898171015,2013,Sarah (Sally) Gaines McCoy,11434863691
23,US9898171015,2013,Ernest R Johnson,40425210975
24,US9898171015,2013,Travis D Smith,53006212569
25,US9898171015,2014,Sarah (Sally) Gaines McCoy,11434863691
26,US9898171015,2014,Ernest R Johnson,40425210975
27,US9898171015,2014,Travis D Smith,53006212569
28,US9898171015,2015,Kalen F Holmes,11051172801
29,US9898171015,2015,Sarah (Sally) Gaines McCoy,11434863691
30,US9898171015,2015,Ernest R Johnson,40425210975
31,US9898171015,2015,Travis D Smith,53006212569
32,US9898171015,2016,Sarah (Sally) Gaines McCoy,11434863691
33,US9898171015,2016,Ernest R Johnson,40425210975
34,US9898171015,2016,Travis D Smith,53006212569
35,US9898171015,2017,Sarah (Sally) Gaines McCoy,11434863691
36,US9898171015,2017,Scott Andrew Bailey,174000000000
37,US9898171015,2017,Ernest R Johnson,40425210975
38,US9898171015,2017,Travis D Smith,53006212569
" 

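Can anyone offer some clues?

【Comments】:

  • Are you using the same version of R?
  • @Treizh I don't think I'm using the same version. Do you know how I can find out which version I was using when this code worked?

Tags: r tidyverse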


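【Solution 1】:

I don't see anything in the code that would have been affected by recent changes. The error comes from the lag and lead functions. When you apply them to the nested column of data frames, they pad it with a NULL value at the beginning and at the end, respectively. If you add a check for that inside the pmap call, it should work.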
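To see the padding in isolation, here is a minimal sketch on a toy list of data frames. The object names (`yearly`, `lagged`, `led`) are made up for illustration, and the behaviour shown assumes a reasonably recent dplyr (1.0.0 or later, as far as I can tell); older releases may pad differently.

```r
library(dplyr)

# A toy list standing in for the nested OTHER_DATA column (hypothetical data).
yearly <- list(
  data.frame(DIRECTOR_ID = c(1, 2)),
  data.frame(DIRECTOR_ID = c(1, 2)),
  data.frame(DIRECTOR_ID = c(1, 3))
)

lagged <- dplyr::lag(yearly, 1)   # shifted forward, padded at the start
led    <- dplyr::lead(yearly, 1)  # shifted backward, padded at the end

# Which positions were filled with NULL by the shift?
lag_is_null  <- vapply(lagged, is.null, logical(1))
lead_is_null <- vapply(led, is.null, logical(1))
print(lag_is_null)
print(lead_is_null)

# length(NULL) is 0, which is what a length(x) > 0 guard relies on.
print(length(lagged[[1]]))
```

The first element of `lagged` and the last element of `led` have no neighbour to borrow from, so any comparison function that receives them needs to handle the empty case explicitly.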

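I also made a few other changes to the code:

  • .key is deprecated in nest, so use nest(OTHER_DATA = c(ROW, DIRECTOR_NAME, DIRECTOR_ID)) instead
  • Use pmap_lgl instead of pmap, so that you don't have to call unlist(KEEP) in filter
  • unnest needs the columns to unnest to be named explicitly, so use unnest(cols = c(OTHER_DATA))
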
library(tidyverse)

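# Assumes `ceo1` holds the sample data with columns ROW, ISIN, YEAR,
# DIRECTOR_NAME and DIRECTOR_ID.
ceo1 %>% 
  group_by(ISIN, YEAR) %>% 
  # One row per ISIN/YEAR; the other columns move into the list column OTHER_DATA
  nest(OTHER_DATA = c(ROW, DIRECTOR_NAME, DIRECTOR_ID)) %>% 
  group_by(ISIN) %>% 
  # Previous and next year's data within each ISIN (NULL at the edges)
  mutate(OTHER_DATA_LAG = lag(OTHER_DATA, 1), 
         OTHER_DATA_LEAD = lead(OTHER_DATA, 1),
         KEEP = pmap_lgl(list(OTHER_DATA_LAG, OTHER_DATA, OTHER_DATA_LEAD), function(x, y, z) {
           # Only compare when all three years are present; otherwise drop the year
           if(length(x) > 0 && length(y) > 0 && length(z) > 0)
                isTRUE(all_equal(x["DIRECTOR_ID"], y["DIRECTOR_ID"])) ||
                isTRUE(all_equal(y["DIRECTOR_ID"], z["DIRECTOR_ID"]))
           else FALSE
         })) %>% 
  filter(KEEP) %>% 
  select(-OTHER_DATA_LAG, -OTHER_DATA_LEAD, -KEEP) %>% 
  unnest(cols = c(OTHER_DATA)) %>% 
  ungroup()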

#   ISIN          YEAR   ROW DIRECTOR_NAME              DIRECTOR_ID
#   <chr>        <int> <int> <chr>                            <dbl>
# 1 US9898171015  2007     3 James (Jim) M Weber         3581636766
# 2 US9898171015  2007     4 Matthew (Matt) L Hyde       4842568996
# 3 US9898171015  2007     5 David (Dave) M DeMattei      759047198
# 4 US9898171015  2008     6 James (Jim) M Weber         3581636766
# 5 US9898171015  2008     7 Matthew (Matt) L Hyde       4842568996
# 6 US9898171015  2008     8 David (Dave) M DeMattei      759047198
# 7 US9898171015  2013    22 Sarah (Sally) Gaines McCoy 11434863691
# 8 US9898171015  2013    23 Ernest R Johnson           40425210975
# 9 US9898171015  2013    24 Travis D Smith             53006212569
#10 US9898171015  2014    25 Sarah (Sally) Gaines McCoy 11434863691
#11 US9898171015  2014    26 Ernest R Johnson           40425210975
#12 US9898171015  2014    27 Travis D Smith             53006212569
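
The three syntax changes listed in the answer can also be tried on their own. The following is a minimal sketch on a toy tibble; `toy`, `g`, `v` and the "more than one row" rule are invented for illustration and are not part of the original data, and the code assumes tidyr 1.0.0 or later for the `nest()`/`unnest()` syntax.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: group "a" has two rows, group "b" has one.
toy <- tibble(
  g = c("a", "a", "b"),
  v = c(1, 2, 3)
)

# New-style nest(): name the list column and the columns it should hold.
nested <- toy %>% nest(data = c(v))

# pmap() returns a list; pmap_lgl() returns a plain logical vector.
as_list <- pmap(list(nested$data), function(d) nrow(d) > 1)
as_lgl  <- pmap_lgl(list(nested$data), function(d) nrow(d) > 1)

# A logical vector can go straight into filter(), with no unlist() needed,
# and unnest() is told explicitly which column to expand.
kept <- nested %>%
  mutate(KEEP = pmap_lgl(list(data), function(d) nrow(d) > 1)) %>%
  filter(KEEP) %>%
  select(-KEEP) %>%
  unnest(cols = c(data))

print(kept)
```

Here `as_list` would still need `unlist()` before it could be used as a filter condition, while `as_lgl` can be used directly.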

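【Comments】:

  • Thanks. If I only want to keep the DirectorIDs that stay the same from year to year, where should I change the code? For example, in the output of your final code, 2007 and 2008 are kept because ISIN US9898171015 did not change its DirectorIDs in those years. But in my sample data, DirectorIDs 3581636766 (James (Jim) M Weber), 4842568996 (Matthew (Matt) L Hyde) and 759047198 (David (Dave) M DeMattei) are all present in both 2008 and 2009. 2009 was dropped because a new DirectorID appeared that year. Can I keep all the DirectorIDs that stay the same from year to year?
  • I see you asked a new question about this and received an answer.
  • When I run the code I accepted as the answer, my observations differ slightly from those produced by my old code. That small difference changes my results. It would be great if I could run my old code. Do you know how I can run it?
  • As I mentioned in the answer, I couldn't find anything in your code that could have been affected by recent changes in the libraries. However, I could be wrong, since the tidyverse has changed a lot recently.
  • Got it. Thanks for all your help.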
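
On the version question raised in the comments: the versions in use when the old code worked cannot be recovered after the fact unless they were recorded, but the current ones are easy to check. A minimal sketch using base R only; the package list `pkgs` is an assumption about what this pipeline depends on.

```r
# Packages the pipeline appears to rely on (an assumption, adjust as needed).
pkgs <- c("dplyr", "tidyr", "purrr")

# Keep only the packages that are actually installed, to avoid errors.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed <- pkgs[is_installed]

# Collect the installed versions as strings.
versions <- vapply(
  installed,
  function(p) as.character(packageVersion(p)),
  character(1)
)

print(R.version.string)
print(versions)

# If an older release is needed for comparison, one possible route is
# remotes::install_version("dplyr", version = "<older version>"),
# ideally in a separate library so the current setup stays intact.
```

Saving the output of `sessionInfo()` next to the results is a simple way to make this question answerable in the future.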