【Question title】: Mutate something on complete cases, but keep all
【Posted】: 2022-01-22 19:52:44
【Question description】:

I want to generate a group ID based on the combination of two variables (country and party). Here is my data:

df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
                 year = c(2010, 2010, 2010, 2010, 2010, 2010),
                 party = c(NA, NA, NA, "A", "B", "B")) 

This gives:

  country year party
1      BE 2010  <NA>
2      BE 2010  <NA>
3      BE 2010  <NA>
4      NL 2010     A
5      NL 2010     B
6      NL 2010     B

What I want is:

  country  year party group
  <chr>   <dbl> <chr> <int>
1 BE       2010 NA        NA
2 BE       2010 NA        NA
3 BE       2010 NA        NA
4 NL       2010 A         1
5 NL       2010 B         2
6 NL       2010 B         2

I tried:

df <- df %>% 
  group_by(country, party) %>% 
  mutate(group = cur_group_id())

But this gives me:

  country  year party group
  <chr>   <dbl> <chr> <int>
1 BE       2010 NA        1
2 BE       2010 NA        1
3 BE       2010 NA        1
4 NL       2010 A         2
5 NL       2010 B         3
6 NL       2010 B         3

However, I don't want a separate group for rows with a missing value. At the same time, I want to keep those rows in the data.

If I try:

df <- df %>% 
  group_by(country, party) %>% 
  filter(!is.na(party)) %>% 
  mutate(group = cur_group_id())

I get:

  country  year party group
  <chr>   <dbl> <chr> <int>
1 NL       2010 A         1
2 NL       2010 B         2
3 NL       2010 B         2

How can I get this new variable only for the complete cases while keeping the incomplete rows in the dataset?

Thanks

【Question comments】:

    Tags: r dplyr


    【Solution 1】:

    Something like the following?

    library(tidyverse)
    
    df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
                     year = c(2010, 2010, 2010, 2010, 2010, 2010),
                     party = c(NA, NA, NA, "A", "B", "B")) 
    
    df %>% 
      group_by(country, party) %>% 
      mutate(group = if_else(is.na(party), NA_integer_, cur_group_id()))
    #> # A tibble: 6 × 4
    #> # Groups:   country, party [3]
    #>   country  year party group
    #>   <chr>   <dbl> <chr> <int>
    #> 1 BE       2010 <NA>     NA
    #> 2 BE       2010 <NA>     NA
    #> 3 BE       2010 <NA>     NA
    #> 4 NL       2010 A         2
    #> 5 NL       2010 B         3
    #> 6 NL       2010 B         3
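
The ids start at 2 here because `group_by()` treats the missing party as a group of its own, and `cur_group_id()` counts it. A minimal sketch to inspect this, assuming the `df` from the question (recreated so the snippet is self-contained):

```r
library(dplyr)

df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
                 year = c(2010, 2010, 2010, 2010, 2010, 2010),
                 party = c(NA, NA, NA, "A", "B", "B"))

# group_keys() returns one row per group, in the order cur_group_id() numbers them
keys <- df %>%
  group_by(country, party) %>%
  group_keys()

keys
nrow(keys)  # the NA party is counted as a group too
```

The `if_else()` in the answer only blanks out the id of that group afterwards, so the remaining ids keep their original numbering.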
    

    If you want the groups to start at 1 (rather than 2):

    library(tidyverse)
    
    df %>% 
      filter(!is.na(party)) %>% 
      group_by(country, party) %>% 
      mutate(group = cur_group_id()) %>% 
      ungroup() %>%
      bind_rows(filter(df, is.na(party)))  # rows without a party come back with group = NA
    
    #> # A tibble: 6 × 4
    #>   country  year party group
    #>   <chr>   <dbl> <chr> <int>
    #> 1 NL       2010 A         1
    #> 2 NL       2010 B         2
    #> 3 NL       2010 B         2
    #> 4 BE       2010 <NA>     NA
    #> 5 BE       2010 <NA>     NA
    #> 6 BE       2010 <NA>     NA
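
Note that the approach above moves the rows with a missing party to the end. If the original row order matters, one possible alternative (not part of the original answer) is to build a key and number it with `match()`. This is a sketch under the assumption that ids may follow the order of first appearance rather than sorted order; `key` and `res` are hypothetical helper names:

```r
library(dplyr)

df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
                 year = c(2010, 2010, 2010, 2010, 2010, 2010),
                 party = c(NA, NA, NA, "A", "B", "B"))

res <- df %>%
  mutate(
    # key is NA whenever party is missing, otherwise "country_party"
    key = if_else(is.na(party), NA_character_, paste(country, party, sep = "_")),
    # position of each key among the distinct non-missing keys; NA keys stay NA
    group = match(key, unique(key[!is.na(key)]))
  ) %>%
  select(-key)

res
```

For the example data this yields `NA, NA, NA, 1, 2, 2` with the rows in their original order.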
    

    【Comments】:

      【Solution 2】:

      Use interaction():

      df %>% mutate(group = as.integer(interaction(country, party, drop = TRUE)))
      

      This gives:

        country year party group
      1      BE 2010  <NA>    NA
      2      BE 2010  <NA>    NA
      3      BE 2010  <NA>    NA
      4      NL 2010     A     1
      5      NL 2010     B     2
      6      NL 2010     B     2
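
This works for two reasons: `interaction()` returns `NA` whenever one of its inputs is `NA`, and `drop = TRUE` removes the combinations that never occur, so the remaining levels are numbered consecutively. A small sketch that compares the two variants on the same vectors (`f_all` and `f_used` are hypothetical names):

```r
country <- c("BE", "BE", "BE", "NL", "NL", "NL")
party   <- c(NA, NA, NA, "A", "B", "B")

# Without drop, every country/party combination is a level, used or not
f_all <- interaction(country, party)
levels(f_all)

# With drop = TRUE, only the combinations present in the data remain
f_used <- interaction(country, party, drop = TRUE)
levels(f_used)

# Integer codes of the dropped factor are the group ids; NA inputs stay NA
as.integer(f_used)
```

Without `drop = TRUE` the ids would still be unique per combination, but they would not necessarily be consecutive.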
      

      【Comments】:

        【Solution 3】:
        df <- data.frame(country = c("BE", "BE", "BE", "NL", "NL", "NL"),
                         year = c(2010, 2010, 2010, 2010, 2010, 2010),
                         party = c(NA, NA, NA, "A", "B", "B")) 
        
        library(data.table)
        # number each non-missing country/party combination; rows with NA party keep grp = NA
        setDT(df)[!is.na(party), grp := .GRP, by = .(country, party)][]
        #>    country year party grp
        #> 1:      BE 2010  <NA>  NA
        #> 2:      BE 2010  <NA>  NA
        #> 3:      BE 2010  <NA>  NA
        #> 4:      NL 2010     A   1
        #> 5:      NL 2010     B   2
        #> 6:      NL 2010     B   2
        

        Created on 2021-12-21 by the reprex package (v2.0.1)
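
Since the question asks for one id per country/party combination, it matters which columns are listed in `by`. A minimal sketch on hypothetical data (not from the question) in which party "A" occurs in two countries; `grp_party` and `grp_both` are illustrative column names:

```r
library(data.table)

# Hypothetical data: party "A" exists in both NL and DE
dt <- data.table(country = c("BE", "NL", "NL", "DE"),
                 party   = c(NA, "A", "B", "A"))

# Grouping by party alone: both "A" rows share one id
dt[!is.na(party), grp_party := .GRP, by = party]

# Grouping by country and party: each combination gets its own id
dt[!is.na(party), grp_both := .GRP, by = .(country, party)]

dt[]
```

In both cases the rows excluded by `!is.na(party)` are left untouched, so their id is `NA`, and `:=` adds the column by reference without copying the table.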

        【Comments】:
