【Question title】: How to create a new variable with the source of a coalesced value in R [duplicate]
【Posted】: 2020-08-26 09:59:58
【Question】:

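I have medical registry data from 3 different sources, and for many of my variables I have multiple entries, one from each registry. Each row contains data from only one registry (source). I have been able to coalesce the three into a single 'new' variable, but I would also like to create a variable that states the source of the coalesced value. I am not used to working in R this way (normally I would scurry back to Excel to manipulate variables), and I have spent some time looking for similar examples without finding an answer. Any help would be appreciated. (First-time poster, so advice on how I have asked the question is also welcome.)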

    library(tidyverse)

    df <- tibble(var1 = c(1, 2, NA, NA, NA), var2 = c(NA, NA, 3, 4, NA), var3 = c(NA, NA, NA, NA, 5))
    df
    #> # A tibble: 5 x 3
    #>    var1  var2  var3
    #>   <dbl> <dbl> <dbl>
    #> 1     1    NA    NA
    #> 2     2    NA    NA
    #> 3    NA     3    NA
    #> 4    NA     4    NA
    #> 5    NA    NA     5

    # Coalesce var1, var2 and var3 into a 'new' variable

    df$new <- coalesce(df$var1, df$var2, df$var3)

    df
    #> # A tibble: 5 x 4
    #>    var1  var2  var3   new
    #>   <dbl> <dbl> <dbl> <dbl>
    #> 1     1    NA    NA     1
    #> 2     2    NA    NA     2
    #> 3    NA     3    NA     3
    #> 4    NA     4    NA     4
    #> 5    NA    NA     5     5

    # I would also like a variable that gives the 'source' of the coalesced value.
    # The desired result (df_final) would look like this, but I cannot figure out how to produce it:
    #> # A tibble: 5 x 5
    #>    var1  var2  var3   new source
    #>   <dbl> <dbl> <dbl> <dbl> <chr>
    #> 1     1    NA    NA     1 var1
    #> 2     2    NA    NA     2 var1
    #> 3    NA     3    NA     3 var2
    #> 4    NA     4    NA     4 var2
    #> 5    NA    NA     5     5 var3

【Comments】:

    Tags: r tidyverse coalesce


    【Solution 1】:

    Using rowwise():

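    library(dplyr)

    tibble(var1 = c(1, 2, NA, NA, NA), var2 = c(NA, NA, 3, 4, NA), var3 = c(NA, NA, NA, NA, 5)) %>%
      rowwise() %>%
      # [1] keeps only the first non-missing column, matching coalesce() when a row has several values
      mutate(source = names(.)[which(!is.na(c_across(var1:var3)))[1]])

    #>    var1  var2  var3 source
    #>   <dbl> <dbl> <dbl> <chr>
    #> 1     1    NA    NA var1
    #> 2     2    NA    NA var1
    #> 3    NA     3    NA var2
    #> 4    NA     4    NA var2
    #> 5    NA    NA     5 var3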
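
For larger data, rowwise() can be slow. A minimal sketch of a vectorised variant follows, assuming the three columns are named as in the question and that the first non-missing column should win, as it does in coalesce(). This is not from the original answer: case_when() is swapped in for the row-wise lookup, and the name df_final is borrowed from the question's desired output.

```r
library(dplyr)

# Example data from the question
df <- tibble(
  var1 = c(1, 2, NA, NA, NA),
  var2 = c(NA, NA, 3, 4, NA),
  var3 = c(NA, NA, NA, NA, 5)
)

# case_when() evaluates its conditions in order, so the first non-missing
# column wins, mirroring the left-to-right order of coalesce().
# Rows where all three columns are NA fall through to NA_character_.
df_final <- df %>%
  mutate(
    new = coalesce(var1, var2, var3),
    source = case_when(
      !is.na(var1) ~ "var1",
      !is.na(var2) ~ "var2",
      !is.na(var3) ~ "var3",
      TRUE ~ NA_character_
    )
  )

df_final$source
# [1] "var1" "var1" "var2" "var2" "var3"
```

The conditions have to be listed in the same order as the arguments to coalesce(), otherwise new and source can disagree on rows that carry more than one value.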
    

    【Comments】:

      【Solution 2】:

      One option:

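      # Build one character vector per column (the column name where a value is present,
      # NA otherwise), then coalesce them; requires dplyr for coalesce()
      df$source <-
        do.call(
          coalesce,
          lapply(seq_len(ncol(df)), function(i) ifelse(is.na(df[[i]]), NA_character_, names(df)[[i]]))
        )
      df$source
      # [1] "var1" "var1" "var2" "var2" "var3"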
      

      Second option (requires data.table):

      names(df)[sapply(data.table::transpose(df), function(x) match(FALSE, is.na(x)))]
      # [1] "var1" "var1" "var2" "var2" "var3"
      

      Third, a pure base R solution:

      names(df)[apply(df, 1, function(x) match(FALSE, is.na(x)))]
      # [1] "var1" "var1" "var2" "var2" "var3"
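
The apply() version loops over rows in R. As a hedged alternative that stays in base R, the sketch below uses max.col() on a logical "value present" matrix. It is not part of the original answer; the helper names cols, present and first are made up for illustration, and it assumes the columns should be checked in the order var1, var2, var3.

```r
# Same example data as in the question, as a plain data frame
df <- data.frame(
  var1 = c(1, 2, NA, NA, NA),
  var2 = c(NA, NA, 3, 4, NA),
  var3 = c(NA, NA, NA, NA, 5)
)

cols <- c("var1", "var2", "var3")  # columns in coalesce order

# Logical matrix: TRUE where a value is present
present <- !is.na(as.matrix(df[cols]))

# Column index of the first TRUE in each row
first <- max.col(present, ties.method = "first")

# max.col() returns 1 for a row with no TRUE at all, so such rows are
# set to NA explicitly instead of being labelled "var1"
df$source <- ifelse(rowSums(present) > 0, cols[first], NA_character_)

df$source
# [1] "var1" "var1" "var2" "var2" "var3"
```

Restricting the lookup to cols also keeps extra columns such as new or source from being picked up if the code is run a second time.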
      

      【Comments】:
