【Question title】:Why are the NAs being ignored while I'm using the ifelse/mutate functions?
【Posted】:2019-12-03 02:33:56
【Question】:

I have a data frame that contains occurrences of several different species and an empty "new_name" column, which I want to fill using mutate/ifelse. Basically, I want new_name to be filled according to these conditions: if status is "unaccepted", new_name should take the value of "valid_name"; if status is "accepted" or NA, new_name should take the value of "species". Here is an example of my data frame's structure:

Data frame example

        species              | valid_name               | new_name | status
    1.  Tilapia guineensis   | NA                       | NA       | NA
    2.  Tilapia zillii       | Hippocampus trimaculatus | NA       | unaccepted
    3.  Fundulus rubrifrons  | Hippocampus trimaculatus | NA       | unaccepted
    4.  Eutrigla gurnardus   | Bougainvillia supercili  | NA       | accepted
    5.  Sprattus sprattus    | NA                       | NA       | NA
    6.  Gadus morhua         | Aglantha digitale        | NA       | accepted

Here is what I have tried so far:

df<-df%>%
  mutate(new_name = ifelse(status=="unaccepted",valid_name,ifelse(status=="accepted" | is.na(status),species,NA)))

So this code only works for "status" values that are not NA. Otherwise it just ignores the NAs and does nothing. The data frame ends up like this:

        species              | valid_name               | new_name                 | status
    1.  Tilapia guineensis   | NA                       | Tilapia guineensis       | NA
    2.  Tilapia zillii       | Hippocampus trimaculatus | Hippocampus trimaculatus | unaccepted
    3.  Fundulus rubrifrons  | Hippocampus trimaculatus | Hippocampus trimaculatus | unaccepted
    4.  Eutrigla gurnardus   | Bougainvillia supercili  | Eutrigla gurnardus       | accepted
    5.  Sprattus sprattus    | NA                       | Sprattus sprattus        | NA
    6.  Gadus morhua         | Aglantha digitale        | Gadus morhua             | accepted

Thanks in advance for any answers.

【Question comments】:

    Tags: r


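    【Solution 1】:

    If we use ==, make sure to also add an is.na check so that the condition returns TRUE/FALSE; otherwise a comparison against NA stays NA, and ifelse returns NA for that row.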

    library(dplyr)
    df%>%
      mutate(new_name = ifelse(status=="unaccepted" & !is.na(status),valid_name,
               ifelse(status=="accepted" & !is.na(status),species,species)))
    #              species               valid_name     status                 new_name
    #1  Tilapia guineensis                     <NA>       <NA>       Tilapia guineensis
    #2      Tilapia zillii Hippocampus trimaculatus unaccepted Hippocampus trimaculatus
    #3 Fundulus rubrifrons Hippocampus trimaculatus unaccepted Hippocampus trimaculatus
    #4  Eutrigla gurnardus  Bougainvillia supercili   accepted       Eutrigla gurnardus
    #5   Sprattus sprattus                     <NA>       <NA>        Sprattus sprattus
    #6        Gadus morhua        Aglantha digitale   accepted             Gadus morhua
    

    Another option is to use %in%, which returns FALSE for NA

    df%>%
      mutate(new_name = ifelse(status %in% "unaccepted" ,valid_name,
               ifelse(status %in% "accepted",species, species)))
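
A related option, offered here as a sketch and not as part of the original answer, is dplyr's `if_else()`, whose `missing` argument supplies the value to use wherever the condition evaluates to NA. The three-row `demo` data frame below is a hypothetical subset of the question's data, used only to keep the example self-contained.

```r
library(dplyr)

# Hypothetical three-row subset of the question's data, for illustration only
demo <- data.frame(
  species    = c("Tilapia guineensis", "Tilapia zillii", "Eutrigla gurnardus"),
  valid_name = c(NA, "Hippocampus trimaculatus", "Bougainvillia supercili"),
  status     = c(NA, "unaccepted", "accepted"),
  stringsAsFactors = FALSE
)

# Rows where `status == "unaccepted"` is NA take the `missing` value (species)
demo <- demo %>%
  mutate(new_name = if_else(status == "unaccepted", valid_name, species,
                            missing = species))

demo$new_name
```

Unlike base `ifelse()`, `if_else()` requires the `true`, `false`, and `missing` values to have compatible types, which holds here because all three columns are character.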
    

    Using a reproducible example

    v1 <- c('a', 'b', NA)
    v1 == 'a'
    #[1]  TRUE FALSE    NA
    
    v1 %in% 'a'
    #[1]  TRUE FALSE FALSE
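
To connect this to the question's code: `ifelse()` returns NA wherever its test is NA, so the nested `is.na(status)` check in the inner `ifelse()` is never reached for those rows. The following base R sketch illustrates this with short hypothetical vectors (`status_demo`, `species_demo`, and `valid_demo` are made-up names and values, not the question's data).

```r
# Hypothetical vectors, for illustration only
status_demo  <- c(NA, "unaccepted", "accepted")
species_demo <- c("sp1", "sp2", "sp3")
valid_demo   <- c(NA, "valid2", "valid3")

# `==` yields NA for the first element, so ifelse() yields NA there as well,
# regardless of what the `no` branch contains
naive <- ifelse(status_demo == "unaccepted", valid_demo, species_demo)

# `%in%` yields FALSE for NA, so the first element falls through to `no`
safe <- ifelse(status_demo %in% "unaccepted", valid_demo, species_demo)

print(naive)
print(safe)
```

Here `naive` has NA in its first position, while `safe` has "sp1" there.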
    

    Data

    df <- structure(list(species = c("Tilapia guineensis", "Tilapia zillii", 
    "Fundulus rubrifrons", "Eutrigla gurnardus", "Sprattus sprattus", 
    "Gadus morhua"), valid_name = c(NA, "Hippocampus trimaculatus", 
    "Hippocampus trimaculatus", "Bougainvillia supercili", NA, 
    "Aglantha digitale"
    ), status = c(NA, "unaccepted", "unaccepted", "accepted", NA, 
    "accepted")), class = "data.frame", row.names = c(NA, -6L))
    

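    【Comments】:

    • Thanks for your answer. When status is NA, I want "new_name" to take the data from the "name" column, but I can't manage it. The "new_name" column is filled exactly the way I want, except when status is NA.
    • You are right, sorry, the output I gave was wrong; I will clarify.
    【Solution 2】: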

    I would like to offer an alternative using case_when from dplyr, which provides a nice, intuitive syntax:

    library(dplyr)
    df <- structure(list(species = c("Tilapia guineensis", "Tilapia zillii", 
        "Fundulus rubrifrons", "Eutrigla gurnardus", "Sprattus sprattus", 
        "Gadus morhua"), valid_name = c(NA, "Hippocampus trimaculatus", 
        "Hippocampus trimaculatus", "Bougainvillia supercili", NA, 
        "Aglantha digitale"
        ), status = c(NA, "unaccepted", "unaccepted", "accepted", NA, 
        "accepted")), class = "data.frame", row.names = c(NA, -6L))
    
    df <- df %>% 
        mutate(new_name = case_when(
            status == "unaccepted" ~ valid_name,
            status == "accepted" | is.na(status) ~ species
        ))
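
`case_when()` treats an NA condition as not matched and returns NA for rows that match no condition, which is why the explicit `is.na(status)` clause is needed in the version above. A shorter variant, sketched here under the assumption that every status other than "unaccepted" (including NA) should fall back to `species`, uses a catch-all `TRUE ~ species` branch. The `demo2` data frame is a hypothetical three-row subset of the data.

```r
library(dplyr)

# Hypothetical three-row subset of the question's data, for illustration only
demo2 <- data.frame(
  species    = c("Tilapia guineensis", "Tilapia zillii", "Eutrigla gurnardus"),
  valid_name = c(NA, "Hippocampus trimaculatus", "Bougainvillia supercili"),
  status     = c(NA, "unaccepted", "accepted"),
  stringsAsFactors = FALSE
)

demo2 <- demo2 %>%
  mutate(new_name = case_when(
    status == "unaccepted" ~ valid_name,  # NA status does not match here
    TRUE ~ species                        # catch-all: accepted, NA, anything else
  ))

demo2$new_name
```

The trade-off is that the catch-all also assigns `species` to any unexpected status value, whereas the explicit version leaves such rows as NA, which can make data problems easier to spot.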
    

    【Comments】:
