【Question title】: How to concatenate multiple columns with separators but ignore some of the columns based on a condition in R?
【Posted】: 2020-02-21 14:09:13
【Question】:

Hello, I want to concatenate columns that contain strings, blanks, or NA, using ";" as the separator. Let's take an example:


Actor1<- c("Driver","NA","","")
Actor2<- c("President","Zombie","","")
Actor3<- c("CEO","Devil","","")
Actor4<-c("Priest","","Killer","Mayor")

df_ex <-data.frame(Actor1, Actor2, Actor3, Actor4)

I tried this:

library(dplyr)

df_ex %>%
  mutate(combined = paste0(Actor1, ";", Actor2, ";", Actor3, ";", Actor4))

But the result is obviously wrong, for example:

df_ex[3,]

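The combined column comes out as: ;;;Killer

I want the result to be: Killer

Note: there are also NA values and blanks "", which I would like to ignore.

Thanks in advance, cheers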

【Comments】:

    Tags: r concatenation multiple-columns


    【Solution 1】:

    I'm far from being an expert, but here is a tidyverse approach:

    Actor1 <- c("Driver","NA","","")
    Actor2 <- c("President","Zombie","","")
    Actor3 <- c("CEO","Devil","","")
    Actor4 <-c("Priest","","Killer","Mayor")
    
    library(tidyverse)
    
    data.frame(Actor1, Actor2, Actor3, Actor4) %>%
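      # blank out cells that are exactly the string "NA"
      mutate_all(~str_replace(., pattern = "^NA$", replacement = "")) %>%
      # join all columns with ";" and keep the original columns
      unite(col = "combined", sep = ";", remove = FALSE) %>%
      # drop a leading or trailing punctuation mark and any run of two or more
      mutate(combined = str_replace_all(combined, pattern = "^[:punct:]|[:punct:]$|[:punct:]{2,}", replacement = "")) %>%
      # move combined to the last position
      select(-combined, everything(), combined)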
    
    #>   Actor1    Actor2 Actor3 Actor4                    combined
    #> 1 Driver President    CEO Priest Driver;President;CEO;Priest
    #> 2           Zombie  Devil                       Zombie;Devil
    #> 3                         Killer                      Killer
    #> 4                          Mayor                       Mayor
    

    If you only want some of the columns, you can pass them to unite:

    data.frame(Actor1, Actor2, Actor3, Actor4) %>%
      mutate_all(~str_replace(., pattern = "^NA$", replacement = "")) %>%
      # only Actor2 and Actor4 are joined here
      unite(Actor2, Actor4, col = "combined", sep = ";", remove = FALSE) %>%
      mutate(combined = str_replace_all(combined, pattern = "^[:punct:]|[:punct:]$|[:punct:]{2,}", replacement = "")) %>%
      select(-combined, everything(), combined)
    
    #>   Actor1    Actor2 Actor3 Actor4         combined
    #> 1 Driver President    CEO Priest President;Priest
    #> 2           Zombie  Devil                  Zombie
    #> 3                         Killer           Killer
    #> 4                          Mayor            Mayor
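As a possible alternative to the regex cleanup above, here is a minimal sketch that first turns the unwanted values into real NA and then lets unite() skip them via na.rm = TRUE. It assumes dplyr >= 1.0.0 and tidyr >= 1.0.0, and it treats both "" and the literal string "NA" as missing, which is my reading of the question rather than something this answer states.

```r
library(dplyr)
library(tidyr)

df_ex <- data.frame(
  Actor1 = c("Driver", "NA", "", ""),
  Actor2 = c("President", "Zombie", "", ""),
  Actor3 = c("CEO", "Devil", "", ""),
  Actor4 = c("Priest", "", "Killer", "Mayor"),
  stringsAsFactors = FALSE
)

result <- df_ex %>%
  # Assumption: "" and the literal string "NA" both mean "missing"
  mutate(across(everything(), ~ ifelse(.x %in% c("", "NA"), NA_character_, .x))) %>%
  # na.rm = TRUE removes missing values before pasting, so no stray ";" is produced
  unite("combined", everything(), sep = ";", remove = FALSE, na.rm = TRUE) %>%
  # move the new column to the end
  relocate(combined, .after = last_col())

result$combined
```

To combine only some columns, replace everything() inside unite() with the column names, e.g. Actor2, Actor4. Note that in this version the original columns end up holding NA instead of "" or "NA".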
    

    【Comments】:

      【Solution 2】:
      Actor1<- c("Driver","NA","","")
      Actor2<- c("President","Zombie","","")
      Actor3<- c("CEO","Devil","","")
      Actor4<-c("Priest","","Killer","Mayor")
      
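      matrix_ex <- cbind(Actor1, Actor2, Actor3, Actor4)
      # Naive version, which keeps the empty fields:
      # apply(df_ex, 1, paste, collapse = ";")
      # For each row, keep only entries that are not NA, "", or "NA", then join with ";"
      x <- apply(matrix_ex, 1, function(x) {paste(x[!(is.na(x) | x == "" | x == "NA")], collapse = ";")})
      x
      #> [1] "Driver;President;CEO;Priest" "Zombie;Devil"                "Killer"                      "Mayor"
      cat(paste(x, collapse = "\n"))
      #> Driver;President;CEO;Priest
      #> Zombie;Devil
      #> Killer
      #> Mayor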
      
      
      
      

      In response to the comments:

      
      df_ex <- data.frame(Actor1 = Actor1, Actor2 = Actor2, Actor3 = Actor3, Actor4 = Actor4, rnorm(4))
      
      # Select the columns to concatenate by name ...
      df_ex$concat <- apply(df_ex[c("Actor1", "Actor3")], 1, function(x) {paste(x[!(is.na(x) | x == "" | x == "NA")], collapse = ";")})
      df_ex$concat
      
      # ... or by position
      df_ex$concat2 <- apply(df_ex[c(1, 3)], 1, function(x) {paste(x[!(is.na(x) | x == "" | x == "NA")], collapse = ";")})
      df_ex$concat2
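The two calls above repeat the same anonymous function, so it may be convenient to wrap it once. A minimal base R sketch: concat_nonempty is a hypothetical name (not an existing function), and it assumes the selected columns are character columns, since apply() converts each row to a character vector.

```r
# Hypothetical helper: join the non-empty values of each row with `sep`.
# Values that are NA, "", or the literal string "NA" are skipped.
concat_nonempty <- function(df, cols = names(df), sep = ";") {
  out <- apply(df[cols], 1, function(x) {
    keep <- !(is.na(x) | x == "" | x == "NA")
    paste(x[keep], collapse = sep)
  })
  unname(out)  # drop any row names carried over by apply()
}

df_ex <- data.frame(
  Actor1 = c("Driver", "NA", "", ""),
  Actor2 = c("President", "Zombie", "", ""),
  Actor3 = c("CEO", "Devil", "", ""),
  Actor4 = c("Priest", "", "Killer", "Mayor"),
  stringsAsFactors = FALSE
)

all_cols <- concat_nonempty(df_ex)                          # every column
two_cols <- concat_nonempty(df_ex, c("Actor1", "Actor3"))   # selected columns only
```

A row where every selected value is empty yields "", because paste() on a zero-length vector with collapse returns an empty string.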
      

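      【Comments】:

      • That is exactly what I want to avoid: those unnecessary leading separators caused by NA or blanks.
      • One more question: suppose I have more columns and only want to concatenate selected ones. How would I apply that to your code?
      • Say I only want to concatenate Actor1 and Actor3.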
      • ```selectedcolumns=c(1,3);x
      【Solution 3】:

      You can try the code below, which uses do.call + paste:

      df_ex$combine <- gsub("\\bNA;?\\b|;{2,}|;$","",do.call(paste,c(df_ex,sep = ";")))
      

      which gives:

      > df_ex
        Actor1    Actor2 Actor3 Actor4                     combine
      1 Driver President    CEO Priest Driver;President;CEO;Priest
      2     NA    Zombie  Devil                       Zombie;Devil
      3                         Killer                      Killer
      4                          Mayor                       Mayor
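One thing to watch with a single regex like the one above is that a separator between two kept values is lost when an empty field sits in the middle (for example "A;;B" becomes "AB"), and a single leading ";" is left in place. Here is a sketch of a more defensive cleanup in base R, done in three small steps. strip_empty_fields is a hypothetical helper, and treating the literal "NA" as an empty field is an assumption taken from the question.

```r
# Hypothetical helper: clean up a ";"-joined string.
strip_empty_fields <- function(x) {
  # 1. blank out fields that are exactly "NA" (bounded by ";" or the string ends)
  x <- gsub("(^|;)NA(?=;|$)", "\\1", x, perl = TRUE)
  # 2. collapse runs of separators into a single one
  x <- gsub(";{2,}", ";", x)
  # 3. drop a leading or trailing separator
  gsub("^;|;$", "", x)
}

df_ex <- data.frame(
  Actor1 = c("Driver", "NA", "", ""),
  Actor2 = c("President", "Zombie", "", ""),
  Actor3 = c("CEO", "Devil", "", ""),
  Actor4 = c("Priest", "", "Killer", "Mayor"),
  stringsAsFactors = FALSE
)

cols <- c("Actor1", "Actor2", "Actor3", "Actor4")
df_ex$combine <- strip_empty_fields(do.call(paste, c(df_ex[cols], sep = ";")))
df_ex$combine
```

Passing df_ex[cols] instead of the whole data frame also covers the case where only selected columns should be joined.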
      

      【Comments】:

      • Thomas, there is still a ; at the end of row 2.