【Question title】: stringr for extract and split
【Posted】: 2018-07-31 11:56:40
【Description】:

I have a bunch of rows that look like this:

 people <- matrix(c("Joe Smith", "Highland (Baltimore, MD)", "Male", "Jane Davis", "Trinity (Albany, NY)", "Female"), ncol = 3, byrow = TRUE)

The regex pattern I am using is:

cut <- "\\w*\\,\\s.."

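That regex pattern reduces the second column to just "Baltimore, MD" and "Albany, NY", i.e. the text inside the parentheses.

I then want to use str_split to split the city and state into two separate columns, so that the matrix below ends up looking like the table after it: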
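
For reference, here is a minimal sketch of what the pattern pulls out. It assumes the pattern is applied with stringr's str_extract(); the name `city_state` is only illustrative.

```r
library(stringr)

people <- matrix(c("Joe Smith", "Highland (Baltimore, MD)", "Male",
                   "Jane Davis", "Trinity (Albany, NY)", "Female"),
                 ncol = 3, byrow = TRUE)
cut <- "\\w*\\,\\s.."

# str_extract() returns the first match of the pattern in each string:
# a run of word characters, a comma, one whitespace, then any two characters
city_state <- str_extract(people[, 2], cut)
city_state
# [1] "Baltimore, MD" "Albany, NY"
```

Note that the trailing `..` only captures two characters, so this relies on the state always being a two-letter code.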

       [,1]         [,2]       [,3]             [,4]
 [1,] "Joe Smith"  "Highland (Baltimore, MD)" "Male"  
 [2,] "Jane Davis" "Trinity (Albany, NY)"     "Female"

      1         2     3   4
1 Joe Smith Baltimore MD Male
2 Jane Davis Albany NY Female

I just can't figure it out.

【Discussion】:

    Tags: r regex tidyverse stringr


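    【Solution 1】:
     library(tidyverse)
    people %>%
        as.data.frame() %>%
        # keep only the text inside the parentheses
        mutate(V2 = sub(".*\\((.*)\\).*", "\\1", V2)) %>%
        # ",\\s*" also consumes the space after the comma, so State has no leading space
        separate(V2, c("City", "State"), sep = ",\\s*")
    #           V1      City State     V3
    # 1  Joe Smith Baltimore    MD   Male
    # 2 Jane Davis    Albany    NY Female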
    


      【Solution 2】:

      We can do this with base R:

      res <- trimws(cbind(people[,1], as.matrix(read.csv(text =
          gsub("^\\S+\\s+\\(|\\)", "", people[,2]), sep=",", header = FALSE)), people[,3]))
      colnames(res) <- NULL
      res
      #    [,1]         [,2]        [,3] [,4]    
      #[1,] "Joe Smith"  "Baltimore" "MD" "Male"  
      #[2,] "Jane Davis" "Albany"    "NY" "Female"
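
A variation that skips the round trip through read.csv() could look like the sketch below. It is not part of the original answer, and it assumes every entry in the second column has exactly one comma inside the parentheses; `inner` and `parts` are illustrative names.

```r
people <- matrix(c("Joe Smith", "Highland (Baltimore, MD)", "Male",
                   "Jane Davis", "Trinity (Albany, NY)", "Female"),
                 ncol = 3, byrow = TRUE)

# Keep only the text inside the parentheses, e.g. "Baltimore, MD"
inner <- sub(".*\\((.*)\\).*", "\\1", people[, 2])

# Split on the comma plus any following whitespace, then stack the pieces row-wise
parts <- do.call(rbind, strsplit(inner, ",\\s*"))

# Reassemble: name, city, state, gender
res <- cbind(people[, 1], parts, people[, 3])
res
```

If some rows could contain more or fewer commas, strsplit() would return pieces of unequal length and the rbind() step would need extra handling.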
      


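        【Solution 3】:
        people <- matrix(c("Joe Smith", "Highland (Baltimore, MD)", "Male", "Jane Davis", "Trinity (Albany, NY)", "Female"), ncol = 3, byrow = TRUE)
        people <- data.frame(people)
        # split at the first space: "Highland" and "(Baltimore, MD)"
        res <- data.frame(people, stringr::str_split_fixed(people$X2, " ", n = 2))
        # strip the literal parentheses from the second piece
        res$X2.1 <- gsub(")", "", res$X2.1, fixed = TRUE)
        res$X2.1 <- gsub("(", "", res$X2.1, fixed = TRUE)
        # split on ", " (comma plus space) so the state has no leading space
        res <- data.frame(people, stringr::str_split_fixed(res$X2.1, ", ", n = 2))
        names(res) <- c("name1", "name2", "name3", "name4", "name5")
        # drop the original combined column
        res$name2 <- NULL
        res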
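
A more compact stringr-only variant is sketched below. It is not part of the original answer and assumes every entry follows the "Name (City, ST)" shape; `parts` and `res_compact` are illustrative names.

```r
library(stringr)

people <- matrix(c("Joe Smith", "Highland (Baltimore, MD)", "Male",
                   "Jane Davis", "Trinity (Albany, NY)", "Female"),
                 ncol = 3, byrow = TRUE)

# str_match() returns a character matrix: column 1 holds the full match,
# the remaining columns hold the capture groups (city and state here)
parts <- str_match(people[, 2], "\\((.*), (.*)\\)")

# Reassemble: name, city, state, gender
res_compact <- cbind(people[, 1], parts[, 2:3], people[, 3])
res_compact
```

Rows that do not match the pattern come back as NA from str_match(), which makes malformed entries easy to spot.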
        


          【Solution 4】:

          Similar to @Onyambu's answer, but this one uses extract() instead of the combination of mutate() + sub() + separate():

          library(tidyverse)
          people %>% 
            as.data.frame() %>%
            extract(V2, into = c("City", "State"), regex = ".*\\((.*), (.*)\\)")
          #           V1      City State     V3
          # 1  Joe Smith Baltimore    MD   Male
          # 2 Jane Davis    Albany    NY Female
          

          You can also use cSplit from my "splitstackshape" package:

          library(splitstackshape)
          cSplit(as.data.table(people)[, V2 := gsub(".*\\((.*)\\)", "\\1", V2)], "V2", ",")
          #            V1     V3      V2_1 V2_2
          # 1:  Joe Smith   Male Baltimore   MD
          # 2: Jane Davis Female    Albany   NY
          
