【Question title】: How to split but ignore separators in quoted strings in R?
【Posted】: 2016-04-21 13:39:40
【Question】:

I am splitting comma-separated strings, but I want to ignore commas that appear between quotation marks. Here is an example:

library(data.table)
dataset <- data.frame(str=c("USATW,\"USA Technologies, Inc Warrants\",Q" ,
                            "DUSA,DUSA Pharmaceuticals Inc,Q"))

#1   USATW,"USA Technologies, Inc Warrants",Q
#2   DUSA,DUSA Pharmaceuticals Inc,Q

setDT(dataset)[, c("Symbol","Security Name","Market Category") :=
                    tstrsplit(str, ",", fixed=TRUE)]


#   Symbol    Security Name               Market Category
#1  USATW    "USA Technologies            Inc Warrants"
#2  DUSA      DUSA Pharmaceuticals Inc    Q

The first row should instead be:

#1  USATW    "USA Technologies, Inc Warrants"  Q

There are similar posts, but they use other programming languages.

【Question discussion】:

    Tags: r split data.table


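    【Solution 1】:

    Try read.table. No packages are required: by default read.table treats double and single quotes as quote characters, so a comma inside a quoted field is not used as a separator.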

    read.table(text = as.character(dataset$str), sep = ",", as.is = TRUE,   
      col.names = c("Symbol", "Security Name", "Market Category"), check.names = FALSE)
    

    This gives:

      Symbol                  Security Name Market Category
    1  USATW USA Technologies, Inc Warrants               Q
    2   DUSA       DUSA Pharmaceuticals Inc               Q
    
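The work in read.table is done by its quote handling. As a minimal base-R sketch (the `lines` vector is just the question's example data as a plain character vector), count.fields() can count the comma-separated fields per line with the default quote characters and again with quoting switched off via quote = "":

```r
# Example data from the question, as a plain character vector
lines <- c("USATW,\"USA Technologies, Inc Warrants\",Q",
           "DUSA,DUSA Pharmaceuticals Inc,Q")

# Count fields per line with the default quote characters (quote = "\"'")
con <- textConnection(lines)
n_quoted <- count.fields(con, sep = ",")
close(con)

# Same count with quoting disabled: the comma inside the quotes now splits
con <- textConnection(lines)
n_unquoted <- count.fields(con, sep = ",", quote = "")
close(con)

print(n_quoted)    # [1] 3 3
print(n_unquoted)  # [1] 4 3
```

With quotes honoured both lines have three fields, which is why read.table can assign the three column names cleanly.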

    【Discussion】:

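    • Right, but I think the OP wants to keep the escaped quotes for some reason.
    • You could also use fread if you join the strings with newlines: fread(paste(dataset$str, collapse = '\n'), header = FALSE)
    【Solution 2】: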
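A self-contained sketch of the fread suggestion from the comment. It assumes data.table 1.11.0 or later, where fread() has an explicit text = argument; the comment's form, passing the newline-joined string as the first argument, is also accepted. Like read.table, fread strips the quotes from the quoted field:

```r
library(data.table)

lines <- c("USATW,\"USA Technologies, Inc Warrants\",Q",
           "DUSA,DUSA Pharmaceuticals Inc,Q")

# Join the rows into one newline-separated string and let fread parse it;
# its default quote = "\"" keeps the embedded comma inside the field
parsed <- fread(text = paste(lines, collapse = "\n"), sep = ",", header = FALSE,
                col.names = c("Symbol", "Security Name", "Market Category"))
print(parsed)
```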

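    This regex splits on commas and keeps the quotes. The lookahead (?=(?:[^"]|"[^"]*")*$) accepts a comma only if the rest of the string can be read as non-quote characters and complete "..." pairs, i.e. only commas outside quotes are split on: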

    library(data.table)
    dataset <- data.frame(str=c("USATW,\"USA Technologies, Inc Warrants\",Q" ,
                                "DUSA,DUSA Pharmaceuticals Inc,Q"))
    
    setDT(dataset)[, c("Symbol","Security Name","Market Category") :=
                     tstrsplit(str, '(,)(?=(?:[^"]|"[^"]*")*$)', perl = TRUE)]
    
    #                                         str Symbol                    Security Name Market Category
    # 1: USATW,"USA Technologies, Inc Warrants",Q  USATW "USA Technologies, Inc Warrants"               Q
    # 2:          DUSA,DUSA Pharmaceuticals Inc,Q   DUSA         DUSA Pharmaceuticals Inc               Q
    
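The same pattern works with base strsplit() and perl = TRUE, without data.table. In this sketch, split_outside_quotes is a hypothetical helper name; it returns one character vector per input string, with the quote characters kept:

```r
# Hypothetical helper: split on commas that lie outside double-quoted fields.
# A comma matches only if the remainder of the string consists of
# non-quote characters and complete "..." fields (the answer's regex).
split_outside_quotes <- function(x) {
  strsplit(x, '(,)(?=(?:[^"]|"[^"]*")*$)', perl = TRUE)
}

lines <- c("USATW,\"USA Technologies, Inc Warrants\",Q",
           "DUSA,DUSA Pharmaceuticals Inc,Q")
parts <- split_outside_quotes(lines)
print(parts)
```

Because the lookahead only inspects text to the right of each comma, it is not affected by strsplit() discarding the already-matched part of the string.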

    【Discussion】:
