Question title: How to convert a list (with multiple elements) into a string without it turning into "c("xxx","xxx","xxx")" in R
Posted: 2019-03-06 03:49:45
Question:
library(data.table)
library(stringr)  # str_sub(), used in the attempts below, comes from stringr

# Target string to convert

DATE_DATA <- c("2015-01-02;2015-01-07;2021-05-02;2019-02-05",
"2017-08-02;2000-01-22;2003-03-07;2017-10-09",
"2013-08-02;2022-06-02;2012-03-15")

# Dataset
DT <- data.table(NAME = c("JOE","MARY","PAUL"),DATE = c(DATE_DATA))

Expected result: convert the DATE column into a new column called "period", as shown below: split + sort with decreasing = F + unique years

#  period
1: 2015,2019,2021
2: 2000,2003,2017
3: 2012,2013,2022

I tried approaches like the ones below, but did not get the expected result:

# 1st approach -- RESULT: creates a column of class "list"

DT[,period:= lapply(strsplit(DT$DATE,";"),
                                 function(x) sort(unique(str_sub(x,1,4)),
                                                  decreasing = FALSE))]

# 2nd approach -- RESULT: creates a column of class "character", but each value
#                          becomes "c("xxx", "xxx", "xxx")" instead of the expected
#                          "xxx,xxx,xxx"

DT[,period:= as.character(paste(lapply(strsplit(DT$DATE,";"),
                             function(x) sort(unique(str_sub(x,1,4)),
                                              decreasing = FALSE)),collapse = ","))]

Which step am I missing? Thanks in advance.


    Tags: r list sorting split data.table


    Solution 1:

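    For each DATE, we can split the DATE column on ";", convert the pieces to dates, extract the year using format, keep the unique years, sort them, and paste them together using toString.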

    DT$Period <- sapply(DT$DATE, function(x) 
             toString(sort(unique(format(as.Date(strsplit(x, ";")[[1]]), "%Y")))))
    DT
    
    #   NAME                                        DATE           Period
    #1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015, 2019, 2021
    #2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000, 2003, 2017
    #3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012, 2013, 2022
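A minimal base R sketch of the same split, convert, extract, dedupe, sort and paste pipeline, wrapped in a helper so each step is visible. The helper name `years_string` is hypothetical, and the input is assumed to be the `DATE_DATA` vector from the question. `vapply()` with `FUN.VALUE = character(1)` is used instead of `sapply()` so the result is guaranteed to be exactly one string per row:

```r
# Input: the DATE_DATA vector from the question
DATE_DATA <- c("2015-01-02;2015-01-07;2021-05-02;2019-02-05",
               "2017-08-02;2000-01-22;2003-03-07;2017-10-09",
               "2013-08-02;2022-06-02;2012-03-15")

# Hypothetical helper: turn one "date;date;..." string into "yyyy, yyyy, ..."
years_string <- function(s) {
  dates <- as.Date(strsplit(s, ";")[[1]])   # split on ";" and parse ISO dates
  yrs <- unique(format(dates, "%Y"))        # year as text, duplicates dropped
  toString(sort(yrs))                       # sort ascending, join with ", "
}

# One character string per element; vapply() errors if a call returns anything else
period <- vapply(DATE_DATA, years_string, character(1), USE.NAMES = FALSE)
print(period)
```

Sorting the years as text is safe here only because every year has four digits.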
    

    We can drop the as.Date and format steps by using the year function from the lubridate package, which gives the same output.

    library(lubridate)
    DT$Period <- sapply(DT$DATE, function(x) 
                       toString(sort(unique(year(strsplit(x, ";")[[1]])))))
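If you would rather not add the lubridate dependency, base R can also give the year as a number: a `POSIXlt` object stores `$year` as years since 1900. A small sketch on one DATE value from the question (the variable names are illustrative):

```r
# One DATE value from the question
s <- "2015-01-02;2015-01-07;2021-05-02;2019-02-05"

# POSIXlt$year counts years since 1900, so add 1900 to get the calendar year
yrs <- as.POSIXlt(as.Date(strsplit(s, ";")[[1]]))$year + 1900

result <- toString(sort(unique(yrs)))
print(result)
```

Because the years are numeric here, `sort()` orders them numerically rather than as text.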
    

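    I'm not a data.table expert, but I think what's missing from your attempt is the grouping (by) argument. At the moment it gives you the unique years across the whole DATE column, whereas you need the unique years for each row separately, which is what the by argument specifies.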

    DT[,period:= paste(sapply(strsplit(DATE,";"),
      function(x) sort(unique(substr(x,1,4)))),collapse = ","), by = 1:nrow(DT)]
    
    DT
    
    #   NAME                                        DATE         period
    #1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015,2019,2021
    #2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000,2003,2017
    #3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012,2013,2022
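The `c("xxx", ...)` strings in the question's second attempt come from how `paste()` treats a list: each list element is converted with `as.character()`, which deparses a vector into `c(...)` text, and `collapse` then glues all rows into one string (which `:=` recycles to every row). A small base R illustration with made-up year vectors, plus the per-element fix of pasting inside the apply call instead of around it:

```r
# A list like the one lapply() returned in the question (one vector per row)
yrs <- list(c("2015", "2019", "2021"), c("2000", "2003", "2017"))

# paste() on the list itself: each element is deparsed, then everything collapses
wrong <- paste(yrs, collapse = ",")
print(wrong)

# Paste within each element instead: one "xxx,xxx,xxx" string per row
right <- vapply(yrs, paste, character(1), collapse = ",")
print(right)
```

Here `collapse = ","` is passed through `vapply()` to each `paste()` call.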
    


      Solution 2:

      We can do this with gsub and scan

      DT[,  Period := toString(sort(unique(scan(text=gsub("-\\d+", 
                     "", DATE), what = numeric(), sep=";")))), NAME]
      DT
      #   NAME                                        DATE           Period
      #1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015, 2019, 2021
      #2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000, 2003, 2017
      #3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012, 2013, 2022
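This answer relies on two steps: `gsub("-\\d+", "", DATE)` strips the month and day so only `;`-separated years remain, and `scan(text = ..., sep = ";")` parses that string into numbers. A base R sketch of the same idea applied row by row instead of grouping by NAME, assuming the question's `DATE_DATA` vector. The `quiet = TRUE` argument is an addition here that only suppresses scan's "Read n items" message:

```r
DATE_DATA <- c("2015-01-02;2015-01-07;2021-05-02;2019-02-05",
               "2017-08-02;2000-01-22;2003-03-07;2017-10-09",
               "2013-08-02;2022-06-02;2012-03-15")

# "2015-01-02;2015-01-07" -> "2015;2015": drop every "-<digits>" run
years_only <- gsub("-\\d+", "", DATE_DATA)

# Parse each row's ";"-separated years as numbers, then dedupe, sort and join
period <- vapply(years_only, function(s) {
  yrs <- scan(text = s, what = numeric(), sep = ";", quiet = TRUE)
  toString(sort(unique(yrs)))
}, character(1), USE.NAMES = FALSE)
print(period)
```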
      

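      Another option is the tidyverse: we split 'DATE' at ';' into 'long' format with separate_rows, group by 'NAME', summarise 'Period' as the sorted unique years of the converted Date class (ymd) pasted together with toString, join with the original dataset, and select the columns in the appropriate order (if needed).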

      library(tidyverse)
      library(lubridate)  # ymd() and year() come from lubridate
      DT %>% 
         separate_rows(DATE, sep = ";") %>% 
         group_by(NAME) %>% 
         summarise(Period = toString(sort(unique(year(ymd(DATE)))))) %>% 
         right_join(DT) %>%
         select(names(DT), everything())
      # A tibble: 3 x 3
      #  NAME  DATE                                        Period                
      #  <chr> <chr>                                       <chr>                 
      #1 JOE   2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015, 2019, 2021
      #2 MARY  2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000, 2003, 2017
      #3 PAUL  2013-08-02;2022-06-02;2012-03-15            2012, 2013, 2022    
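The tidyverse answer works in long format: one row per date, summarised per NAME, then joined back. A base R sketch of that same reshape, summarise and join shape, assuming the question's NAME and DATE values (the intermediate names `long`, `by_name` and `lookup` are only illustrative). `rep()` with `lengths()` repeats each NAME once per date, `tapply()` does the per-group summary, and indexing a named vector by NAME plays the role of the join:

```r
DT <- data.frame(NAME = c("JOE", "MARY", "PAUL"),
                 DATE = c("2015-01-02;2015-01-07;2021-05-02;2019-02-05",
                          "2017-08-02;2000-01-22;2003-03-07;2017-10-09",
                          "2013-08-02;2022-06-02;2012-03-15"),
                 stringsAsFactors = FALSE)

# Long format: one row per individual date, NAME repeated as needed
parts <- strsplit(DT$DATE, ";")
long <- data.frame(NAME = rep(DT$NAME, lengths(parts)),
                   YEAR = as.integer(format(as.Date(unlist(parts)), "%Y")),
                   stringsAsFactors = FALSE)

# Summarise per NAME: sorted unique years joined with ", "
by_name <- tapply(long$YEAR, long$NAME, function(y) toString(sort(unique(y))))
lookup <- setNames(as.vector(by_name), names(by_name))

# "Join" back onto the original rows by NAME
DT$Period <- unname(lookup[DT$NAME])
print(DT)
```

Looking results up by name, rather than relying on row order, keeps the join correct even if `tapply()` returns the groups in a different order.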
      


        Solution 3:

        I'm not sure what the fastest approach is, but one that is relatively easy to read and understand is:

        DT[, period:=sapply(strsplit(DATE, ";"), 
             function(x) paste(sort(unique(year(as.Date(x)))), collapse = ","))]
        

        The resulting output is:

           NAME                                        DATE         period
        1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015,2019,2021
        2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000,2003,2017
        3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012,2013,2022
        

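        strsplit(DATE, ";") gives you a column of type list. This means you can apply an lapply-style function (such as sapply) to this column, which takes each row and applies some function to it. Then it's just a matter of converting a character vector of dates into sorted years.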
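A quick way to see the list-versus-character distinction the question runs into: `strsplit()` always returns a list, `lapply()` keeps it a list (which data.table stores as a list column), and a function that returns exactly one string per element lets `sapply()` simplify to a plain character vector. A small base R sketch using two of the question's DATE values:

```r
dates <- c("2013-08-02;2022-06-02;2012-03-15",
           "2017-08-02;2000-01-22;2003-03-07;2017-10-09")

pieces <- strsplit(dates, ";")          # a list: one character vector per row
print(class(pieces))                    # "list"

# Returning a vector per element: lapply() keeps a list
as_list <- lapply(pieces, function(x) sort(unique(substr(x, 1, 4))))
print(class(as_list))                   # "list"

# Returning one string per element: sapply() simplifies to a character vector
as_chr <- sapply(pieces, function(x) paste(sort(unique(substr(x, 1, 4))), collapse = ","))
print(class(as_chr))                    # "character"
print(as_chr)
```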
