【Question title】: Extract a duration from a character column in R
【Posted】: 2020-04-23 19:58:39
【Question】:

I am currently facing a problem with a dataset I need to analyse. Here is a sample of the data:

      session_id    individ_id  colony     species           year_tracked
1 12141_2009-07-01 GBT_FP96194 Eynhallow Northern fulmar      2009_10
2 12141_2010-07-18 GBT_FP96235 Eynhallow Northern fulmar      2010_11
3 12143_2009-07-01 GBT_FC14766 Eynhallow Northern fulmar      2009_10
4 12143_2010-07-18 GBT_FR77883 Eynhallow Northern fulmar      2010_12
5 12144_2009-07-01 GBT_FP05030 Eynhallow Northern fulmar      2009_10
6 12145_2009-07-01 GBT_FA82356 Eynhallow Northern fulmar      2009_10

I need to create a new column containing the number of years tracked, which in this case would be:

2010-2009 --> 1
2011-2010 --> 1
2010-2009 --> 1
2012-2010 --> 2
2010-2009 --> 1
2010-2009 --> 1

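The year_tracked column is of class character. Perhaps a function that takes the first four characters and the last two characters of each cell and converts them to dates would work, but I don't know how to do that.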
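
The idea described above (first four characters, last two characters) can be sketched in base R without converting anything to dates. This is only a minimal sketch: it assumes every value has the form YYYY_YY and that both years fall in the 2000s, and the column name n_years is made up for illustration.

```r
# Toy data frame holding only the relevant column (values from the sample above)
df <- data.frame(year_tracked = c("2009_10", "2010_11", "2009_10",
                                  "2010_12", "2009_10", "2009_10"),
                 stringsAsFactors = FALSE)

# First four characters: the start year
start_year <- as.integer(substr(df$year_tracked, 1, 4))

# Last two characters: the end year without its century
# (assumption: the century is always 20)
len <- nchar(df$year_tracked)
end_year <- 2000L + as.integer(substr(df$year_tracked, len - 1, len))

# Number of years tracked, stored in a hypothetical new column
df$n_years <- end_year - start_year
df$n_years
```

Plain integer subtraction is enough here because only whole years are involved.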

【Comments】:

  • Can you explain the logic behind the expected output?
  • Just updated ;)

Tags: r date character


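【Solution 1】:

An option with separate: insert the century after the underscore, split the string into two integer columns, and subtract them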

library(dplyr)
library(tidyr)
library(stringr)
df1 %>% 
    mutate(year_tracked2 = str_replace(year_tracked, "_", "_20")) %>% 
    separate(year_tracked2, into = c('year1', 'year2'), convert = TRUE) %>%
    mutate(n = year2 - year1) %>%
    select(-year1, -year2)
#       session_id  individ_id    colony         species year_tracked n
#1 12141_2009-07-01 GBT_FP96194 Eynhallow Northern fulmar      2009_10 1
#2 12141_2010-07-18 GBT_FP96235 Eynhallow Northern fulmar      2010_11 1
#3 12143_2009-07-01 GBT_FC14766 Eynhallow Northern fulmar      2009_10 1
#4 12143_2010-07-18 GBT_FR77883 Eynhallow Northern fulmar      2010_12 2
#5 12144_2009-07-01 GBT_FP05030 Eynhallow Northern fulmar      2009_10 1
#6 12145_2009-07-01 GBT_FA82356 Eynhallow Northern fulmar      2009_10 1

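Or a simpler option is to replace the _ with :20 and then evaluate the resulting expression, so that "2009_10" becomes 2009:2010 and the number of years is the length of that sequence minus 1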

library(purrr)
df1 %>% 
   mutate(n = lengths(map(str_replace(year_tracked, "_", ":20"),
           ~ eval(parse(text = .x))))- 1)
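
To see why the replace-and-evaluate trick gives the right number, it may help to trace a single value by hand. The sketch below uses base R's sub() as a stand-in for str_replace() so that it runs without any packages; the variable names are illustrative only.

```r
x <- "2010_12"

# Turn the underscore into ":20" so the string reads as an R sequence expression
expr <- sub("_", ":20", x, fixed = TRUE)   # "2010:2012"

# Parse and evaluate the string as R code: this builds the sequence 2010, 2011, 2012
yrs <- eval(parse(text = expr))

# A sequence spanning k years has k + 1 elements, hence the "- 1"
n_years <- length(yrs) - 1
n_years
```

Note that the : operator also counts downward, so a value whose end year is smaller than its start year would give the absolute difference rather than a negative number.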

Data

df1 <- structure(list(session_id = c("12141_2009-07-01", "12141_2010-07-18", 
"12143_2009-07-01", "12143_2010-07-18", "12144_2009-07-01", "12145_2009-07-01"
), individ_id = c("GBT_FP96194", "GBT_FP96235", "GBT_FC14766", 
"GBT_FR77883", "GBT_FP05030", "GBT_FA82356"), colony = c("Eynhallow", 
"Eynhallow", "Eynhallow", "Eynhallow", "Eynhallow", "Eynhallow"
), species = c("Northern fulmar", "Northern fulmar", "Northern fulmar", 
"Northern fulmar", "Northern fulmar", "Northern fulmar"), year_tracked = c("2009_10", 
"2010_11", "2009_10", "2010_12", "2009_10", "2009_10")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

【Discussion】:

    【Solution 2】:

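    Here is a bit of regex: first extract the four-digit start year with str_extract(.,"[0-9]{4}"), then extract the two-digit end year with str_extract(.,"(?<=_)[0-9]{2}"), prepend 20 to turn it into YYYY format, and finally subtract the two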

    library(magrittr)
    library(stringr)
    
    from <- df$year_tracked %>%
      str_extract(.,"[0-9]{4}") %>%
      as.numeric()
    
    to <- df$year_tracked %>%
      str_extract(.,"(?<=_)[0-9]{2}") %>%
      paste0("20",.) %>%
      as.numeric()
    
    result <- to - from
    result
    # [1] 1 1 1 2 1 1
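
Since the question asks for a new column rather than a free-standing vector, the same two extractions can also be written straight back into the data frame. The sketch below is a base R variant, with regmatches() and regexpr() standing in for str_extract(), under the same assumption that every year is in the 2000s; the column name n_years is hypothetical.

```r
df <- data.frame(year_tracked = c("2009_10", "2010_11", "2009_10",
                                  "2010_12", "2009_10", "2009_10"),
                 stringsAsFactors = FALSE)

# Start year: the first run of four digits
from <- as.numeric(regmatches(df$year_tracked,
                              regexpr("[0-9]{4}", df$year_tracked)))

# End year: the two digits right after the underscore
# (the lookbehind (?<=_) requires perl = TRUE in base R)
to_short <- regmatches(df$year_tracked,
                       regexpr("(?<=_)[0-9]{2}", df$year_tracked, perl = TRUE))
to <- as.numeric(paste0("20", to_short))

# Store the difference as a new column
df$n_years <- to - from
df$n_years
```

regmatches() silently drops elements that do not match, so this form is only safe when every row follows the YYYY_YY pattern.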
    

    Data:

    # species is quoted so that 'Northern fulmar' is read as a single field;
    # unquoted, the space splits it in two and session_id ends up as row names
    df <- read.table(text = "      session_id    individ_id  colony     species           year_tracked
     12141_2009-07-01 GBT_FP96194 Eynhallow 'Northern fulmar'      2009_10
     12141_2010-07-18 GBT_FP96235 Eynhallow 'Northern fulmar'      2010_11
     12143_2009-07-01 GBT_FC14766 Eynhallow 'Northern fulmar'      2009_10
     12143_2010-07-18 GBT_FR77883 Eynhallow 'Northern fulmar'      2010_12
     12144_2009-07-01 GBT_FP05030 Eynhallow 'Northern fulmar'      2009_10
     12145_2009-07-01 GBT_FA82356 Eynhallow 'Northern fulmar'      2009_10",
     header = TRUE, stringsAsFactors = FALSE)
    

    【Discussion】:
