【Question Title】: Split a column consisting of number ranges and use the resulting numbers as range values in R [duplicate]
【Posted】: 2020-08-18 19:25:16
【Question】:

My sample data frame looks like this:

df1 <- structure(list(Speed = c("0-20", "21-40", "41-60", "61-80", "81-100"
), SpeedLevel = c(1, 2, 3, 4, 5)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

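I need to add a column containing every value in the range given by the first column, "Speed". That is, I need to split the string at '-' and generate the range of values from the minimum to the maximum.

For example, the first row of the Speed column holds "0-20", so after splitting, the range is every integer from 0 to 20. Once I have that, I can use tidyr's separate_rows or unnest together with dplyr to get the expected output shown below.

Expected output: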

structure(list(Speed = c("0-20", "0-20", "0-20", "0-20", "0-20", 
"0-20", "0-20", "0-20", "0-20", "0-20", "0-20", "0-20", "0-20", 
"0-20", "0-20", "0-20", "0-20", "0-20", "0-20", "0-20", "0-20", 
"21-40", "21-40", "21-40", "21-40", "21-40", "21-40", "21-40", 
"21-40", "21-40", "21-40", "21-40", "21-40", "21-40", "21-40", 
"21-40", "21-40", "21-40", "21-40", "21-40", "21-40", "41-60", 
"41-60", "41-60", "41-60", "41-60", "41-60", "41-60", "41-60", 
"41-60", "41-60", "41-60", "41-60", "41-60", "41-60", "41-60", 
"41-60", "41-60", "41-60", "41-60", "41-60", "61-80", "61-80", 
"61-80", "61-80", "61-80", "61-80", "61-80", "61-80", "61-80", 
"61-80", "61-80", "61-80", "61-80", "61-80", "61-80", "61-80", 
"61-80", "61-80", "61-80", "61-80", "81-100", "81-100", "81-100", 
"81-100", "81-100", "81-100", "81-100", "81-100", "81-100", "81-100", 
"81-100", "81-100", "81-100", "81-100", "81-100", "81-100", "81-100", 
"81-100", "81-100", "81-100"), SpeedLevel = c(1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), ActualSpeed = c(0, 1, 2, 
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 
100)), row.names = c(NA, -101L), class = c("tbl_df", "tbl", "data.frame"
))

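For splitting strings I normally use the strsplit function, but I am not sure whether it can be used here. Can someone show me how to split the "Speed" column and use the two resulting numbers as the bounds of a range?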

【Comments】:

    Tags: r split


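    【Solution 1】:

    We can use separate to split 'Speed' into two columns, then build a list column of sequences from the 'start' and 'end' values, and finally unnest that list column.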

    library(dplyr)
    library(tidyr)
    library(purrr)
    df1 %>%
      # split "0-20" into integer columns start = 0 and end = 20; keep Speed
      separate(Speed, into = c('start', 'end'), remove = FALSE, convert = TRUE) %>%
      # build one integer sequence per row, then drop the helper columns
      mutate(ActualSpeed = map2(start, end, `:`), start = NULL, end = NULL) %>%
      # expand the list column to one row per value
      unnest(c(ActualSpeed))
    # A tibble: 101 x 3
    #   Speed SpeedLevel ActualSpeed
    #   <chr>      <dbl>       <int>
    # 1 0-20           1           0
    # 2 0-20           1           1
    # 3 0-20           1           2
    # 4 0-20           1           3
    # 5 0-20           1           4
    # 6 0-20           1           5
    # 7 0-20           1           6
    # 8 0-20           1           7
    # 9 0-20           1           8
    #10 0-20           1           9
    # … with 91 more rows
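
    Since the question mentions strsplit, the same result can probably also be reached in base R alone. The following is a minimal sketch, not part of the original answer: it rebuilds the sample data as a plain data.frame, and the names bounds, ranges and expanded are made up for illustration.

```r
# Sample data from the question, as a plain data.frame (base R only).
df1 <- data.frame(
  Speed = c("0-20", "21-40", "41-60", "61-80", "81-100"),
  SpeedLevel = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Split each "min-max" string at the hyphen; the result is a list of
# two-element character vectors, one per row.
bounds <- strsplit(df1$Speed, "-", fixed = TRUE)

# Turn each pair of bounds into the integer sequence min:max.
ranges <- lapply(bounds, function(b) seq(as.integer(b[1]), as.integer(b[2])))

# Repeat every original row once per value in its range,
# then attach the values as a new column.
expanded <- df1[rep(seq_len(nrow(df1)), lengths(ranges)), ]
expanded$ActualSpeed <- unlist(ranges)
rownames(expanded) <- NULL

nrow(expanded)  # 101
head(expanded, 3)
```

    This assumes every Speed value has exactly the form "min-max" with non-negative integers; a value such as "-5-10" would need a more careful split.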
    

    【Comments】:
