【Question title】: Using function mutate_at isn't iterating over the function as expected [duplicate]
【Posted】: 2019-09-25 05:30:04
【Question】:

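I have columns that I want to convert to seconds. The conversion function works on a single value, but when I use mutate_at to apply it across several columns, it doesn't behave as I expect. I don't know what I'm missing in the mutate_at syntax.

I have this: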

catalog
# A tibble: 4 x 3
#  file                              start end  
#  <chr>                             <chr> <chr>
#1 20190506_205959-20190506_210459   1:58  3:00 
#2 20190506_210507-20190506_211007   0     0:32 
#3 20190506_205959-20190506_210459_2 0     3:18 
#4 20190506_220712-20190506_221210   0     5  

transform_time_to_seconds <- function(x) {
    x %>% 
        str_split(":", simplify = TRUE) %>% 
        as.numeric() %>% 
        {.[1] * 60 + 
         ifelse(is.na(.[2]), 0, .[2])}
}

I apply mutate_at:

catalog %>%
    mutate_at(vars(start, end), transform_time_to_seconds)
# A tibble: 4 x 3
#  file                              start   end
#  <chr>                             <dbl> <dbl>
#1 20190506_205959-20190506_210459      60   180
#2 20190506_210507-20190506_211007      60   180
#3 20190506_205959-20190506_210459_2    60   180
#4 20190506_220712-20190506_221210      60   180

But what I expected is:

catalog %>%
    mutate(start = map_dbl(start, transform_time_to_seconds),
           end   = map_dbl(end, transform_time_to_seconds))
# A tibble: 4 x 3
#  file                              start   end
#  <chr>                             <dbl> <dbl>
#1 20190506_205959-20190506_210459     118   180
#2 20190506_210507-20190506_211007       0    32
#3 20190506_205959-20190506_210459_2     0   198
#4 20190506_220712-20190506_221210       0   300

Any suggestions?


The catalog data:

structure(list(file = c("20190506_205959-20190506_210459", "20190506_210507-20190506_211007", 
"20190506_205959-20190506_210459_2", "20190506_220712-20190506_221210"
), start = c("1:58", "0", "0", "0"), end = c("3:00", "0:32", 
"3:18", "5")), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), spec = structure(list(cols = list(
    file = structure(list(), class = c("collector_character", 
    "collector")), start = structure(list(), class = c("collector_character", 
    "collector")), end = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

【Comments】:

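  • The problem seems to lie in the matrix that your str_split returns. You can see this by adding rowwise before mutate_at: that gives your expected output. I think this is because the mutate_* functions expect to work along the whole vector at once, and the matrix throws that off.
  • catalog %>% group_by(1:n()) %>% mutate_at(vars(start, end), transform_time_to_seconds) %>% select(-4)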
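
The matrix issue described in the comments above can be reproduced outside of dplyr. The following is a minimal sketch using the `start` column from the question; the names `m` and `v` are illustrative only and do not appear in the original post.

```r
library(stringr)

start <- c("1:58", "0", "0", "0")

# On a whole column, str_split(simplify = TRUE) returns a character matrix
# with one row per element and one column per piece.
m <- str_split(start, ":", simplify = TRUE)
dim(m)

# as.numeric() drops the dimensions and flattens the matrix column by column,
# so v[1] and v[2] are the first pieces of rows 1 and 2, not the minutes and
# seconds of a single row.
v <- as.numeric(m)
v[1:2]

# The function body therefore yields one number for the whole column,
# which mutate_at then recycles into every row.
single_value <- v[1] * 60 + ifelse(is.na(v[2]), 0, v[2])
single_value
```

This is consistent with the 60 shown in every row of `start` in the question's output.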

Tags: r dplyr


【Solution 1】:

You can also vectorize your function:

transform_time_to_seconds <- Vectorize(transform_time_to_seconds)
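
As a hedged illustration of how the vectorized function could then be used with mutate_at, here is a self-contained sketch. It rebuilds `catalog` with `tibble()` instead of the `structure()` call from the question, and the names `transform_time_to_seconds_v` and `result` are hypothetical. Passing `USE.NAMES = FALSE` is an addition not present in the answer; it is assumed here so that the result is a plain unnamed numeric vector.

```r
library(dplyr)
library(stringr)

# Scalar conversion function, as posted in the question
transform_time_to_seconds <- function(x) {
  x %>%
    str_split(":", simplify = TRUE) %>%
    as.numeric() %>%
    {.[1] * 60 + ifelse(is.na(.[2]), 0, .[2])}
}

# Vectorize() wraps the function in mapply(), so it is called once per element
transform_time_to_seconds_v <- Vectorize(transform_time_to_seconds,
                                         USE.NAMES = FALSE)

catalog <- tibble(
  file  = c("20190506_205959-20190506_210459",
            "20190506_210507-20190506_211007",
            "20190506_205959-20190506_210459_2",
            "20190506_220712-20190506_221210"),
  start = c("1:58", "0", "0", "0"),
  end   = c("3:00", "0:32", "3:18", "5")
)

result <- catalog %>%
  mutate_at(vars(start, end), transform_time_to_seconds_v)
result
```

Note that `Vectorize()` still calls the function once per element internally, so it mainly changes the interface rather than the amount of work done.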

【Comments】:

  • This is very clever, and I believe it will benefit us most if computational cost is at stake. +1
【Solution 2】:

Your function expects one value at a time, whereas you are passing it a whole column.

Adding rowwise might help:

library(dplyr)

catalog %>%
  rowwise() %>%
  mutate_at(vars(start, end), transform_time_to_seconds)

# A tibble: 4 x 3
#  file                              start   end
#  <chr>                             <dbl> <dbl>
#1 20190506_205959-20190506_210459     118   180
#2 20190506_210507-20190506_211007       0    32
#3 20190506_205959-20190506_210459_2     0   198
#4 20190506_220712-20190506_221210       0   300
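
Since the root cause is that the function handles only one value at a time, another option would be to rewrite it so that it works on a whole column directly. The following is only a sketch under that assumption, not something proposed in the thread; the name `transform_time_to_seconds_vec` is hypothetical, and empty input vectors are not handled.

```r
library(stringr)

transform_time_to_seconds_vec <- function(x) {
  # One row per element of x, one column per ":"-separated piece
  parts <- str_split(x, ":", simplify = TRUE)
  first <- as.numeric(parts[, 1])
  # If no element contains ":", the matrix has a single column
  second <- if (ncol(parts) >= 2) {
    as.numeric(parts[, 2])   # "" becomes NA
  } else {
    rep(NA_real_, length(x))
  }
  # Same arithmetic as the original function, applied element-wise
  first * 60 + ifelse(is.na(second), 0, second)
}

start_seconds <- transform_time_to_seconds_vec(c("1:58", "0", "0", "0"))
end_seconds   <- transform_time_to_seconds_vec(c("3:00", "0:32", "3:18", "5"))
```

A function written this way can be passed to mutate_at(vars(start, end), ...) without rowwise or Vectorize, because it returns one value per input element.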

【Comments】:
