【Question title】: purrr to find smallest value and then label with case_when
【Posted】: 2019-11-01 14:05:22
【Question description】:

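I have two datasets. The first contains a list of cities and their distances (in miles) to each destination. The second contains the list of destinations. I want to use purrr to put the name of the nearest destination into a new column in the first dataset.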

Here is the first dataset (with made-up data/distances):

library(tidyverse)
data1 <- tibble(city = c("Atlanta", "Tokyo", "Paris"),
                   dist_Rome = c(1000, 2000, 300),
                   dist_Miami = c(400, 3000, 1500),
                   dist_Singapore = c(3000, 600, 2000),
                   dist_Toronto = c(900, 3200, 1900))

Here is the second dataset, which contains the destinations:

library(tidyverse)
data2 <- tibble(destination = c("Rome Italy", "Miami United States", "Singapore Singapore", "Toronto Canada"))

This is what I want the result to look like:

library(tidyverse)
solution <- tibble(city = c("Atlanta", "Tokyo", "Paris"),
                   dist_Rome = c(1000, 2000, 300),
                   dist_Miami = c(400, 3000, 1500),
                   dist_Singapore = c(3000, 600, 2000),
                   dist_Toronto = c(900, 3200, 1900),
                   nearest = c("Miami United States", "Singapore Singapore", "Rome Italy"))

Ideally I am looking for a tidy solution. I tried to do this with purrr, but to no avail. Here is my failed attempt:

library(tidyverse)
solution <- data1 %>%
  mutate(nearest_hub = map(select(., contains("dist")), ~
                                  case_when(which.min(c(...)) ~ data2$destination),
                                TRUE ~ "NA"))
Error in which.min(c(...)) : 
  (list) object cannot be coerced to type 'double'

Thanks!

【Question discussion】:

    Tags: r string dplyr purrr pmap


    【Solution 1】:

    We can gather the data into 'long' format, group by 'city', slice the row with the smallest 'val', and then left_join with 'data2' to get 'nearest'.

    library(tidyverse)
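    # Reshape to long format: one row per city/destination pair
    data1 %>% 
       gather(key, val, starts_with("dist")) %>% 
       group_by(city) %>% 
       # Keep only the row with the smallest distance for each city
       slice(which.min(val)) %>% 
       ungroup() %>%
       # Strip the prefix so 'key' matches the first word of 'destination'
       transmute(city, key = str_remove(key, 'dist_')) %>% 
       left_join(data2 %>% 
                   mutate(key = word(destination, 1)),
                 by = "key") %>%
       select(city, nearest = destination) %>% 
       # Bring the original distance columns back
       left_join(data1, by = "city")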
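
As a side note, gather() has been superseded by pivot_longer() in tidyr, and dplyr 1.0.0 added slice_min(). The following is only a sketch of the same long-format idea with those functions, assuming reasonably recent tidyr and dplyr versions; the name `nearest_long` is made up for illustration, and the two input tibbles are repeated from the question so the block runs on its own.

```r
library(tidyverse)

# Input data, repeated from the question
data1 <- tibble(city = c("Atlanta", "Tokyo", "Paris"),
                dist_Rome = c(1000, 2000, 300),
                dist_Miami = c(400, 3000, 1500),
                dist_Singapore = c(3000, 600, 2000),
                dist_Toronto = c(900, 3200, 1900))
data2 <- tibble(destination = c("Rome Italy", "Miami United States",
                                "Singapore Singapore", "Toronto Canada"))

nearest_long <- data1 %>%
  # Long format; names_prefix drops "dist_" so 'key' holds the bare city name
  pivot_longer(starts_with("dist_"), names_to = "key", values_to = "val",
               names_prefix = "dist_") %>%
  group_by(city) %>%
  # One row per city: the smallest distance (ties broken arbitrarily)
  slice_min(val, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # Match on the first word of 'destination' (assumes that word is unique)
  left_join(data2 %>% mutate(key = word(destination, 1)), by = "key") %>%
  select(city, nearest = destination) %>%
  left_join(data1, by = "city")

nearest_long
```

Note that the grouped slice returns the rows ordered by 'city', so the row order may differ from 'data1'.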
    

    【Discussion】:

      【Solution 2】:

      A solution using the tidyverse.

      library(tidyverse)
      
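      data3 <- data1 %>%
        # For each row, take the name of the distance column with the smallest value
        mutate(City = apply(data1 %>% select(-city), 1, function(x) names(x)[which.min(x)])) %>%
        # Drop the "dist_" prefix so the value matches the first word of 'destination'
        mutate(City = str_remove(City, "^dist_")) %>%
        # Split 'destination' at the first space; extra = "merge" keeps multi-word countries whole
        left_join(data2 %>%
                    separate(destination, into = c("City", "Country"), sep = " ",
                             remove = FALSE, extra = "merge"),
                  by = "City") %>%
        # Remove the helper columns and give the result its final name
        select(-City, -Country) %>%
        rename(nearest = destination)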
      
      data3
      # # A tibble: 3 x 6
      #   city    dist_Rome dist_Miami dist_Singapore dist_Toronto nearest            
      #   <chr>       <dbl>      <dbl>          <dbl>        <dbl> <chr>              
      # 1 Atlanta      1000        400           3000          900 Miami United States
      # 2 Tokyo        2000       3000            600         3200 Singapore Singapore
      # 3 Paris         300       1500           2000         1900 Rome Italy
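
Since the question asked specifically about purrr, here is a rough row-wise sketch with pmap_chr(). map() walks over columns, whereas picking the nearest destination is a per-row operation, which is what the pmap() family is for. The `lookup` vector is a helper introduced here for illustration, and the sketch assumes the first word of each 'destination' is unique and matches the suffix of the corresponding 'dist_' column.

```r
library(tidyverse)

# Input data, repeated from the question
data1 <- tibble(city = c("Atlanta", "Tokyo", "Paris"),
                dist_Rome = c(1000, 2000, 300),
                dist_Miami = c(400, 3000, 1500),
                dist_Singapore = c(3000, 600, 2000),
                dist_Toronto = c(900, 3200, 1900))
data2 <- tibble(destination = c("Rome Italy", "Miami United States",
                                "Singapore Singapore", "Toronto Canada"))

# Hypothetical helper: named vector mapping "Rome" -> "Rome Italy", etc.
lookup <- set_names(data2$destination, word(data2$destination, 1))

solution <- data1 %>%
  mutate(nearest = pmap_chr(select(., starts_with("dist_")), function(...) {
    dists <- c(...)  # named numeric vector holding this row's distances
    hub <- str_remove(names(dists)[which.min(dists)], "^dist_")
    lookup[[hub]]    # errors if no destination starts with 'hub'
  }))

solution
```

This keeps the original row order of 'data1' and needs no reshaping or join, at the cost of relying on the column names lining up with the destination names.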
      

      【Discussion】:
