【Question title】: Unable to unnest list data frame with different column types
【Posted】: 2020-03-30 22:45:57
【Question】:

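I am pulling some road traffic data from an API that is wrapped in an R package. I am using a list-column data frame to manage the download of multiple sets of records.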

# install.packages("webTRISr")
library(webTRISr)
library(tidyverse)

sites <- c(5745, 6345)
start_date = '01112017'
end_date = '31122017'

road_reports <- tibble(sites, start_date, end_date) %>% 
  mutate(data = purrr::pmap(list(sites, start_date, end_date), webTRISr::webtris_report, report_type = "daily"))

When I come to unnest the results...

road_reports %>% 
  unnest(data)
# Error: No common type for `..1$data$Site Name` <character> and `..2$data$Site Name` <double>.

This is because the `Site Name` column is a character in one call to the API but a double in another.
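To confirm which columns disagree before unnesting, the class of every column can be compared across the nested data frames. The following is a minimal sketch on made-up data: `report_a`, `report_b`, `nested`, `class_table` and `conflicts` are hypothetical names, and the values stand in for real API responses. It assumes every nested data frame has the same columns.

```r
library(tibble)

# Toy stand-ins for two API responses (values are invented for illustration).
report_a <- tibble(`Site Name` = "M1/5170L", `Total Volume` = 10)
report_b <- tibble(`Site Name` = 1234,       `Total Volume` = 12)

nested <- tibble(sites = c(5745, 6345), data = list(report_a, report_b))

# Character matrix: one row per column name, one column per nested data frame.
class_table <- sapply(
  nested$data,
  function(df) vapply(df, function(col) class(col)[1], character(1))
)

# Keep the column names whose class differs between the nested data frames.
conflicts <- rownames(class_table)[
  apply(class_table, 1, function(x) length(unique(x)) > 1)
]
conflicts
```

Note that `class()` reports a double column as "numeric", whereas the unnest() error message calls it `<double>`.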

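From this closed tidyr issue (https://github.com/tidyverse/tidyr/issues/658) I gather that this was treated as a bug and was sorted in tidyr v1.0.0.

Is there a workaround? The solution from this SO answer gives the same error.

I tried passing the ptype argument to unnest() to force the data types, but I get a lossy cast error, i.e.: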

ptype <- tibble('Site Name'= character(),
                'Report Date' = as.POSIXct(character(), tz = "UTC"),
                'Time Period Ending' = hms::as_hms(character()),
                'Time Interval' = double(),
                '0 - 520 cm' = double(),
                '521 - 660 cm' = double(),
                '661 - 1160 cm' = double(),
                '1160+ cm' = double(),
                '0 - 10 mph' = logical(),
                '11 - 15 mph' = logical(),
                '16 - 20 mph' = logical(),
                '21 - 25 mph' = logical(),
                '26 - 30 mph' = logical(),
                '31 - 35 mph' = logical(),
                '36 - 40 mph' = logical(),
                '41 - 45 mph' = logical(),
                '46 - 50 mph' = logical(),
                '51 - 55 mph' = logical(),
                '56 - 60 mph' = logical(),
                '61 - 70 mph' = logical(),
                '71 - 80 mph' = logical(),
                '80+ mph' = logical(),
                'Avg mph' = double(),
                'Total Volume' = double()
                )

road_reports %>% 
  unnest(data, ptype = ptype)

#Error: Lossy cast from <data.frame<data:data.frame< Site Name : character Report Date : datetime<UTC> Time Period Ending: time Time Interval : double
.
.
.

【Comments】:

    Tags: r tidyr unnest


    【Solution 1】:

    One option is to convert every column to a common type, then unnest, and then use type.convert to restore the column types:

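    library(purrr)
    library(dplyr)
    library(tidyr)  # provides unnest()

    road_reports %>% 
        # coerce every column of each nested data frame to character
        mutate(data = map(data, ~ .x %>% 
                  mutate_all(as.character))) %>% 
        unnest(data) %>%
        # re-infer the column types; without as.is = TRUE, strings become factors
        type.convert
        # type.convert(., as.is = TRUE) # to avoid getting factor columns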
    # A tibble: 11,232 x 27
    #   sites start_date end_date `Site Name` `Report Date` `Time Period En… `Time Interval` `0 - 520 cm` `521 - 660 cm` `661 - 1160 cm` `1160+ cm`
    #   <int>      <int>    <int> <fct>       <fct>         <fct>                      <int>        <int>          <int>           <int>      <int>
    # 1  5745    1112017 31122017 M1/5170L    2017-11-01    00:14:59                       0           NA             NA              NA         NA
    # 2  5745    1112017 31122017 M1/5170L    2017-11-01    00:29:59                       1           NA             NA              NA         NA
    # 3  5745    1112017 31122017 M1/5170L    2017-11-01    00:44:59                       2           NA             NA              NA         NA
    # 4  5745    1112017 31122017 M1/5170L    2017-11-01    00:59:59                       3           NA             NA              NA         NA
    # 5  5745    1112017 31122017 M1/5170L    2017-11-01    01:14:59                       4           NA             NA              NA         NA
    # 6  5745    1112017 31122017 M1/5170L    2017-11-01    01:29:59                       5           NA             NA              NA         NA
    # 7  5745    1112017 31122017 M1/5170L    2017-11-01    01:44:59                       6           NA             NA              NA         NA
    # 8  5745    1112017 31122017 M1/5170L    2017-11-01    01:59:59                       7           NA             NA              NA         NA
    # 9  5745    1112017 31122017 M1/5170L    2017-11-01    02:14:59                       8           NA             NA              NA         NA
    #10  5745    1112017 31122017 M1/5170L    2017-11-01    02:29:59                       9           NA             NA              NA         NA
    # … with 11,222 more rows, and 16 more variables: `0 - 10 mph` <int>, `11 - 15 mph` <int>, `16 - 20 mph` <int>, `21 - 25 mph` <int>, `26 - 30
    #   mph` <int>, `31 - 35 mph` <int>, `36 - 40 mph` <int>, `41 - 45 mph` <int>, `46 - 50 mph` <int>, `51 - 55 mph` <int>, `56 - 60 mph` <int>, `61 -
    #   70 mph` <int>, `71 - 80 mph` <int>, `80+ mph` <int>, `Avg mph` <int>, `Total Volume` <int>
    

    Alternatively, use type_convert from readr.
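As a rough sketch of that readr variant, the example below runs the same convert, unnest, re-infer sequence on made-up data. `report_a`, `report_b`, `nested` and `result` are hypothetical names, and `across()` assumes dplyr 1.0 or later. `type_convert()` only re-parses character columns and leaves strings as character rather than turning them into factors.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(readr)

# Toy stand-ins for two API responses (values are invented for illustration).
report_a <- tibble(`Site Name` = "M1/5170L", `Total Volume` = 10)
report_b <- tibble(`Site Name` = 1234,       `Total Volume` = 12)

nested <- tibble(sites = c(5745, 6345), data = list(report_a, report_b))

result <- nested %>%
  # make every nested column character so the pieces share a common type
  mutate(data = map(data, ~ mutate(.x, across(everything(), as.character)))) %>%
  unnest(data) %>%
  # let readr guess the type of each character column again
  type_convert()

result
```

Either conversion function is applied to the whole unnested data frame, so character columns outside `data` are re-inferred as well. In the printed output above, `start_date` has become the integer 1112017 and lost its leading zero, so it is worth checking such columns after conversion.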

    【Discussion】:

    • Nice, @akrun. I didn't know about type.convert. I would adjust your post to use type.convert(as.is = TRUE) to get rid of the factors.