readr - how to update a col_spec object from spec()

【Question title】: readr - how to update col_spec object from spec()
【Posted】: 2017-01-01 06:52:46
【Question】:

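I like the workflow for column specifications described in this RStudio blog post. Basically, you can retrieve the column specification after importing with read_csv and then save it for later use. For example, from that post: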

mtcars2 <- read_csv(readr_example("mtcars.csv"))
#> Parsed with column specification:
#> cols(
#>   mpg = col_double(),
#>   cyl = col_integer(),
#>   disp = col_double(),
#>   hp = col_integer(),
#>   drat = col_double(),
#>   wt = col_double(),
#>   qsec = col_double(),
#>   vs = col_integer(),
#>   am = col_integer(),
#>   gear = col_integer(),
#>   carb = col_integer()
#> )
# Once you've figured out the correct types
mtcars_spec <- write_rds(spec(mtcars2), "mtcars2-spec.rds")

# Every subsequent load
mtcars2 <- read_csv(
  readr_example("mtcars.csv"), 
  col_types = read_rds("mtcars2-spec.rds")
)

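Unfortunately, the spec object itself is a list with attributes, and its elements don't look like the column specifications you pass to read_csv through the col_types argument: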

> mtcars_spec$cols$cyl
<collector_integer>
> str(mtcars_spec$cols$cyl)
 list()
 - attr(*, "class")= chr [1:2] "collector_integer" "collector"
> class(mtcars_spec)
[1] "col_spec"

Also, .rds files are hard to edit on Windows (at least for me).

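I'd like to be able to edit a large col_spec object (for example, to skip certain columns or otherwise change the classes). I could keep guessing which strings in the list I need to edit, like this: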

attr(mtcars_spec$cols$cyl, "class")[1] = "collector_skip" # this worked!
> mtcars_spec
cols(
  mpg = col_double(),
  cyl = col_skip(),
  disp = col_double(),
  hp = col_integer(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_integer(),
  am = col_integer(),
  gear = col_integer(),
  carb = col_integer()
)

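But this seems awkward. Is there a more elegant way to update the column classes, such as skipping the mtcars$cyl column in my example? Or, failing an elegant way, one that covers all possible types? I don't want to do a lot of guessing about how <collector_date> is implemented for the various date formats.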

【Comments on the question】:

FYI, I also filed an issue about this on GitHub: github.com/tidyverse/readr/issues/693

Tags: r readr

【Solution 1】:

This is a minimal version of Jim Hester's GitHub post:

library(readr)
test_spec <- spec_csv('x,y,theDate,skipCol
  1,a,"21/01/2018","skip1"
  2,z,"31/01/2018","skip2"')

test_spec
#> cols(
#>   x = col_integer(),
#>   y = col_character(),
#>   theDate = col_character(),
#>   skipCol = col_character()
#> )

test_spec$cols[["theDate"]] <- col_date("%d/%m/%Y")
test_spec$cols[["skipCol"]] <- col_skip()

test_spec
#> cols(
#>   x = col_integer(),
#>   y = col_character(),
#>   theDate = col_date(format = "%d/%m/%Y"),
#>   skipCol = col_skip()
#> )
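
The assignment works because each entry in a spec's cols list is an ordinary collector object, the same kind of object that col_skip() or col_date() returns. A small check, offered as a sketch (the variable names skip_class, date_collector and date_format are only illustrative):

```r
library(readr)

# col_skip() returns a collector whose class vector matches what the
# question set by hand with attr(..., "class")
skip_class <- class(col_skip())
skip_class

# col_date() keeps its format string inside the collector, so there is
# no need to guess how <collector_date> is built internally
date_collector <- col_date("%d/%m/%Y")
date_format <- date_collector$format
date_format
```

So replacing an element of the cols list with the result of a col_*() call should cover any column type without editing class attributes directly.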

Notes

You need to know the date format of your data.
You can also call readr::spec_csv() on a file.
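
Applied to the file from the question, the same idea might look like the sketch below. It assumes the mtcars.csv example file that ships with readr, and that the edited specification is accepted by the col_types argument just like the saved spec in the blog post workflow.

```r
library(readr)

# Infer the column specification from the example file shipped with readr
mtcars_spec <- spec_csv(readr_example("mtcars.csv"))

# Replace a single collector: skip the cyl column on import
mtcars_spec$cols[["cyl"]] <- col_skip()

# Reuse the edited specification on every subsequent load
mtcars2 <- read_csv(readr_example("mtcars.csv"), col_types = mtcars_spec)
names(mtcars2)
```

The edited mtcars_spec can then be saved with write_rds() as in the original workflow, so the .rds file never needs to be edited by hand.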

【Comments】: