【Question title】: Read a CSV file with the delimiter inside quotes using read_csv
【Posted】: 2021-10-24 07:41:07
【Question description】:

The CSV file contains commas (,) inside quoted fields. read_csv converts such values to numeric, but they are supposed to stay as character.

library(readr)
read_csv('"Name","V1","V2"\n
"A","0,20","300,200"\n
"B","0,20","300,200"')

The result looks like this:

# A tibble: 2 x 3
  Name  V1        V2
  <chr> <chr>  <dbl>
1 A     0,20  300200
2 B     0,20  300200

I want the V2 column to stay as character, exactly as it appears in the file.

How can I fix this?

My session info:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252 
[2] LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] readr_1.4.0

loaded via a namespace (and not attached):
 [1] fansi_0.5.0     utf8_1.2.2      crayon_1.4.1   
 [4] R6_2.5.0        lifecycle_1.0.0 magrittr_2.0.1 
 [7] pillar_1.6.1    rlang_0.4.11    cli_3.0.1      
[10] rstudioapi_0.13 vctrs_0.3.8     ellipsis_0.3.2 
[13] tools_4.1.0     hms_1.1.0       compiler_4.1.0 
[16] pkgconfig_2.0.3 tibble_3.1.3

【Question comments】:

    Tags: r tidyverse readr


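    【Solution 1】:

    Two options:

    1. Set grouping_mark in locale() to a character that does not occur in the data. By default readr treats "," as the thousands separator, which is why "300,200" is guessed to be the number 300200.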
    library(readr)
    
    read_csv('"Name","V1","V2"\n
    "A","0,20","300,200"\n
    "B","0,20","300,200"', locale = locale(grouping_mark = "@"))
    
    #  Name  V1    V2     
    #  <chr> <chr> <chr>  
    #1 A     0,20  300,200
    #2 B     0,20  300,200
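
To see why changing the grouping mark helps, one can ask readr which parser it would pick for a single value. This is only a sketch: it assumes guess_parser() applies the same grouping-mark rule as the column guessing inside read_csv(), which may not hold for every readr version, and "@" is just a placeholder for any character that is absent from the data.

```r
library(readr)

# Default locale: "," is the grouping mark, so the value is treated as a
# number with a thousands separator.
guess_default <- guess_parser("300,200")

# Custom locale: "@" becomes the grouping mark, so "," is ordinary text.
guess_custom <- guess_parser("300,200", locale = locale(grouping_mark = "@"))

print(guess_default)
print(guess_custom)
```

If the two guesses differ, the locale setting is what decides whether V2 is read as numeric or character.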
    
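    2. Pass the column types explicitly, so that no type guessing takes place ('ccc' means three character columns).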
    read_csv('"Name","V1","V2"\n
    "A","0,20","300,200"\n
    "B","0,20","300,200"', col_types = 'ccc')
    
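If only some columns need protecting, the compact 'ccc' string can be swapped for a cols() specification. The variant below is a sketch rather than part of the original answer: it forces V2 to character and leaves the remaining columns to be guessed. The names csv_text and df are illustrative, and I() is used to mark the string as literal data, which readr 2.x recommends (the bare string from the question should also work in readr 1.4.0).

```r
library(readr)

# Same data as in the question, written as a single literal string.
csv_text <- '"Name","V1","V2"\n"A","0,20","300,200"\n"B","0,20","300,200"'

# Force V2 to character; every other column keeps the default guessing.
df <- read_csv(
  I(csv_text),
  col_types = cols(V2 = col_character(), .default = col_guess())
)

print(df)
```

This keeps the guessing behaviour for columns where it is harmless, at the cost of having to name each column that must stay as text.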
