Combining multiple .csv files in R with numeric and non-numeric values (answer)

[Question title]: Combining multiple .csv in R with numeric and non numeric values in R
[Posted]: 2020-03-09 11:22:37
[Question description]:

I am trying to combine multiple .csv files into one with a nice, simple script. Currently I have this code:

library(readr)
library(dplyr)

data_files = list.files(path=file_source, pattern = "*.csv", full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows

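But when I check the output, some values have been replaced with NA. I believe this is because some values are non-numeric, e.g. SMITH_201. Is there a way to avoid this so that the non-numeric values are kept?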
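One way to confirm which columns are affected is to look at the type readr guessed for each column of each file, and at how many values it failed to parse (those are the values that come back as NA). The sketch below is a hypothetical helper, `guessed_types`, not part of readr; it assumes every file has a header row and relies on readr's `problems()` to count parse failures.

```r
library(readr)

# Hypothetical helper (not part of readr): for each CSV path, report the
# class readr guessed for every column and how many values failed to parse.
guessed_types <- function(paths) {
  lapply(setNames(paths, basename(paths)), function(path) {
    df <- suppressMessages(read_csv(path))  # silence the column-spec message
    list(
      types      = vapply(df, function(col) class(col)[1], character(1)),
      n_problems = nrow(problems(df))       # rows/values that became NA
    )
  })
}

# Usage, with file_source as in the question:
# guessed_types(list.files(file_source, pattern = "\\.csv$", full.names = TRUE))
```

A file whose `x` column is guessed as `"numeric"` while another file's is `"character"` is a sign that the files disagree on types before they are ever combined.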

Edit:

Here is an example of what I'm trying to do. I have multiple .csv files like the following.

file_A.csv looks like this:

x         y
1         1
2         1
3         1
4         1

file_B.csv looks like this:

x         y
5         2
6         2
A3        2
A4        1

I want to combine them into a single .csv:

x         y
1         1
2         1
3         1
4         1
5         2
6         2
A3        2
A4        1
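Following the request for a reproducible example in the comments, the sketch below writes the two example files into a temporary directory so any of the solutions can be tested against them. It assumes the files are comma-separated (the question shows them as aligned columns), and the directory name `csv_example` is made up.

```r
# Hypothetical setup: a temporary folder standing in for the question's
# file_source, holding the two example files as comma-separated text.
file_source <- file.path(tempdir(), "csv_example")
dir.create(file_source, showWarnings = FALSE)

writeLines(c("x,y", "1,1", "2,1", "3,1", "4,1"),
           file.path(file_source, "file_A.csv"))
writeLines(c("x,y", "5,2", "6,2", "A3,2", "A4,1"),
           file.path(file_source, "file_B.csv"))
```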

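[Question discussion]:

You can use the col_types argument to make it read the column as character. See ?read_csv
It is easier to help you if you include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions.
Replace read_csv with read.csv(. , colClasses=c('character'))
@IceCreamToucan How would I work col_types into the code?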
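In reply to the last comment, a minimal sketch of working `col_types` into the question's pipeline: extra arguments to `lapply()` are passed on to `read_csv()`, and `cols(.default = col_character())` reads every column as character so text such as SMITH_201 is kept. The base R variant follows the `read.csv(..., colClasses = "character")` comment. Both function names are hypothetical.

```r
library(readr)
library(dplyr)  # bind_rows() and the %>% pipe

# Hypothetical helper: the question's pipeline with col_types added, so every
# column of every file is read as character before the files are stacked.
read_all_as_character <- function(dir) {
  list.files(path = dir, pattern = "\\.csv$", full.names = TRUE) %>%
    lapply(read_csv, col_types = cols(.default = col_character())) %>%
    bind_rows()
}

# Hypothetical helper: the base R variant suggested in the comments,
# using read.csv() with colClasses = "character" instead of readr.
read_all_as_character_base <- function(dir) {
  files <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)
  do.call(rbind, lapply(files, read.csv, colClasses = "character"))
}

# Usage, with file_source as in the question:
# data_files <- read_all_as_character(file_source)
```

Columns that really are numeric (such as `y`) then need converting back afterwards, for example with `as.numeric()`.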

Tags: r csv dplyr numeric readr

[Solution 1]:

You can make this a bit more compact with purrr.

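library(purrr)
library(readr)
library(magrittr)  # for %>%

# file_source is the folder from the question; "\\.csv$" is a regular
# expression matching file names that end in .csv
list.files(path = file_source, pattern = "\\.csv$", full.names = TRUE) %>%
  map_dfr( ~ read_csv(., col_types = "cn"))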

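This says there are two columns: the first is character and the second is numeric. You could also use col_types = "c?", which will correctly guess the second column as numeric. From the help file (?read_csv):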

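Alternatively, you can use a compact string representation where each character represents one column: c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or _/- to skip the column.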
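To see the compact specification in action without any files, the sketch below passes literal CSV text (a single string containing newlines, which `read_csv()` treats as data rather than a file path) and compares `"cn"` with the equivalent long-form `cols()` call and with `"c?"`. The sample text is made up from the question's values.

```r
library(readr)

csv_text <- "x,y\n1,1\n6,2\nA3,2\n"

# Compact string: column 1 character (c), column 2 number (n)
compact <- read_csv(csv_text, col_types = "cn")

# The same specification written out with cols()
explicit <- read_csv(csv_text,
                     col_types = cols(x = col_character(), y = col_number()))

# "c?" fixes x as character and lets readr guess y
guessed <- read_csv(csv_text, col_types = "c?")
```

The compact string is positional, so it must list one letter per column in file order; the `cols()` form names columns explicitly and is easier to read when a file has many of them.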

If you don't want to specify the column types by hand, here is a second approach.

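my_files <- list.files(path = file_source, pattern = "\\.csv$", full.names = TRUE)

# Every file's lines except its header row
file_list <- lapply(my_files, read_lines, skip = 1)
# The header row, taken once from the first file
file_header <- read_lines(my_files[1], n_max = 1)

# Collapse into one string of CSV text; readr >= 2.0 only treats a character
# vector as literal data if it contains a newline (or is wrapped in I())
read_csv(paste(c(file_header, unlist(file_list)), collapse = "\n"))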

# A tibble: 8 x 2
  x         y
  <chr> <dbl>
1 1         1
2 2         1
3 3         1
4 4         1
5 5         2
6 6         2
7 A3        2
8 A4        1

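What we do here is read the files line by line instead of parsing them as CSV. We take the first line (the header) from the first file, plus every line except the first from all of the files. We then pass that combined text to read_csv, which guesses the column types from all of the rows at once, so x becomes character because of A3 and A4 while y stays numeric. This assumes every file has the same columns in the same order.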
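A different technique that also avoids specifying types by hand (a swap-in, not the method in this answer): read every file with all columns as character, stack them, and let readr's `type_convert()` re-guess each column's type once on the combined data. Because the stacking is done by column name, the files do not need to share column order. The helper name `combine_csvs` is hypothetical, and `map_dfr()` needs dplyr installed.

```r
library(readr)
library(purrr)

# Hypothetical helper: combine all CSVs in `dir`, guessing types only once,
# after all rows are together, so "A3" in a later file keeps x as character.
combine_csvs <- function(dir) {
  files <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)
  all_chr <- map_dfr(files, read_csv,
                     col_types = cols(.default = col_character()))
  type_convert(all_chr)  # re-guess each column's type from all values
}

# Usage, with file_source as in the question:
# combine_csvs(file_source)
```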

[Discussion]: