Read data into R using read_csv() [duplicate]: answers

【Question title】: Read data into R using read_csv() [duplicate]
【Posted】: 2021-03-26 16:22:18
【Question description】:

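I need to read data into R using read_csv(). Starting from list_files(), I create a vector containing the full names of the files, then set a tibble to NULL and loop over the entries of that vector, reading each file and appending it with bind_rows(), so that the tibble that started out as NULL ends up containing all the data that was read.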

How would I do this in R in the way I described above?

【Question discussion】:

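I'm not sure what you are asking here. RStudio is just an IDE for running R. Whatever R code you have should work the same way in RStudio; there is no difference. It would be easier to help you if you included a simple reproducible example with sample input and the desired output that can be used to test and verify possible solutions.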

Tags: r

【Solution 1】:

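You don't need to create a NULL tibble to do this. Assuming the csv files have the same column names, you can proceed as follows; just change "folder/with/csv/data" to the path of the folder on your PC where the data is stored.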

library(tidyverse)

# First, create a vector with the paths of your csv files.
# `pattern` is a regular expression, so anchor it on the ".csv" extension.
csv_paths <- list.files("folder/with/csv/data", pattern = "\\.csv$", full.names = TRUE)

# Then read all the csv files into a list of tibbles
csv_list <- map(csv_paths, read_csv)

# Finally, bind the list of tables into a single one
data <- bind_rows(csv_list)

# You can also write all of the above as a pipe
data <- list.files("folder/with/csv/data", pattern = "\\.csv$", full.names = TRUE) %>%
  map(read_csv) %>%
  bind_rows()
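
One detail worth checking in the list.files() calls above: the `pattern` argument is a regular expression, not a shell-style glob. The following is only a minimal sketch of the difference; the temporary directory and the file names in it are made up for illustration.

```r
# Hypothetical file names in a throwaway directory, for illustration only
tmp <- file.path(tempdir(), "pattern_demo")
dir.create(tmp, showWarnings = FALSE)
file.create(file.path(tmp, c("a.csv", "b.csv", "notes_csv.txt", "c.csv.bak")))

# An anchored regular expression keeps only names that end in ".csv"
csv_only <- list.files(tmp, pattern = "\\.csv$")

# utils::glob2rx() converts a shell-style glob into an equivalent regex
csv_glob <- list.files(tmp, pattern = utils::glob2rx("*.csv"))
```

Both calls should return only a.csv and b.csv, whereas an unanchored pattern may also pick up names that merely contain "csv" somewhere.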

【Discussion】:

Is it possible to do this by creating a NULL tibble and looping over the files, and if so, how would I do that?
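
The loop-based approach asked about in the comment above could look roughly like the following. This is a sketch, not the answerer's code: the demo directory, the two small files, and the names `demo_dir`, `files` and `result` are assumptions made so the example runs on its own. It relies on dplyr::bind_rows() ignoring NULL inputs.

```r
library(readr)
library(dplyr)

# Hypothetical demo data: two small csv files in a temporary directory
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(tibble(id = 1:2, value = c("a", "b")), file.path(demo_dir, "one.csv"))
write_csv(tibble(id = 3:4, value = c("c", "d")), file.path(demo_dir, "two.csv"))

# Vector with the full paths of the csv files
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Start from NULL; bind_rows() drops NULL inputs, so the first
# iteration simply yields the data of the first file
result <- NULL
for (f in files) {
  result <- bind_rows(result, read_csv(f))
}
```

Growing a table inside a loop copies the accumulated rows on every iteration, so for many files reading everything into a list and binding once, as in Solution 1, is usually the better choice.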

【Solution 2】:

An alternative way to solve your problem (if I understood it correctly) would be something like this:

library(stringr)
library(purrr)
library(io)
library(readr)

# path to your csv files folder 
path <- "C:/your_path_to_files_folder"

# get all files in the path folder; full.names = TRUE returns the whole path, not only the file name
file_vector <- io::list_files(path = path, full.names = TRUE)

# just to be sure you are using only the ".csv" files in case something else is in the directory
# (the pattern is a regex: escape the dot and anchor it at the end of the name)
file_vector_csvs <- file_vector[stringr::str_detect(file_vector, "\\.csv$")]

# use map instead of for-loop => result is a list (also a new column with the original filename is generated as it might be important for later filtering/use)
purrr::map(file_vector_csvs, ~readr::read_csv(.x) %>% dplyr::mutate(file = .x))

# use map_df to get a data.frame (which is tabular and can be converted to a tibble directly) (also a new column with the original filename is generated as it might be important for later filtering/use)
purrr::map_df(file_vector_csvs, ~readr::read_csv(.x) %>% dplyr::mutate(file = .x))

For the last call, all .csv files should have the same columns and column positions, which may or may not be the case for you.
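
To see what happens when the columns do not line up, here is a small sketch using dplyr::bind_rows(), which is what map_df() uses to combine its results. The two tables and their column names are invented for illustration.

```r
library(dplyr)

# Two hypothetical tables whose columns only partly overlap
first  <- tibble(id = 1, name = "x")
second <- tibble(name = "y", score = 10)

# bind_rows() matches columns by name, not by position,
# and fills columns missing from one input with NA
combined <- bind_rows(first, second)
```

The result has the columns id, name and score, with NA wherever an input lacked a column. If the same column name holds incompatible types in different files (for example numeric in one and character in another), bind_rows() stops with an error instead.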

【Discussion】: