Importing many files at the same time and adding ID indicator: answers

【Question title】: Importing many files at the same time and adding ID indicator
【Posted】: 2018-08-14 09:29:19
【Question】:

I have 91 files in .log format:

Trajectory Log File

Rock type: 2 (0: Sphere, 1: Cuboid, 2: Rock)

Nr of Trajectories: 91
Trajectory-Mode: ON
Average Slope (Degrees): 28.05 / 51.99 / 64.83

Filename: test_tschamut_Pos1.xml

Z-offset: 1.32000
Rock Position X: 696621.38
Rock Position Y: 167730.02
Rock Position Z: 1679.6400

Friction:
Overall Type: Medium

               t (s)               x (m)               y (m)               z (m)               p0 ()               p1 ()               p2 ()               p3 ()          vx (m s-1)          vy (m s-1)          vz (m s-1)        wx (rot s-1)        wy (rot s-1)        wz (rot s-1)           Etot (kJ)           Ekin (kJ)      Ekintrans (kJ)        Ekinrot (kJ)              zt (m)             Fv (kN)             Fh (kN)        Slippage (m)      mu_s (N s m-1)       v_res (m s-1)     w_res (rot s-1)           JumpH (m)        ProjDist (m)               Jc ()           JH_Jc (m)              SD (m)
               0.000          696621.380          167730.020            1680.960               1.000               0.000               0.000               0.000               0.000               0.000               0.000               0.000               0.000               0.000            1192.526               0.000               0.000               0.000            1677.754               0.000               0.000               0.000               0.350               0.000               0.000               3.206               0.000               0.000               0.000               0.000
               0.010          696621.380          167730.020            1680.959               1.000               0.000              -0.000               0.000               0.000               0.000              -0.098               0.000               0.000               0.000            1192.526               0.010               0.010               0.000            1677.754               0.000               0.000               0.000               0.350               0.098               0.000               3.205               0.000               0.000               0.000               0.000
               0.020          696621.380          167730.020            1680.958               1.000               0.000              -0.000               0.000               0.000               0.000              -0.196               0.000               0.000               0.000            1192.526               0.039               0.039               0.000            1677.754               0.000               0.000               0.000               0.350               0.196               0.000               3.204               0.000               0.000               0.000               0.000
               0.040          696621.380          167730.020            1680.952               1.000               0.000              -0.000               0.000               0.000               0.000              -0.392               0.000               0.000               0.000            1192.526               0.158               0.158               0.000            1677.754               0.000               0.000               0.000               0.350               0.392               0.000               3.198               0.000               0.000               0.000               0.000
               0.060          696621.380          167730.020            1680.942               1.000               0.000              -0.000               0.000               0.000               0.000              -0.589               0.000               0.000               0.000            1192.526               0.355               0.355               0.000            1677.754               0.000               0.000               0.000               0.350               0.589               0.000               3.188               0.000               0.000               0.000               0.000

I have successfully imported a single file and kept only the variables I need: x, y, z, Etot (plus the time t):

  trjct <- read.table('trajectory_test_tschamut_Pos1.log', skip = 23)
  trjct <- trjct[,c("V1","V2","V3", "V4", "V15")]
  colnames(trjct) <- c("t", "x", "y", "z", "Etot")

> str(trjct)
'data.frame':   1149 obs. of  5 variables:
 $ t   : num  0 0.01 0.02 0.04 0.06 0.08 0.11 0.13 0.15 0.16 ...
 $ x   : num  696621 696621 696621 696621 696621 ...
 $ y   : num  167730 167730 167730 167730 167730 ...
 $ z   : num  1681 1681 1681 1681 1681 ...
 $ Etot: num  1193 1193 1193 1193 1193 ...

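However, I have 91 of these files and would like to analyse them together. I therefore want to build one large dataset, adding an ID that identifies which file each row came from. A similar question has already been answered here.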

I adapted that code to my data and needs, tweaking it here and there, but I keep running into errors.

# importing all files at the same time
  files.list <- list.files(pattern = ".log")
  trjct <- data.frame(t=numeric(),
                      x=numeric(),
                      z=numeric(),
                      Etot=numeric(),
                      stringsAsFactors=FALSE)

  for (i in 1: length(files.list)) {
    df.next <- read.table(files.list[[i]], header=F, skip = 23)
    df.next$ID <- paste0('simu', i)
    df <- rbind(df, df.next)
  }

I get this error:

Error in rep(xi, length.out = nvar) : 
  attempt to replicate an object of type 'closure'

Questions:

Where is the problem, and how can I fix it?
Is there a better solution?

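【Comments】:

Try replacing df above with trjct, or the other way around.
Of course! @hpesoj626 Thanks... I wish I hadn't spent 20 minutes writing this question :D
No problem. You may also want to consider the suggestions in the answers.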
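
The fix suggested in the comment can be sketched as follows. In the question's loop, `df` is never assigned, so R resolves the name to the function `stats::df`, and `rbind()` then fails on an object of type 'closure'. This is only a minimal sketch: the helper `write_fake_log`, the directory `traj_loop_demo`, and the file contents are made up so that the example runs on its own, and the regex `"\\.log$"` is an assumption about how the real files are named.

```r
# Hypothetical stand-in for the real logs: 23 header lines followed by
# rows of 30 whitespace-separated numbers (made-up values).
write_fake_log <- function(path, n_rows = 3) {
  header <- rep("header line", 23)
  rows <- vapply(seq_len(n_rows), function(r) {
    paste(r + seq_len(30) / 100, collapse = " ")
  }, character(1))
  writeLines(c(header, rows), path)
}

log_dir <- file.path(tempdir(), "traj_loop_demo")
dir.create(log_dir, showWarnings = FALSE)
for (k in 1:3) {
  write_fake_log(file.path(log_dir, sprintf("trajectory_%d.log", k)))
}

files.list <- list.files(log_dir, pattern = "\\.log$", full.names = TRUE)

# With no variable called df in scope, the name refers to a function:
typeof(stats::df)  # "closure"

# Accumulate into trjct, the object that was actually initialised.
trjct <- data.frame()
for (i in seq_along(files.list)) {
  df.next <- read.table(files.list[[i]], header = FALSE, skip = 23)
  df.next <- df.next[, c(1:4, 15)]                 # keep t, x, y, z, Etot
  colnames(df.next) <- c("t", "x", "y", "z", "Etot")
  df.next$ID <- paste0("simu", i)                  # one ID per file
  trjct <- rbind(trjct, df.next)
}

str(trjct)
```

Selecting and renaming the columns inside the loop keeps every piece in the same shape before it is bound to the others.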

Tags: r for-loop dataframe import read.table

【Solution 1】:

You could also look at purrr::map_df, which behaves like lapply or a for loop but returns a single data.frame.

read_traj <- function(fi) {
    df <- read.table(fi, header=F, skip=23)
    df <- df[, c(1:4, 15)]
    colnames(df) <- c("t", "x", "y", "z", "Etot")
    return(df)
}

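library(tidyverse)  # provides purrr::map_df and dplyr::mutate

# pattern is a regular expression: "\\.log$" matches only names ending in ".log"
files.list <- list.files(pattern = "\\.log$")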

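map_df has a handy .id = ... argument: it adds a column, here called id, holding the position 1...N of each input (stored as character strings), where N is the number of files.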

map_df(files.list, ~read_traj(.x), .id="id")

If you want to keep the file names instead, use the id column to index into files.list:

map_df(files.list, ~read_traj(.x), .id="id") %>%
  mutate(id = files.list[as.numeric(id)])
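
A variation on the same idea, offered as a sketch rather than as part of the original answer: when the input vector is named, `map_df` fills the `.id` column with those names, so naming `files.list` with `purrr::set_names` stores the file names directly. The helper `write_fake_log`, the directory `traj_names_demo`, and the generated files are hypothetical and exist only to make the example runnable; the dplyr package is assumed to be installed, since `map_df` relies on it.

```r
library(purrr)

read_traj <- function(fi) {
  df <- read.table(fi, header = FALSE, skip = 23)
  df <- df[, c(1:4, 15)]
  colnames(df) <- c("t", "x", "y", "z", "Etot")
  df
}

# Hypothetical stand-in for the real logs: 23 header lines followed by
# rows of 30 whitespace-separated numbers (made-up values).
write_fake_log <- function(path, n_rows = 3) {
  rows <- vapply(seq_len(n_rows), function(r) {
    paste(r + seq_len(30) / 100, collapse = " ")
  }, character(1))
  writeLines(c(rep("header line", 23), rows), path)
}

log_dir <- file.path(tempdir(), "traj_names_demo")
dir.create(log_dir, showWarnings = FALSE)
for (k in 1:2) {
  write_fake_log(file.path(log_dir, sprintf("trajectory_%d.log", k)))
}

files.list <- list.files(log_dir, pattern = "\\.log$", full.names = TRUE)

# Name each path by its file name; .id then records that name per row.
named_files <- set_names(files.list, basename(files.list))
trjct <- map_df(named_files, read_traj, .id = "id")

unique(trjct$id)
```

This avoids the conversion with `as.numeric(id)` afterwards, at the cost of naming the vector first.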

【Discussion】:

【Solution 2】:

First, wrap the reading step in a function:

read_log_file <- function(path) {
  trjct <- read.table(path, skip = 23)
  trjct <- trjct[,c("V1","V2","V3", "V4", "V15")]
  colnames(trjct) <- c("t", "x", "y", "z", "Etot")
  return(trjct)
}

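Then you can use mapply to build a list of data.frames. mapply is a variant of apply that can iterate over two arguments at once; if you want to learn more, see the DataCamp article on the apply family.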

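# pattern is a regular expression: "\\.log$" matches only names ending in ".log"
files.list <- list.files(pattern = "\\.log$")
ids <- seq_along(files.list)  # 1..N, and safely empty when no files are found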

df_list = mapply(function(path, id) {
    df = read_log_file(path)
    df$ID = id
    return(df)
}, files.list, ids, SIMPLIFY=FALSE)

Note the SIMPLIFY=FALSE part: it stops mapply from trying to simplify the result into a matrix-like object, so you get back a plain list of data.frames.
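
To see what SIMPLIFY changes, here is a small self-contained sketch. The function `make_df` and its input values are made up for illustration and have nothing to do with the log files.

```r
# Hypothetical helper returning a one-row data.frame per call.
make_df <- function(value, id) data.frame(value = value, ID = id)

# Default (SIMPLIFY = TRUE): the results are collapsed into a matrix
# whose cells hold the individual columns, not data.frames.
simplified <- mapply(make_df, c(10, 20), c(1, 2))

# SIMPLIFY = FALSE: one intact data.frame per call, in a plain list.
as_list <- mapply(make_df, c(10, 20), c(1, 2), SIMPLIFY = FALSE)

is.matrix(simplified)
is.data.frame(as_list[[1]])
```

Only the second form can be passed on to a function that expects a list of data.frames.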

Finally, you can combine all the data.frames into one with bind_rows from the dplyr package:

df = dplyr::bind_rows(df_list)
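
If dplyr is not available, a base R alternative is `do.call(rbind, ...)`. This is a sketch, not part of the original answer; the two small data.frames below are made-up stand-ins for the per-file results, and `df_list_demo` and `df_demo` are hypothetical names.

```r
# Made-up stand-ins for two per-file data.frames that already carry an ID.
df_list_demo <- list(
  data.frame(t = c(0, 0.01), Etot = c(1192.526, 1192.526), ID = 1L),
  data.frame(t = c(0, 0.01), Etot = c(1100.0, 1099.5), ID = 2L)
)

# rbind all list elements in one call instead of growing a data.frame in a loop.
df_demo <- do.call(rbind, df_list_demo)
rownames(df_demo) <- NULL  # tidy row names after binding

# Number of rows contributed by each file.
table(df_demo$ID)
```

Unlike `bind_rows`, `rbind` requires every data.frame to have the same column names, which holds here because each file is read by the same function.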

Note: in R it is generally preferable to use the *apply family of functions.

【Discussion】: