【Posted】: 2020-07-16 03:31:49
【Question】:
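Is there a good way in readr to read text files that have a variable-length metadata header? So far I have been removing the headers by hand, but I would much rather not alter my raw data in any way.
I know I can skip lines or treat specific lines as comments with, for example, read_delim. However, those options don't work here: the header length differs between files, and the metadata lines share no common prefix. I have attached a short example of one of my text files below. The metadata starts with /* and ends with */. Is there an option to skip everything until */ occurs (as in fread)? I tried reading my files with fread but then ran into different problems (namely an error message saying that column names must not be duplicated). I could probably work that out with fread, but I am curious whether readr has such an option.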
sample <- c("/* DATA DESCRIPTION:", "Citation:\tName (2015) Title", "Coverage:\tLATITUDE: 44.360000 * LONGITUDE: -26.543333",
"Parameter(s):\tDEPTH, sediment/rock [m] (Depth) * GEOCODE",
"\tAGE [ka BP] (Age) * GEOCODE", "\tGlobigerinella aequilateralis [%] (G. aequilateralis) * PI: Name * METHOD/DEVICE: Counting >150 µm fraction",
"\tGlobigerina bulloides [%] (G. bulloides) * PI: Name * METHOD/DEVICE: Counting >150 µm fraction",
"\tDeuterammina grahami [%] (D. grahami) * PI: Name * METHOD/DEVICE: Counting >150 µm fraction",
"Size:\t8188 data points", "*/", "Depth [m]\tAge [ka BP]\tG. aequilateralis [%] (Counting >150 µm fraction)\tG. bulloides [%] (Counting >150 µm fraction)\tD. grahami [%] (Counting >150 µm fraction)",
"0.0075\t2.23\t0.5\t23.0\t1.5", "0.0550\t3.64\t1.7\t20.8\t1.3",
"0.0850\t4.53\t1.1\t22.3\t3.4")
[1] "/* DATA DESCRIPTION:"
[2] "Citation:\tName (2015) Title"
[3] "Coverage:\tLATITUDE: 44.360000 * LONGITUDE: -26.543333"
[4] "Parameter(s):\tDEPTH, sediment/rock [m] (Depth) * GEOCODE"
[5] "\tAGE [ka BP] (Age) * GEOCODE"
[6] "\tGlobigerinella aequilateralis [%] (G. aequilateralis) * PI: Name * METHOD/DEVICE: Counting >150 µm fraction"
[7] "\tGlobigerina bulloides [%] (G. bulloides) * PI: Name * METHOD/DEVICE: Counting >150 µm fraction"
[8] "\tDeuterammina grahami [%] (D. grahami) * PI: Name * METHOD/DEVICE: Counting >150 µm fraction"
[9] "Size:\t8188 data points"
[10] "*/"
[11] "Depth [m]\tAge [ka BP]\tG. aequilateralis [%] (Counting >150 µm fraction)\tG. bulloides [%] (Counting >150 µm fraction)\tD. grahami [%] (Counting >150 µm fraction)"
[12] "0.0075\t2.23\t0.5\t23.0\t1.5"
[13] "0.0550\t3.64\t1.7\t20.8\t1.3"
[14] "0.0850\t4.53\t1.1\t22.3\t3.4"
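To make the goal concrete, here is a minimal sketch of the "skip everything up to */" behaviour described above. It assumes the closing */ always sits on a line of its own, as in the sample, and that the data are tab-delimited. The helper name read_after_header and its end_marker argument are hypothetical and not part of readr; the sketch simply locates the marker line with read_lines() and passes that line number to read_tsv() as skip. The demo vector is a shortened stand-in for the sample, not real data.

```r
library(readr)

# Hypothetical helper (not a readr function): read a tab-delimited file
# whose metadata block ends with a line that contains only "*/".
read_after_header <- function(path, end_marker = "*/") {
  lines <- read_lines(path)
  # Line number of the first line equal to the marker (whitespace trimmed)
  end_line <- which(trimws(lines) == end_marker)[1]
  if (is.na(end_line)) {
    stop("End-of-header marker not found: ", end_marker)
  }
  # Skip every line up to and including the marker; the next line
  # is then used as the column header.
  read_tsv(path, skip = end_line, col_types = cols())
}

# Shortened stand-in for the sample above, written to a temporary file
demo <- c("/* DATA DESCRIPTION:", "Size:\t8188 data points", "*/",
          "Depth [m]\tAge [ka BP]", "0.0075\t2.23", "0.0550\t3.64")
tmp <- tempfile(fileext = ".tab")
write_lines(demo, tmp)

dat <- read_after_header(tmp)
```

The raw file is never modified; it is only read twice (once to find the marker, once to parse the table), which should be acceptable for files of this size.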
【Discussion】: