Import specific data from a txt in R: answers

【Question title】: Import specific data from a txt in R
【Posted】: 2017-02-01 21:37:44
【Question description】:

I have a file generated by an instrument (Map_1.hdr). Here is the file:

    ENVI
    description = {ROI id #1}
    samples = 16
    lines   = 4
    bands   = 1025
    data type = 4
    interleave = bip
    wavelength = 
    pixel size = {9.38E-07, 7.5E-07}
    x-start and y-start = {0.027363358, -0.007902135}

I need to get specific data from the last 2 lines, namely:

pixel_size = c(9.38E-07,7.5E-07)
origin = c(0.027363358, -0.007902135)

Here is my (incomplete) attempt:

library(R.utils)
rem <- 2
nL <- countLines("Map_1.hdr")
df <- read.csv("Map_1.hdr", header=FALSE, sep=" ", skip=nL-rem, stringsAsFactors = FALSE)

With this I get the last two lines, but I am still a long way from cleaning up the rest of each line. Is there another way to get what I want?

【Question discussion】:

Tags: r csv import

【Solution 1】:

This is what I used instead:

txt <- "   ENVI
    description = {ROI id #1}
    samples = 16
    lines   = 4
    bands   = 1025
    data type = 4
    interleave = bip
    wavelength = 
    pixel size = {9.38E-07, 7.5E-07}
    x-start and y-start = {0.027363358, -0.007902135}"
rem <- 2
nL <- length(readLines(textConnection(txt)))
df <- read.delim(text = gsub(pattern = "^.+\\{|\\}",
                             # "^.+\\{" removes everything up to the last '{'
                             # "\\}"    removes the trailing '}'
                             # "|"      is the regex logical OR between the two
                             replacement = "",  # replace matches with a zero-length string
                             x = readLines(textConnection(txt))),  # input text or file
                 header = FALSE, sep = ",",  # left the comma in so it can be 'sep'
                 skip = nL - rem, stringsAsFactors = FALSE)
> df
           V1           V2
1 0.000000938  0.000000750
2 0.027363358 -0.007902135

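To use your own file, replace each instance of readLines(textConnection(txt)) with readLines("Map_1.hdr") and keep the text= argument, since gsub returns the cleaned lines rather than a file name. (The textConnection(txt) form is useful for building a working, testable example.)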
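
As a rough sketch of the file-based variant, the same steps can be run against a file on disk. The assumptions here: a temporary file stands in for Map_1.hdr so the example is self-contained, the names hdr_file, hdr_lines and cleaned are illustrative only, and strip.white = TRUE is added to drop the space after each comma.

```r
# A temporary file stands in for Map_1.hdr (assumption, for a self-contained example).
hdr_file <- tempfile(fileext = ".hdr")
writeLines(c("ENVI",
             "description = {ROI id #1}",
             "samples = 16",
             "lines   = 4",
             "bands   = 1025",
             "data type = 4",
             "interleave = bip",
             "wavelength = ",
             "pixel size = {9.38E-07, 7.5E-07}",
             "x-start and y-start = {0.027363358, -0.007902135}"),
           hdr_file)

rem <- 2                                       # number of trailing lines to keep
hdr_lines <- readLines(hdr_file)               # read the file once, as plain lines
cleaned <- gsub("^.+\\{|\\}", "", hdr_lines)   # drop "key = {" and the closing "}"
df <- read.delim(text = cleaned, header = FALSE, sep = ",",
                 skip = length(hdr_lines) - rem,
                 strip.white = TRUE, stringsAsFactors = FALSE)

pixel_size <- unlist(df[1, ], use.names = FALSE)  # first kept line: pixel size pair
origin     <- unlist(df[2, ], use.names = FALSE)  # second kept line: x-start, y-start
```

With a real file, only the tempfile/writeLines part goes away and hdr_file becomes the path to Map_1.hdr.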

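【Discussion】:

Awesome! If I copy and paste your attempt, it works. Unfortunately, I can't (fully) work out how I should modify it to get general code... txt <-read.csv("Map_1.hdr") rem <- 2 nL <- length(readLines(textConnection(txt))) df <- read.delim(text=gsub("^.+\\{|\\}","", readLines(textConnection(txt))), header=FALSE, sep=",", skip=nL-rem, stringsAsFactors = FALSE)
I will put some explanatory comments "inline".
If it works, you can accept it with the check mark even if you can't upvote.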

【Solution 2】:

Does this work? I'm not sure I fully understand your desired output:

> attempt <- read.table("~/Map_1.hdr", sep = "=", stringsAsFactors = FALSE)

> tail(attempt,2)$ENVI
[1] " {9.38E-07, 7.5E-07}"         " {0.027363358, -0.007902135}"
> tail(attempt,2)$ENVI[1]
[1] " {9.38E-07, 7.5E-07}"
> tail(attempt,2)$ENVI[2]
[1] " {0.027363358, -0.007902135}"

Then you could use strsplit and gsub to get what you need from there?

> strsplit(gsub('[\\{}]', "", tail(attempt,2)$ENVI[1]),",")[[1]][1]
[1] " 9.38E-07"
> strsplit(gsub('[\\{}]', "", tail(attempt,2)$ENVI[1]),",")[[1]][2]
[1] " 7.5E-07"
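
The strsplit and gsub steps above still return character strings with a leading space. A minimal sketch of wrapping them in a helper that also converts to numeric follows; parse_braced is a hypothetical name, and the two input strings are copied from the tail(attempt, 2)$ENVI output above so the example runs without the file.

```r
# Hypothetical helper: turn a string like " {9.38E-07, 7.5E-07}" into a numeric vector.
parse_braced <- function(x) {
  bare  <- gsub("[{}]", "", x)        # strip the opening and closing braces
  parts <- strsplit(bare, ",")[[1]]   # split on the comma
  as.numeric(trimws(parts))           # trim surrounding spaces, convert to numbers
}

# Values copied from the tail(attempt, 2)$ENVI output shown above.
vals <- c(" {9.38E-07, 7.5E-07}", " {0.027363358, -0.007902135}")

pixel_size <- parse_braced(vals[1])
origin     <- parse_braced(vals[2])
```

With the real data, vals would simply be tail(attempt, 2)$ENVI.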

【Discussion】: