Reading specific data from multiple files in Tcl

【Question title】: Reading specific data from multiple files in Tcl
【Posted】: 2021-11-03 00:31:38
【Question】:

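I'm writing a Tcl script that reads multiple files and uses regular expressions to search them for lines containing particular words. I've managed to search the files for a single item. Now I need to modify the script so that it searches for several items, prints the items found in one file together on one line, and then prints the items found in the next file on the following line. This is what I wrote: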

foreach fileName [glob  /home/kartik/tclprac/*/*] {
    # puts " Directories present are: [file tail $fileName]"
    set fp [open $fileName "r"]
    while { [gets $fp data]>=0 } {
        if {[regexp {set Date*} $data] | [regexp {set Channel* } $data] } {
            #puts "file: [file dirname $fileName] data: $data"
            set information "file: [file dirname $fileName] data: $data"
            puts $information
            set fp2 [open output.txt "a"]
            puts $fp2 $information
        }
    }
}

This is the output I get now:

file: /home/kartik/tclprac/wire_3 data: set Date 02/08/2021 
file: /home/kartik/tclprac/wire_2 data: set Date 01/08/2021
file: /home/kartik/tclprac/wire_1 data: set Channel Disney 
file: /home/kartik/tclprac/wire_1 data: set Date 31/07/2021

What I want is something like this:

file: /home/kartik/tclprac/wire_3 data: set Date 02/08/2021 
file: /home/kartik/tclprac/wire_2 data: set Date 01/08/2021 
file: /home/kartik/tclprac/wire_1 data: set Date 31/07/2021 set Channel Disney

【Comments】:

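Note that | is the arithmetic (bitwise) OR. Use || for the boolean OR.
The key is not to print the information immediately: build up a list of the matching lines, then print the joined list once you have finished reading the file.
Also, close the file handles you open, or you may run out of them.
Also, you only need to open output.txt once, before the foreach loop starts.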
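
To illustrate the first comment, here is a minimal sketch of how | and || differ inside expr. The proc name bump and the variable calls are hypothetical, introduced only for this demonstration. Because regexp returns 0 or 1, the | in the question happens to give the right truth value, but it always evaluates both sides, whereas || stops as soon as the left side is true.

```tcl
# | is bitwise OR on integers; || is logical OR and short-circuits.
set bitwise [expr {1 | 2}]
set logical [expr {1 || 2}]
puts "bitwise: $bitwise, logical: $logical"

# Hypothetical helper that counts how often it is called.
set calls 0
proc bump {} {
    incr ::calls
    return 1
}

# With ||, the right-hand side is not evaluated when the left is already true,
# so bump is never called here and calls stays at 0.
set result [expr {1 || [bump]}]
puts "result: $result, calls: $calls"
```

Note that the short-circuit only applies when the expression is braced, as it is here and in the if condition of the question.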

Tags: regex file-io scripting tcl

【Solution 1】:

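It looks to me like you want to collect the results for a single file onto one line, with the matches separated by spaces, rather than printing one line per matching line (the traditional grep approach).

We can do that, but it becomes clearer if we split the code into a couple of procedures: one to process the contents of a single file, and another to run the whole job.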

proc processFileContents {name contents accumulatorChannel} {
    # Collect the trimmed text of every line that sets Date or Channel
    set interesting [lmap line [split $contents "\n"] {
        if {![regexp {set (?:Date|Channel) } $line]} {
            # Skip non-matching lines
            continue
        }
        # Trim the matching lines
        string trim $line
    }]
    # If we matched anything, print all matches for this file on one line
    if {[llength $interesting]} {
        set information "file: $name data: [join $interesting " "]"
        puts $information
        puts $accumulatorChannel $information
    }
}

proc processFilesInDir {pattern accumulatorChannel} {
    foreach fileName [glob -nocomplain -type f $pattern] {
        set channel [open $fileName]
        set contents [read $channel]
        close $channel

        processFileContents $fileName $contents $accumulatorChannel
    }
}

set accum [open output.txt "a"]
processFilesInDir /home/kartik/tclprac/*/* $accum
close $accum

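If you're using an older version of Tcl without lmap (8.5 or earlier), you can write it with foreach instead, since lmap is really just a collecting form of foreach; the only difference is that lmap uses a hidden temporary variable to do the accumulation: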

proc processFileContents {name contents accumulatorChannel} {
    # Explicit accumulator, standing in for lmap's hidden temporary
    set interesting {}
    foreach line [split $contents "\n"] {
        if {![regexp {set (?:Date|Channel) } $line]} {
            # Skip non-matching lines
            continue
        }
        # Trim the matching lines
        lappend interesting [string trim $line]
    }
    # If we matched anything, print all matches for this file on one line
    if {[llength $interesting]} {
        set information "file: $name data: [join $interesting " "]"
        puts $information
        puts $accumulatorChannel $information
    }
}
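
As a side note on the pattern, the following sketch compares the regular expression from the question with the one used in processFileContents. The sample lines are made up for illustration. In a regular expression, * means "zero or more of the preceding character", so {set Date*} really means "set Dat" followed by any number of "e" characters and will match unrelated lines, while {set (?:Date|Channel) } requires one of the two whole words followed by a space.

```tcl
# Hypothetical sample lines; only the first two are ones we want to keep.
set samples {
    "set Date 31/07/2021"
    "set Channel Disney"
    "set Database main"
    "set Dat"
}

set looseHits {}
set strictHits {}
foreach line $samples {
    # Pattern from the question: "set Dat" plus zero or more "e"
    if {[regexp {set Date*} $line]} {
        lappend looseHits $line
    }
    # Pattern from the answer: "set Date " or "set Channel "
    if {[regexp {set (?:Date|Channel) } $line]} {
        lappend strictHits $line
    }
}

puts "loose pattern matched:  [join $looseHits {, }]"
puts "strict pattern matched: [join $strictHits {, }]"
```

With these samples the loose pattern also accepts "set Database main" and "set Dat", while the strict pattern keeps only the Date and Channel lines.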

【Comments】:

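Before you ask: no, you can't access lmap's hidden temporary from Tcl code. It exists only in C or bytecode space (depending on how the Tcl code is compiled, which is not important here) and has no name that your Tcl code can see.
Hi. Can you explain this part?
if {![regexp {set (?:Date|Channel) } $line]} {
    # SKIP non-matching lines
    continue
}
# Trim the matching lines
string trim $line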
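
The thread does not include a reply to that last question, so the following is only a sketch of how that fragment behaves, using made-up input lines. Inside lmap, the value of the last command in the body (here string trim $line) becomes the next element of the result list, and continue skips the current line without adding anything. It assumes Tcl 8.6 or later, where lmap is available.

```tcl
# Hypothetical input lines, with stray whitespace around two of them.
set lines {"  set Date 31/07/2021  " "puts hello" "set Channel Disney "}

set kept [lmap line $lines {
    if {![regexp {set (?:Date|Channel) } $line]} {
        # No match: skip this line, nothing is added to the result
        continue
    }
    # Match: the trimmed line becomes the next element of the result
    string trim $line
}]

puts "kept [llength $kept] lines: [join $kept { | }]"
```

Here "puts hello" is dropped, and the two matching lines end up in kept with their surrounding whitespace removed.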