How do I do this file manipulation in Perl? Answers

【Question title】: How do I do this file manipulation in Perl?
【Posted】: 2011-12-15 16:54:30
【Question description】:

So my file looks like this:

--some comments--
--a couple of lines of header info--
 comp:
  name: some_name_A
  type: some_type
  id:   an id_1
  owner: who owns it
  path:  path_A to more data
 end_comp

 comp:
  name: some_name_B
  type: some_type
  id:   an id_2
  owner: who owns it
  path:  path_B to more data
 end_comp

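What I want to do: take the value of the name field and check whether it matches one of the names we are searching for (these are already provided in an array). If it does, get the path, go to that path, do some Perforce work to obtain the new id, and replace the current id with the new one, but only if it differs from the current id.

What I've done so far (just pseudocode):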

my @filedata = <$fh>;   # read the file into an array
my $names_to_search = join("|", @some_names);

for (my $i = 0; $i < @filedata; $i++)
{
 if( $filedata[$i] =~ /comp:/ )
 {
   my $line = $filedata[++$i];   # the next line
   if( $line =~ /name: $names_to_search/ )
   {
    #loop until we find the id
    #remember this index since we need to change this id

    #loop until we find the path field
    #get the path, go to that path, do some perforce commands and obtain new id
    #if the new id is the same as the current id, no action is required;
    #otherwise replace the current id with the new id
   }
  }
}
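One pitfall in the pseudocode above is that `$names_to_search` is interpolated into `/name: $names_to_search/` without grouping. In a regex `|` has the lowest precedence, so only the first name is tied to `name: `, the remaining names can match anywhere on any line, and nothing stops a longer name from matching as a prefix. This short sketch uses made-up names and lines to compare that pattern with an escaped, grouped and anchored one:

```perl
use strict;
use warnings;

# Illustrative names and lines (not taken from a real file).
my @some_names = ('some_name_A', 'some_name_C');

my @lines = (
    "  name: some_name_A\n",
    "  name: some_name_B\n",
    "  name: some_name_AB\n",          # longer name that starts with some_name_A
    "  path: some_name_C/more_data\n", # not a name line at all
);

# As in the question: "|" binds loosest, so this means
# /name: some_name_A/ OR /some_name_C/ anywhere in the line.
my $names_to_search = join("|", @some_names);
my @naive = grep { /name: $names_to_search/ } @lines;

# Escaped, grouped and anchored: only whole "name:" lines match.
my $names_re = join("|", map { quotemeta } @some_names);
my @fixed = grep { /^\s*name:\s*(?:$names_re)\s*$/ } @lines;

print "naive: ", scalar(@naive), " match(es)\n";   # 3
print "fixed: ", scalar(@fixed), " match(es)\n";   # 1
```

Besides fixing the grouping, `quotemeta` keeps names that contain characters such as `.` or `+` from being treated as regex syntax.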

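Question: my current implementation has three while loops! Is there a better / more efficient / more elegant way to do this?

【Discussion】:

Can the file contain two blocks with the same name value?

Tags: performance perl file-io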

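【Solution 1】:

You've written a configuration file in a custom format and are now trying to parse it by hand. Why not write the file in an established format such as YAML or INI instead, and parse it with an existing module?

For example, with YAML: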

use YAML::Any;
my @data = YAML::Any::LoadFile($filename) or die "Could not read from $filename: $!";

# now you have your data structure in @data; parse it using while/for/map loops.

You can read INI files with Config::INI or Config::INI::Simple.
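Here is a minimal sketch of that advice using only core Perl. It swaps in JSON (via the core `JSON::PP` module) for YAML/INI purely so that nothing has to be installed from CPAN; the JSON layout, the names and the `get_new_id` helper that fakes the Perforce step are all hypothetical:

```perl
use strict;
use warnings;
use JSON::PP;   # core module since Perl 5.14

# Hypothetical file contents: the two comp blocks stored as a JSON array.
my $json_text = <<'END_JSON';
[
  { "name": "some_name_A", "type": "some_type", "id": "id_1",
    "owner": "who owns it", "path": "path_A" },
  { "name": "some_name_B", "type": "some_type", "id": "id_2",
    "owner": "who owns it", "path": "path_B" }
]
END_JSON

# Hypothetical stand-in for "go to the path and run the Perforce commands".
sub get_new_id {
    my ($path) = @_;
    return $path eq 'path_A' ? 'id_42' : 'id_2';
}

my %wanted = map { $_ => 1 } qw(some_name_A some_name_B);

my $comps = decode_json($json_text);   # array ref of hash refs
for my $comp (@$comps) {
    next unless $wanted{ $comp->{name} };
    my $new_id = get_new_id($comp->{path});
    $comp->{id} = $new_id if $new_id ne $comp->{id};   # update only on change
}

# canonical() sorts the keys so the output is stable; pretty() indents it.
my $updated = JSON::PP->new->canonical->pretty->encode($comps);
print $updated;   # write this back to the file instead of printing it
```

Once the data lives in a standard format, finding a block, comparing ids and saving the result becomes a decode/modify/encode round trip; for YAML, YAML::Any's `LoadFile` and `DumpFile` would play the same role.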

【Discussion】:

As many people have suggested, I'll write it in XML format and then use one of Perl's XML parsers. :)

【Solution 2】:

Here is some pseudocode:

index = 0;

index_of_id = 0; // this is the index of the line that contains the current comp's id

have_company = false; // track whether we are processing a comp block

while (line in @filedata)
{
  if (!have_company)
  {
    if (line is not "comp:")
    {
      ++index;
      continue;
    }
    else
    {
      index_of_id = 0;
      have_company = true;
    }
  }
  else
  {
    if (line is "end_comp")
    {
      have_company = false; // start looking for the next comp block
      ++index;
      continue;
    }

    if (line is "id")
      index_of_id = index;  // save the index

    if (line is "path")
    {
      // do your stuff then replace the string at the index given by index_of_id
    }
  }
  // line index
  ++index; 
}

// Now write the modified array to file
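For reference, here is one way the same single-pass state machine could look in actual Perl, as a sketch rather than a tested drop-in. It also adds the name check from the question. The inline sample data, the `%wanted` names and `get_new_id` (which only fakes the Perforce step) are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical input with the same layout as the file in the question.
my @filedata = split /^/, <<'END_DATA';
--some comments--
 comp:
  name: some_name_A
  type: some_type
  id:   id_1
  owner: who owns it
  path:  path_A
 end_comp

 comp:
  name: some_name_B
  type: some_type
  id:   id_2
  owner: who owns it
  path:  path_B
 end_comp
END_DATA

my %wanted = map { $_ => 1 } qw(some_name_A some_name_B);

# Hypothetical stand-in for the Perforce step: returns the "new" id.
sub get_new_id {
    my ($path) = @_;
    return $path eq 'path_A' ? 'id_42' : 'id_2';
}

my $in_comp   = 0;   # are we inside a comp: ... end_comp block?
my $is_wanted = 0;   # does this block's name match one we search for?
my $index_of_id;     # index of this block's id line

for my $i (0 .. $#filedata) {
    my $line = $filedata[$i];

    if (!$in_comp) {
        # Outside a block: only a "comp:" line changes the state.
        ($in_comp, $is_wanted, $index_of_id) = (1, 0, undef)
            if $line =~ /^\s*comp:\s*$/;
        next;
    }

    if ($line =~ /^\s*end_comp\s*$/) {
        $in_comp = 0;   # start looking for the next block
    }
    elsif ($line =~ /^\s*name:\s*(.*?)\s*$/) {
        $is_wanted = exists $wanted{$1};
    }
    elsif ($line =~ /^\s*id:/) {
        $index_of_id = $i;   # remember where the id lives
    }
    elsif ($is_wanted && defined $index_of_id
           && $line =~ /^\s*path:\s*(.*?)\s*$/) {
        my $new_id = get_new_id($1);
        my ($prefix, $current_id) =
            $filedata[$index_of_id] =~ /^(\s*id:\s*)(.*?)\s*$/;
        # Replace the id only if it actually changed.
        $filedata[$index_of_id] = "$prefix$new_id\n" if $new_id ne $current_id;
    }
}

print @filedata;   # or write @filedata back to the file
```

Only one loop runs over the lines; the per-block state (`$in_comp`, `$is_wanted`, `$index_of_id`) takes the place of the nested loops from the question. Like the pseudocode, it assumes the id line comes before the path line in each block.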

【Discussion】:

【Solution 3】:

Since no two blocks can have the same name value, you can use a hash reference whose values are hash references:

{
  "name1"=>{type=>"type1",id=>"id1",owner=>"owner1",path=>"path1"},
  "name2"=>{type=>"type2",id=>"id2",owner=>"owner2",path=>"path2"},
  #etc
}

Something like this should work (warning: untested):

use strict;
use warnings;

open(my $read,"<","input_file.txt") or die "Cannot open input_file.txt: $!";

my $data={};
my $current_name=""; #Placeholder for the name that we're currently using.

while(<$read>)
{
  chomp; #get rid of trailing newline character.

  if(/^\s*name:\s*(.*?)\s*$/) #If we hit a line specifying a name,
                              #then this is the name we're working with
  {
    $current_name=$1;
  }
  elsif(/^\s*(type|id|owner|path):\s*(.*?)\s*$/) #If it's data to go with the name,
                                                 #then assign it (values may contain spaces).
  {
    $data->{$current_name}->{$1}=$2;
  }
}

close($read);

#Now you can look up each name from your given array in %$data and do what you want from there.

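However, if you can, I'd really recommend storing the data in your file in some standardized format (YAML, INI, JSON, XML, etc.) and parsing it appropriately. I should also add that this code relies on each name appearing before its corresponding type, id, owner and path.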
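The line-by-line parser above relies on `name` coming first in each block. As a hedged alternative that drops that assumption, Perl's input record separator `$/` can be set to the block terminator so that each read returns a whole `comp: ... end_comp` block, and the fields are then pulled out of the block with one regex. The sketch assumes every block ends with an `end_comp` line, as in the question's sample, and reads made-up data from an in-memory string:

```perl
use strict;
use warnings;

# Hypothetical sample: the fields appear in a different order in each block.
my $text = <<'END_DATA';
--some comments--
 comp:
  type: some_type
  name: some_name_A
  id:   an id_1
  owner: who owns it
  path:  path_A to more data
 end_comp

 comp:
  id:   an id_2
  name: some_name_B
  type: some_type
  owner: who owns it
  path:  path_B to more data
 end_comp
END_DATA

my %data;   # name => { type => ..., id => ..., owner => ..., path => ... }

{
    # Read one chunk ending in "end_comp\n" per iteration instead of one line.
    local $/ = "end_comp\n";
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    while (my $block = <$fh>) {
        next unless $block =~ /^[ \t]*comp:[ \t]*$/m;   # not a comp block
        # In list context //g returns (key, value, key, value, ...).
        my %fields = $block =~ /^[ \t]*(name|type|id|owner|path):[ \t]*(.*?)[ \t]*$/mg;
        my $name = delete $fields{name};
        next unless defined $name;
        $data{$name} = \%fields;
    }
    close($fh);
}

for my $name (sort keys %data) {
    print "$name: id=$data{$name}{id}, path=$data{$name}{path}\n";
}
```

The resulting `%data` is keyed by name just like the hash reference above, so `exists $data{$name}` tells you whether one of the searched names is present, regardless of field order inside the blocks.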

【Discussion】: