Perl: remove a string block from a file and save it back to the file (answers)

[Question title]: perl remove string block from file and save to file
[Posted]: 2019-12-10 02:29:14
[Question description]:

I have a file that looks like this:

string 1 {
    abc { session 1 }
    fairPrice {
            ID LU0432618274456
            Source 4
            service xyz
    }
}
string 2 {
    abc { session 23 }
    fairPrice {
            ID LU036524565456171
            Source 4
            service tzu 
    }
}

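My program should read the file, find the complete block for a given search parameter (e.g. "string 1") up to its closing "}", and remove that part from the file. Can anyone help? I have some code so far, but how can I delete the block and save the result back to the same file?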

use strict;
use warnings;
use IO::File;

my $fname = shift or die "Usage: $0 FILE\n";
my $fh = IO::File->new( $fname, "r" ) or die "ERROR: Strategy file \"$fname\" not found.\n";
my ( $inside_json_msg, $msg_json ) = ( 0, '' );
while ( my $line = <$fh> ) {
    if ( $line =~ /^\s*string 1\s*\{\s*$/ ) {
        # Start of the block I want to remove
        $inside_json_msg = 1;
        $msg_json .= $line;
    }
    elsif ($inside_json_msg) {
        # Collect lines until the top-level "}" in column 0
        $msg_json .= $line;
        $inside_json_msg = 0 if $line =~ /^\}\s*$/;
    }
}

[Question comments]:

I would recommend using Regexp::Grammars.
Wait, your code mentions JSON. Is the data actually JSON? If so, there are libraries you can use.

Tags: string perl block

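[Solution 1]:

Your code mentions JSON, but your data isn't JSON. If it really is JSON and you've just transcribed it badly, use a JSON library.

But if your data isn't JSON, then something like this will do the trick.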
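If the file really were JSON, a minimal sketch with the core JSON::PP module could look like this. It assumes (hypothetically) that the top-level object is keyed by block names such as "string 1"; `remove_json_block` is a made-up helper name, not something from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical layout: the file is a JSON object keyed by block name, e.g.
#   {"string 1": {...}, "string 2": {...}}
sub remove_json_block {
    my ($file, $name) = @_;

    # Slurp the raw bytes; ->utf8 tells JSON::PP to decode them as UTF-8
    open my $in, '<:raw', $file or die "Can't read $file: $!";
    my $text = do { local $/; <$in> };
    close $in;

    my $json = JSON::PP->new->utf8->pretty->canonical;
    my $data = $json->decode($text);

    # Drop the unwanted block (a no-op if the key is absent)
    delete $data->{$name};

    # Overwrite the same file with the re-encoded data
    open my $out, '>:raw', $file or die "Can't write $file: $!";
    print {$out} $json->encode($data);
    close $out or die "Can't close $file: $!";
    return;
}
```

Because of `canonical`, keys are written back in sorted order, so the layout of the rewritten file may differ from the original.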

#!/usr/bin/perl

use strict;
use warnings;

my $match = shift or die "I need a string to match\n";

while (<DATA>) {
  # If this is the start of a block we want to remove...
  # (\Q...\E stops characters in $match being treated as regex syntax)
  if (/^\s*\Q$match\E\s+\{/) {
    # Set $braces to 1 (or 0 if the block closes on this line)
    my $braces = /\}/ ? 0 : 1;
    # While $braces is non-zero
    while ($braces) {
      # Read the next line of the file (give up at EOF if braces are unbalanced)
      defined($_ = <DATA>) or last;
      # Increment or decrement $braces as appropriate
      $braces-- if /\}/;
      $braces++ if /\{/;
    }
  } else {
    # Otherwise, just print the line
    print;
  }
}

__DATA__
string 1 {
    abc { session 1 }
    fairPrice {
            ID LU0432618274456
            Source 4
            service xyz
    }
}
string 2 {
    abc { session 23 }
    fairPrice {
            ID LU036524565456171
            Source 4
            service tzu 
    }
}

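At the moment, this just prints the output to the console. I've used the DATA filehandle to simplify testing. Switching to a real filehandle is left as an exercise for the reader :-)

Update: I decided I didn't like all the incrementing and decrementing of $braces with regex matches. So here's another (improved?) version that uses y/.../.../ to count the opening and closing braces in each line. This version may be slightly less readable (the syntax highlighter certainly thinks so).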
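To make the difference between the two versions concrete: the regex version only notes whether a line contains a brace, while y/// (also spelled tr///) counts every occurrence. A small sketch with two hypothetical helper names:

```perl
use strict;
use warnings;

# Net change in brace depth for one line. tr/// with an empty
# replacement list returns the number of matching characters and
# leaves the string unchanged.
sub brace_delta {
    my ($line) = @_;
    return ($line =~ tr/{//) - ($line =~ tr/}//);
}

# What the regex version effectively does: each brace type is
# registered at most once per line, however many there are.
sub brace_delta_regex {
    my ($line) = @_;
    my $delta = 0;
    $delta-- if $line =~ /\}/;
    $delta++ if $line =~ /\{/;
    return $delta;
}
```

Both agree on lines like `abc { session 1 }` (net 0), but on a line such as `a { b {` the regex version counts only one opening brace, so it would stop skipping too early.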

#!/usr/bin/perl

use strict;
use warnings;

my $match = shift or die "I need a string to match\n";

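while (<DATA>) {
  if (/^\s*\Q$match\E\s+\{/) {
    # y/// returns how many of the listed characters the line contains
    my $braces = y/{// - y/}//;
    while ($braces) {
      defined($_ = <DATA>) or last;   # give up at EOF
      $braces -= y/}//;
      $braces += y/{//;
    }
  } else {
    print;
  }
}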

__DATA__
string 1 {
    abc { session 1 }
    fairPrice {
            ID LU0432618274456
            Source 4
            service xyz
    }
}
string 2 {
    abc { session 23 }
    fairPrice {
            ID LU036524565456171
            Source 4
            service tzu 
    }
}

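Update 2: OK, I originally said that handling a real filehandle would be left as an exercise for the reader. But here's a version that does just that.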

#!/usr/bin/perl

use strict;
use warnings;

my $match = shift or die "I need a string to match\n";

open my $fh, '+<', 'data' or die $!;

# Read all the data from the file
my @data = <$fh>;

# Empty the file
seek $fh, 0, 0;
truncate $fh, 0;

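my $x = 0;
while ($x <= $#data) {
  $_ = $data[$x++];
  if (/^\s*\Q$match\E\s+\{/) {
    my $braces = y/{// - y/}//;
    # Stop at the end of @data even if the braces never balance
    while ($braces && $x <= $#data) {
      $_ = $data[$x++];
      $braces -= y/}//;
      $braces += y/{//;
    }
  } else {
    print $fh $_;
  }
}

close $fh or die $!;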

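For now, I've hard-coded the filename as data. I hope it's obvious how to change that.

[Discussion]:

Hi Dave, thanks for the tips, I prefer the first way... but any idea how to delete and print to the exact same file I'm reading from? Thanks and regards
Actually your code just prints the whole file, but I want to delete everything from string 1 to the last closing brace... so it should only print everything from string 2 onwards...
"Actually your code just prints the whole file": it only does that if you call it with a string that isn't the name of a block in the file. How are you calling it? "I want to delete everything from string 1 to the last closing brace": if you call it correctly, that's exactly what it does.
@Unsal: "Any idea how to delete and print to the exact same file I'm reading from?" Yes. See my latest update.
@Unsal: How are you calling it?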
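One way to take the filename as a parameter and still write back to the same file is Perl's built-in in-place editing via `$^I` (the mechanism behind `perl -i`). This is a sketch, not the answer's own code: `remove_block_in_place` is a hypothetical name, and it reuses the y/// brace counting shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: delete the block called $name from $file in place.
# Setting $^I enables in-place editing (as with perl -i): while <> reads
# $file, print writes to its replacement. $backup is an optional suffix
# such as '.bak'; with '' no backup copy is kept.
sub remove_block_in_place {
    my ($file, $name, $backup) = @_;
    local ($^I, @ARGV) = ($backup // '', $file);
    local $_;

    LINE: while (<>) {
        if (/^\s*\Q$name\E\s+\{/) {
            # Same y/// brace counting as in the answer
            my $braces = y/{// - y/}//;
            while ($braces > 0) {
                # Leave at EOF, so an unbalanced block can't make <> fall back to STDIN
                defined($_ = <>) or last LINE;
                $braces += y/{// - y/}//;
            }
            next LINE;
        }
        print;    # keep every line outside the removed block
    }
    return;
}
```

For example, `remove_block_in_place($ARGV[0], 'string 1', '.bak')` would edit the file named on the command line and keep the original next to it with a `.bak` suffix.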

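[Solution 2]:

You can use Text::Balanced to split the text into blocks delimited by {}, while keeping the text before and after each block.

In that list, remove the element containing a specific skip phrase (here string 1) together with the block that follows it, and keep everything else. Then overwrite the source file with the result.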

use warnings;
use strict;
use Path::Tiny;
use Text::Balanced qw(extract_bracketed extract_multiple);

my $file = shift // die "Usage: $0 file\n";  #/
my $text = path($file)->slurp;

# returns: 'string 1', BLOCK, 'string 2', BLOCK (may have spaces/newlines)
# (extract_multiple passes the text to each extractor as $_[0])
my @elems = extract_multiple(
    $text, [ sub { extract_bracketed($_[0], '{}') } ]
);

my $skip_phrase = 'string 1';
my (@text_keep, $skip);

for (@elems) {
    # \Q...\E: match the phrase literally, not as a regex
    if (/\Q$skip_phrase\E/) {
        $skip = 1;
        next;
    }
    elsif ($skip) {
        $skip = 0;
        next;
    }

    push @text_keep, $_;
}

print for @text_keep;

# Overwrite source; uncomment when tested
#open my $fh_out, '>', $file or die "Can't open $file: $!";  
#print $fh_out $_ for @text_keep;

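Tested on a file with more text and blocks, including some before and after the one to be removed.

Another tool for extracting delimited blocks is in Regexp::Common; see this post.

[Discussion]:

[Solution 3]:

I would use proper JSON as the format and jq as the processor for it. Rewriting a hack in Perl doesn't make much sense.

[Discussion]: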
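If you'd rather avoid a non-core dependency, Perl 5.10+ can match balanced braces with a recursive regex. This is a swapped-in technique, not the Regexp::Common approach mentioned above, and `remove_block_regex` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Remove the block called $name (header line plus its balanced {...}
# body) from $text using a recursive named group.
sub remove_block_regex {
    my ($text, $name) = @_;
    $text =~ s/
        ^ [ \t]* \Q$name\E \s+     # block header at the start of a line
        (?&BRACED)                 # its balanced brace-delimited body
        [ \t]* \n?                 # plus the end of the closing line
        (?(DEFINE)
            (?<BRACED> \{ (?: [^{}]++ | (?&BRACED) )* \} )
        )
    //xmg;
    return $text;
}
```

`\Q$name\E` escapes the space in `string 1`, so it stays a literal space under `/x`, and the possessive `[^{}]++` keeps the recursion from backtracking needlessly.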

[Solution 4]:

Here is an example using Regexp::Grammars:

use feature qw(say);
use strict;
use warnings;
use Regexp::Grammars;
{
    my ($block_name, $block_num) = @ARGV;
    die "Usage: $0 BLOCK_NAME BLOCK_NUM\n" unless defined $block_num;

    # <.name> calls a subrule without capturing it, so braced_item has no
    # captures of its own and its result is simply the matched {...} text
    my $parser = qr!
        <nocontext:>
        <blocks>
        <rule: blocks> <[block]>+
        <rule: block> <block_name> <block_num> <braced_item>
        <token: block_name> \w+
        <token: block_num> \d+
        <rule: braced_item>   \{  (?: <.escape> | <.braced_item> | [^{}] )*  \}
        <token: escape> \\ .
    !xms;

    my $data = read_file('cfg.txt');
    if ($data =~ $parser) {
        print_blocks( $/{blocks}{block}, $block_name, $block_num );
    }
    else {
        warn "No match";
    }
}

sub print_blocks {
    my ( $blocks, $block_name, $block_num ) = @_;

    # Print every block except the one to be removed
    for my $block (@$blocks) {
        next if ($block->{block_name} eq $block_name)
          && ($block->{block_num} == $block_num);
        say $block->{block_name}, " ", $block->{block_num},
          " ", $block->{braced_item};
    }
}

sub read_file {
    my ( $fn ) = @_;

    open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    my $str = do { local $/; <$fh> };
    close $fh;
    return $str;
}

[Discussion]: