【Question title】:Remove string based on start and end pattern and remove newline in the process
【Posted】:2022-01-21 14:22:46
【Question】:

I have a file containing the output of some commands. Unfortunately, some of that output is broken up by console error messages:

path="/a/b/c" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="132" host="cluster.local:1095" fstpath="/data/00019507/3dcd7e00" size="4574568" statsize="45745error: unable to retrieve file meta data from cluster.local:1095 [ status=down ]
nrep="00" fsid="37" host="cluster.local:1095" fstpath="/data/000021ca/0527e888" size="12550144" statsize="12550144" checksum="bb2a2ea700000000000000000000000000000000" diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"
path="/a/b/b98d6d3a-5c77-4223-9601-9294c73e00f9.bin" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="36" host="cluster.local:1095" fstpath="/data/00002196/05200f4d" size="12550144" statsize="12550144" checksum="045a6aa400000000000000000000000000000000" diskchecksum="045a6aa400000000000000000000000000000000" error_label="nonerror: unable to retrieve file meta data from cluster.local:1095 [ status=(down) ]
e"
path="/a/b/c/.mb6589013703229118680.txt" fxid="0524071a" size="0" nrep="2" checksumtype="adler" checksum="0000000100000000000000000000000000000000"
nrep="00" fsid="196" host="cluster.local:1095" fstpath="/dataerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
/000021b0/0524071a" size="0" statsize="0" checksum="0000000100000000000000000000000000000000" diskchecksum="0000000000000000000000000000000000000000" error_label="none"

Basically, I want to completely remove the string that starts with error: unable and ends with the ] character, so that instead of:

diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"

I would get:

diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="none"

I tried the following:

sed -e 's/error:.*]$//g'

But this gives me:

diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="n
one"

How can I also remove the newline when removing the bad string?

Thanks

【Comments】:

    Tags: regex bash awk sed


    【Solution 1】:

    With GNU sed you can do this:

    sed '/error: unable.*/ {s///;N;s/\n//;}' file
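To make the steps of that one-liner easier to follow, here is a small annotated run. It is only a sketch: the two-line sample is a shortened, hypothetical excerpt of the question's data, and GNU sed is assumed.

```shell
# Hypothetical two-line sample: the console error splits the value "none".
sample='error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"'

# /error: unable.*/  select lines that contain the error text
# s///               empty regex reuses the last one, so the message (up to end of line) is deleted
# N                  append the next input line to the pattern space, joined by a newline
# s/\n//             delete that newline, gluing the two halves back together
fixed=$(printf '%s\n' "$sample" | sed '/error: unable.*/ {s///;N;s/\n//;}')

printf '%s\n' "$fixed"   # prints: error_label="none"
```

This relies on the error message always running to the end of its line, which is the case in the sample shown in the question.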
    

    Or using awk:

    awk 'sub(/error: unable.*/, "") {s = $0; getline; print s $0; next} 1' file
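Joining with a single getline reads exactly one following line. In the question's sample, the line after a corrupted line is sometimes corrupted too, so a looping variant may be useful. The following is a sketch of that idea, not the answer's original command; the variable name nxt and the shortened sample are made up for illustration.

```shell
# Hypothetical sample: two corrupted lines in a row, then a clean line.
sample='statsize="45745error: unable to retrieve file meta data from cluster.local:1095 [ status=down ]
nrep="00" error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"
path="/a/b/c"'

# While the current record still contains an error message:
#   sub() cuts the message (it runs to end of line),
#   getline reads the next input line into nxt,
#   and the two pieces are concatenated without a newline.
# The trailing 1 prints every record, repaired or not.
fixed=$(printf '%s\n' "$sample" |
  awk '{ while (sub(/error: unable.*/, "") && (getline nxt) > 0) $0 = $0 nxt } 1')

printf '%s\n' "$fixed"
```

On this sample the first three input lines collapse into one, the same joining that the sed -z approach produces when a message sits at the very end of a line.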
    

    【Comments】:

      【Solution 2】:

      Using sed:

      $ sed '/nerror:/{s/\(error_label=\)"nerror: unable[^]]*]/\1"none"/g;n;d}' input_file
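A short annotated run may help here, again assuming GNU sed and a hypothetical three-line sample. Note that the substitution is tied to error_label="nerror:, so it rewrites only the case where the message lands right after the first n of none; other split positions are not repaired by it.

```shell
# Hypothetical sample: one corrupted error_label followed by a clean line.
sample='error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"
path="/a/b/c"'

# s/.../\1"none"/  replace the broken value with the literal "none"
# n                print the repaired line and load the next one (the leftover one")
# d                delete that leftover fragment
fixed=$(printf '%s\n' "$sample" |
  sed '/nerror:/{s/\(error_label=\)"nerror: unable[^]]*]/\1"none"/g;n;d}')

printf '%s\n' "$fixed"
```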
      

      【Comments】:

        【Solution 3】:

        Using GNU sed for -E (to enable EREs) and -z (to read the whole file at once so the regexp can match newlines):

        $ sed -Ez 's/error: unable[^]]+](\r?\n)?//g' file
        path="/a/b/c" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
        nrep="01" fsid="132" host="cluster.local:1095" fstpath="/data/00019507/3dcd7e00" size="4574568" statsize="45745nrep="00" fsid="37" host="cluster.local:1095" fstpath="/data/000021ca/0527e888" size="12550144" statsize="12550144" checksum="bb2a2ea700000000000000000000000000000000" diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="none"
        path="/a/b/b98d6d3a-5c77-4223-9601-9294c73e00f9.bin" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
        nrep="01" fsid="36" host="cluster.local:1095" fstpath="/data/00002196/05200f4d" size="12550144" statsize="12550144" checksum="045a6aa400000000000000000000000000000000" diskchecksum="045a6aa400000000000000000000000000000000" error_label="none"
        path="/a/b/c/.mb6589013703229118680.txt" fxid="0524071a" size="0" nrep="2" checksumtype="adler" checksum="0000000100000000000000000000000000000000"
        nrep="00" fsid="196" host="cluster.local:1095" fstpath="/data/000021b0/0524071a" size="0" statsize="0" checksum="0000000100000000000000000000000000000000" diskchecksum="0000000000000000000000000000000000000000" error_label="none"
        

        The (\r?\n)? at the end of the regexp also removes the line ending that follows the matched text, whether it is \n or \r\n.
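To see the line-ending handling in isolation, here is a minimal sketch that feeds the same regexp one LF sample and one CRLF sample. GNU sed is assumed; strip_errors is a hypothetical helper name and the message text is shortened.

```shell
# Hypothetical helper wrapping the command from the answer.
strip_errors() { sed -Ez 's/error: unable[^]]+](\r?\n)?//g'; }

# Same broken value, once with a Unix (LF) and once with a Windows (CRLF)
# line ending directly after the error message.
lf_fixed=$(printf 'error_label="nerror: unable to retrieve file meta data [ status=(null) ]\none"\n' | strip_errors)
crlf_fixed=$(printf 'error_label="nerror: unable to retrieve file meta data [ status=(null) ]\r\none"\n' | strip_errors)

printf '%s\n' "$lf_fixed" "$crlf_fixed"   # both lines read: error_label="none"
```

Because -z makes sed treat the input as NUL-separated, a normal text file is held in memory as one record, which is something to keep in mind for very large files.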

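        【Comments】:

          【Solution 4】:

          With your shown samples, please try the following awk, which sets RS to the empty string (paragraph mode) so that a record can span several lines. Written and tested with GNU awk.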

          awk -v RS="" '{gsub(/error: unable[^]]+]\n*/,"")} 1' Input_file
          

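          Explanation: in short, a global substitution replaces everything from error: unable through the closing ], plus any newlines that follow (zero or more), with the empty string; the trailing 1 then prints the record.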
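As a quick check of that explanation, the sketch below runs the same command on a hypothetical two-line sample in which the message interrupts a path.

```shell
# Hypothetical sample: the error message splits the fstpath value.
sample='fstpath="/dataerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
/000021b0/0524071a" size="0"'

# RS="" (paragraph mode) keeps both lines in a single record, so the
# regexp can match the newline that follows the closing ].
fixed=$(printf '%s\n' "$sample" |
  awk -v RS="" '{gsub(/error: unable[^]]+]\n*/,"")} 1')

printf '%s\n' "$fixed"   # prints: fstpath="/data/000021b0/0524071a" size="0"
```

In paragraph mode blank lines act as record separators, so with the default ORS any blank lines in the input are not reproduced in the output.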

          【Comments】:
