【Question Title】: Remove multiple lines where string occurs and concatenate
【Posted】: 2017-07-25 00:55:28
【Question】:

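I'm new to Bash/Perl and am trying to remove multiple lines from a text file in which a string occurs. So far, to remove the lines containing one string, I have: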

perl -ne '/somestring/ or print' /usr/file.txt > /usr/file1.tmp

To remove the lines containing a second string I use:

perl -ne '/anotherstring/ or print' /usr/file.txt > /usr/file2.tmp

How do I concatenate file and file2.tmp?

Or how can I modify the command to remove every line in which somestring or anotherstring occurs?

【Comments】:

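  • perl -ne '/somestring|anotherstring/ or print' /usr/file.txt > /usr/file2.tmp. grep -v is a better fit here.
  • egrep -v if you are using a regular expression.
  • Thanks. grep -v 'somestring' /usr/file.txt works, but grep -v 'somestring|anotherstring' /usr/file.txt did not return any results, which is why I used Perl.
  • That's because grep uses POSIX BRE (basic regular expressions) by default, where an unescaped | is a literal character. GNU grep accepts \| as alternation in BRE (an extension that not every grep supports), like this: grep 'somestring\|anotherstring' .... The alternative is to use extended regular expressions (ERE), enabled with the -E flag, or (as Chris suggested) egrep. For example: grep -E 'somestring|anotherstring' ...
  • @randomir I'm on Solaris. I found egrep at /usr/bin/egrep, so your -E solution works now. Thanks for your help.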
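
The BRE/ERE difference discussed in these comments can be seen with a small sketch. The sample text is made up for illustration, and the script assumes a grep that supports the POSIX -E option:

```shell
#!/bin/sh
# Sample input (hypothetical contents, for illustration only).
input='keep me
has somestring here
has anotherstring here
keep me too'

# BRE: an unescaped | is a literal character, so this pattern only matches
# lines containing the text "somestring|anotherstring". Nothing is removed.
bre_result=$(printf '%s\n' "$input" | grep -v 'somestring|anotherstring')

# ERE (-E): | means alternation, so lines with either string are removed.
ere_result=$(printf '%s\n' "$input" | grep -E -v 'somestring|anotherstring')

printf 'BRE:\n%s\n\nERE:\n%s\n' "$bre_result" "$ere_result"
```

With this input the BRE run prints all four lines unchanged, while the ERE run prints only the two "keep me" lines.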

Tags: bash perl


【Answer 1】:

How do I concatenate file and file2.tmp?

This can be done with

cat file file2.tmp >> file3.tmp

or, if by file you mean file1.tmp,

cat file1.tmp file2.tmp >> file3.tmp

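However, this is not the same as what you describe in the rest of your question (i.e. removing every line in which either of the two patterns occurs). That can be done by chaining your commands: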

perl -ne '/somestring/ or print' /usr/file.txt > /usr/file1.tmp
perl -ne '/anotherstring/ or print' /usr/file1.tmp > /usr/file2.tmp
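
As a rough illustration of why concatenating the two filtered files differs from chaining the filters, here is a small sketch. The sample lines and the temporary directory are made up for the example, and grep -v stands in for the perl filters (assumed to behave the same for these plain strings):

```shell
#!/bin/sh
# Hypothetical sample data; the file names mirror the ones in the answer.
dir=$(mktemp -d)
printf '%s\n' 'alpha' 'somestring 1' 'anotherstring 2' 'omega' > "$dir/file.txt"

# Two independent filters, each reading the ORIGINAL file.
grep -v 'somestring' "$dir/file.txt" > "$dir/file1.tmp"
grep -v 'anotherstring' "$dir/file.txt" > "$dir/file2.tmp"

# Concatenating them keeps the unwanted lines and duplicates the rest.
concatenated=$(cat "$dir/file1.tmp" "$dir/file2.tmp")

# Chaining: the second filter reads the output of the first.
chained=$(grep -v 'anotherstring' "$dir/file1.tmp")

printf 'concatenated:\n%s\n\nchained:\n%s\n' "$concatenated" "$chained"
rm -rf "$dir"
```

The concatenated result has six lines, including both unwanted ones, whereas the chained result contains only alpha and omega.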

You can use a pipe to get rid of the intermediate file file1.tmp:

perl -ne '/somestring/ or print' /usr/file.txt | perl -ne '/anotherstring/ or print' > /usr/file2.tmp

This can also be done with grep (assuming your strings don't use any Perl-specific regex features):

grep -v somestring /usr/file.txt | grep -v anotherstring > /usr/file2.tmp

Finally, you can combine the filtering into a single command/regex:

perl -ne '/somestring|anotherstring/ or print' /usr/file.txt > /usr/file2.tmp

Or with grep:

grep -v 'somestring\|anotherstring' /usr/file.txt > /usr/file2.tmp
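
Since \| in a basic regular expression is an extension rather than something every grep accepts, here is a sketch of two alternatives that rely only on POSIX grep options (-e and -F). The sample input is invented for the example:

```shell
#!/bin/sh
# Hypothetical sample input.
input='plain line
somestring here
anotherstring here
price is a.b'

# One -e per pattern: no alternation syntax is needed at all.
multi_e=$(printf '%s\n' "$input" | grep -v -e 'somestring' -e 'anotherstring')

# -F treats each pattern as a fixed string, so "." is not a wildcard.
fixed=$(printf '%s\n' "$input" | grep -F -v -e 'somestring' -e 'a.b')

printf '%s\n---\n%s\n' "$multi_e" "$fixed"
```

The -F form is worth considering when the strings to remove come from user input and may contain regex metacharacters.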

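【Comments】:

    【Answer 2】:

    I was intrigued by your problem and wrote a highly dynamic Perl program that finds the lines of any user-specified file that match, or do not match, a set of words, and then prints the requested lines (matching or non-matching) to the screen and to a new user-specified output file.

    We will parse this file, iris_dataset.csv: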

    "Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
    5.1,3.5,1.4,0.2,"setosa"
    4.9,3,1.4,0.2,"setosa"
    4.8,3,1.4,0.3,"setosa"
    5.1,3.8,1.6,0.2,"setosa"
    4.6,3.2,1.4,0.2,"setosa"
    7,3.2,4.7,1.4,"versicolor"
    6.4,3.2,4.5,1.5,"versicolor"
    6.9,3.1,4.9,1.5,"versicolor"
    6.6,3,4.4,1.4,"versicolor"
    5.5,2.4,3.7,1,"versicolor"
    6.3,3.3,6,2.5,"virginica"
    5.8,2.7,5.1,1.9,"virginica"
    7.1,3,5.9,2.1,"virginica"
    6.3,2.9,5.6,1.8,"virginica"
    5.9,3,5.1,1.8,"virginica"
    

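    This is a comma-separated values file, in which the columns are separated by commas. You can see each column more clearly if you view the file in a spreadsheet. What we will look for is the species column, so the possible items to match are "setosa", "versicolor" and "virginica".

    My program first asks for the file you want to read from. In this case it is iris_dataset.csv, though it can be any file. Then you give the name of the file you want to write to. I called it new_iris.csv, but you can call it anything.

    Then we tell the program how many items we want to look for. With 3 items I could enter setosa, versicolor and virginica, in any order. With two I enter only two of them, and with one I enter just setosa, versicolor or virginica for this example file.

    Then we are asked whether we want to keep the lines that match our items or remove the lines that match our items. If we keep the matches, the lines matching those items are printed to the screen and to our output file. If we choose remove, the lines that do not match those items are printed to the screen and to our file. If we choose neither KEEP nor REMOVE, we get an error message and our new, empty output file is deleted, since it contains nothing.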

    #!/usr/bin/env perl
    # Program: perl_matching.pl
    use strict; # Means that we have to declare our variables (with "my" or "our") before using them.
    use warnings; # We want to know if and where problems are showing up in our program.
    use feature 'say'; # Like print, but with automatic ending newline.
    use feature 'switch'; # Perl given:when switch statement. 
    no warnings 'experimental'; # Perl has something against switch. 
    
    ########### This block of code right here is basically equivalent to a Unix ls command ##############
    opendir(DIR, ".") or die "Cannot open current directory: $!"; # Opens the current working directory
    my @files = readdir(DIR); # Reads all files in the current working directory into an array @files. 
    closedir(DIR); # Now that we have the array of files, we can close our current working directory.
    say "Here are the list of files in your current working directory";
    foreach(@files){print "$_\t";} # $_ is the default variable for each item in an array.
    ########### It is not critical to run the program ####################  
    
    say "\nGive me your filename to read from, extensions and all ..."; # It would be a good idea to have your file in your working directory.
    chomp(my $file_read = <STDIN>); # This makes the filename dynamic from user input. 
    say "Give me your filename to write to, extensions and all ...";
    chomp(my $file_write = <STDIN>); # results will be printed to this file, and standard output. # chomp removes newlines from standard input.
    
    # ' < ' to read from, and '>', to write to ... 
    # Opening your file to read from: 
    open(my $filehandle_read, '<', $file_read) or die "Problem reading file $file_read because $!";
    # Open your file to write to.
    open(my $filehandle_write, '>', $file_write) or die "Problem writing file $file_write because $!";
    
    say "How many matches are you going to give me?";
    my $match_num = <STDIN>;
    say "Okay give me the matches now, pressing Enter key between each match.";
    
    my $i = 1; # This is our incrementer between matches. 
    my $matches; # This is each match presented line by line. 
    my @match_list; # This is our array (list) of $matches
    while($i <= $match_num)
    {
        $matches = <STDIN>; # One match at a time from standard input. 
        push @match_list, $matches; # Pushes all individual $matches into a list @match_list
        $i = $i + 1; # Increase the counter by one so this loop doesn't run forever.
    }
    chomp(@match_list);
    
    undef($matches); # I am clearing each match, so that I can redefine this variable. 
    
    $matches = join('|', @match_list); # " | " is part of a regular expression which means "or" for each item in this scalar matches. 
    say "This is what your redefined matches variable looks like: $matches"; 
    
    say "Now you get a choice for your matches"; 
    say "KEEP or REMOVE?"; # if you type Keep (case insensitive) you print only the matches to the new file. If you type Remove (case insensitive) you print only the lines to the newfile which do not contain the matches.  
    chomp(my $choice = <STDIN>);
    
    my @lines_all = <$filehandle_read>; # The filehandle contains everything in the file, so we can pull all lines of the file to read into an array, where each item in the array is each line of the file opened for reading. 
    close $filehandle_read; # we can now close the filehandle for the file for reading since we just pulled all the information from it. 
    # We grep for the matching " =~ " lines of our file to read. 
    my @lines_matching = grep{$_ =~ m/$matches/} @lines_all;
    # We grep for the non-matching " !~ " lines of our file to read.
    # Note: $_ is a default variable for every item in the array.   
    my @lines_not_matching = grep{$_ !~ m/$matches/} @lines_all;
    
    
    # This is a Perl style switch statement.
    # Note: A given::when::when::default switch statement
    # is basically equivalent to an
    # if::elsif::else chain.
    
    # In this switch statement only one choice is performed,
    # which one depends on if you said "Keep" or "Remove" in your choice. 
    given($choice)
    {
        when($choice =~ m/Keep/i) # "i" is for case-insensitive, so Keep, KEEP, kEeP, etc are valid. 
        {
        say @lines_matching; # Print the matching lines to the screen. 
        print $filehandle_write @lines_matching; # Print the matching lines to the file. 
        close $filehandle_write; # Close the file now that we are done with it. 
        }
        when($choice =~ m/Remove/i) 
        {
        say @lines_not_matching; # Print the lines that do not match to the screen.
        print $filehandle_write @lines_not_matching; # Print the lines that do not match to the file.
        close $filehandle_write; # Close the file now that we are done with it.
        }
        default 
        {
        say "You must have selected a choice other than Keep or Remove. Don't do that!";
        close $filehandle_write; # Close the file now that we are done with it. 
        unlink($file_write) or warn "Could not unlink file $file_write"; # If you selected neither keep nor remove, we delete the new file to write to as it contains nothing.  
        }
    }
    

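    Here is the script in action:

    I asked to remove the lines containing versicolor and setosa, so only the lines containing virginica (and the header line, which matches neither word) are printed to the screen and to the output file I called new_iris.csv. Again, I asked for 2 items. Note: in my program you can type the word Keep or Remove in any letter case.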

    > perl perl_matching.pl
    Here are the list of files in your current working directory
    .       ..      iris_dataset.csv        perl_matching.pl
    Give me your filename to read from, extensions and all ...
    iris_dataset.csv
    Give me your filename to write to, extensions and all ...
    new_iris.csv
    How many matches are you going to give me?
    2
    Okay give me the matches now, pressing Enter key between each match.
    setosa
    versicolor
    This is what your redefined matches variable looks like: setosa|versicolor
    Now you get a choice for your matches
    KEEP or REMOVE?
    Remove
    "Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
    6.3,3.3,6,2.5,"virginica"
    5.8,2.7,5.1,1.9,"virginica"
    7.1,3,5.9,2.1,"virginica"
    6.3,2.9,5.6,1.8,"virginica"
    5.9,3,5.1,1.8,"virginica"
    

    So only the lines that contain neither setosa nor versicolor are printed to our file, new_iris.csv:

    "Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
    6.3,3.3,6,2.5,"virginica"
    5.8,2.7,5.1,1.9,"virginica"
    7.1,3,5.9,2.1,"virginica"
    6.3,2.9,5.6,1.8,"virginica"
    5.9,3,5.1,1.8,"virginica"
    

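    I really enjoy working with standard input in Perl. You can also use my script to print only the lines of the file that contain setosa (you would ask for just 1 match).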
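
For comparison, the keep/remove behaviour described in this answer can be approximated in a few lines of shell. This is only a sketch, not the answer's program: filter_lines is a hypothetical helper name, the sample file is a four-line excerpt of the data above, and the words are joined into an extended regular expression, so they are assumed to contain no regex metacharacters:

```shell
#!/bin/sh
# filter_lines MODE FILE WORD...
# MODE is "keep" or "remove" (hypothetical helper name and interface).
filter_lines() {
    mode=$1
    file=$2
    shift 2
    # Join the remaining arguments with "|" to form an ERE alternation.
    pattern=$(IFS='|'; printf '%s' "$*")
    case $mode in
        keep)   grep -E -e "$pattern" "$file" ;;
        remove) grep -E -v -e "$pattern" "$file" ;;
        *)      echo "mode must be keep or remove" >&2; return 2 ;;
    esac
}

# Small excerpt of the iris data, written to a temporary directory.
dir=$(mktemp -d)
cat > "$dir/iris_sample.csv" <<'EOF'
"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
5.1,3.5,1.4,0.2,"setosa"
7,3.2,4.7,1.4,"versicolor"
6.3,3.3,6,2.5,"virginica"
EOF

removed=$(filter_lines remove "$dir/iris_sample.csv" setosa versicolor)
kept=$(filter_lines keep "$dir/iris_sample.csv" setosa)
printf '%s\n\n%s\n' "$removed" "$kept"
rm -rf "$dir"
```

Here the remove call leaves the header and the virginica row, and the keep call returns only the setosa row.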

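    【Comments】:

    • "It's dynamic" is meaningless. $matches is arguably declared outside the scope it belongs in. undef $matches is bad style: if you are just going to overwrite it on the next line, why do it in the first place? And why use the same variable for two completely different things?
    • Don't use bareword filehandles (or directory handles, in this case).
    • given/when throws warnings for a reason. Don't use it in new code; don't just silence the warning.
    • The error message for unlink doesn't include $!
    • I guess my point is that you wrote a lot of problematic code that is simply irrelevant to the question.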