【Question title】: How to insert only new and/or updated lines into another file
【Posted】: 2012-03-23 14:14:10
【Question】:

First day dealing with Perl and I'm already stuck :)

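Here is the situation: a file is updated in folder A, but it also exists in folders B, C and D, and, just to make it easier, all of the files can be different, so I can't simply do a diff. The new lines that are meant to be copied to the other files are identified by a flag at the end of the line, e.g. #I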
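
A minimal sketch of the flag test, assuming the marker is the literal text `#I` at the very end of the line. Perl's `$` only matches at the end of the string or just before a final `\n`, so a file saved with Windows line endings (`\r\n`) or with trailing spaces would not match `/#I$/`; allowing trailing whitespace avoids that. The helper name `is_flagged` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a line ends with the #I flag,
# ignoring trailing whitespace such as the "\r" left by CRLF files.
sub is_flagged {
    my ($line) = @_;
    return $line =~ /#I\s*$/ ? 1 : 0;
}

print is_flagged("Third line #I\n")   ? "flagged\n" : "plain\n";   # flagged
print is_flagged("Third line #I\r\n") ? "flagged\n" : "plain\n";   # flagged
print is_flagged("Second line\n")     ? "flagged\n" : "plain\n";   # plain
```
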

Before the update, the file looks like this:

    First line
    Second line
    Fifth line

After the update, it looks like this:

    First line
    Second line
    Third line #I
    Fourth line #I
    Fifth line
    Sixth line #I

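What I need to do is search the other files for "Second line", insert the lines flagged with #I (in the order in which they were inserted), then search for "Fifth line" and insert "Sixth line #I".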
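
The first half of that task, grouping the flagged lines by the unflagged line that precedes them, can be sketched as below. It assumes the lines have already been read and chomped; each block is a pair `[ $anchor, \@new_lines ]`, with an undefined anchor for a run of `#I` lines at the very start of the file. `insertion_blocks` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: split the updated file into insertion blocks.
# Each block is [ $anchor, \@new_lines ], where $anchor is the unflagged
# line immediately before a run of #I lines (undef if the run starts the file).
sub insertion_blocks {
    my @lines = @_;
    my @blocks;
    my $prev;                 # last unflagged line seen so far
    my $in_run = 0;           # are we inside a run of #I lines?
    for my $line (@lines) {
        if ($line =~ /#I\s*$/) {
            push @blocks, [ $prev, [] ] unless $in_run;   # start a new block
            push @{ $blocks[-1][1] }, $line;
            $in_run = 1;
        }
        else {
            $prev   = $line;
            $in_run = 0;
        }
    }
    return @blocks;
}

my @updated = ('First line', 'Second line', 'Third line #I',
               'Fourth line #I', 'Fifth line', 'Sixth line #I');
for my $blk (insertion_blocks(@updated)) {
    printf "after '%s': %s\n", $blk->[0] // '(start of file)', join(', ', @{ $blk->[1] });
}
# after 'Second line': Third line #I, Fourth line #I
# after 'Fifth line': Sixth line #I
```
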

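In this example they are all consecutive, but in the files I need to update there can be several lines between the first update block and the second (and the third, and so on).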
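
Because the anchors appear in the same order in both files, one way to cope with unrelated lines in between is a single forward pass over the target: copy target lines until the next anchor is found, emit that block's new lines, and continue from there. A sketch of that merge, assuming blocks shaped as `[ $anchor, \@new_lines ]` and exact line equality; `apply_blocks` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: insert each block's lines into @$target right after
# its anchor line. Anchors are searched in order, moving forward only, so
# any number of unrelated lines may sit between one block and the next.
# Blocks with an undef anchor are emitted before the first target line.
sub apply_blocks {
    my ($target, @blocks) = @_;
    my @out;
    my $pos = 0;    # index of the next unread target line
    for my $blk (@blocks) {
        my ($anchor, $new) = @$blk;
        if (defined $anchor) {
            # copy target lines up to and including the anchor
            while (1) {
                die qq(Anchor line "$anchor" not found\n) if $pos > $#$target;
                my $line = $target->[ $pos++ ];
                push @out, $line;
                last if $line eq $anchor;
            }
        }
        push @out, @$new;
    }
    push @out, @{$target}[ $pos .. $#$target ];    # copy the rest unchanged
    return @out;
}

my @target = ('First line', 'Second line', 'local tweak', 'Fifth line', 'Last');
my @blocks = (['Second line', ['Third line #I', 'Fourth line #I']],
              ['Fifth line',  ['Sixth line #I']]);
print "$_\n" for apply_blocks(\@target, @blocks);
```

If an anchor is missing from the target, this sketch stops with an error rather than guessing where to insert.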

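The files to be updated can be sh scripts, awk scripts, plain text files, etc., so the script should be generic. It will take two input parameters: the updated file and the file to be updated.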
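
A sketch of the file-handling side of such a script, assuming it is called as `perl update.pl UPDATED_FILE FILE_TO_UPDATE` (the script name is hypothetical) and that the merged lines are computed elsewhere. `read_lines` and `write_with_backup` are hypothetical helper names; `File::Copy` and `POSIX::strftime` are core modules. Reading and writing the files as plain lines keeps the script independent of whether they are sh, awk or plain text.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use POSIX qw(strftime);

# Read a file into a list of lines with the trailing newlines removed.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Copy $path to "$path.YYYYmmdd-HHMM", then overwrite $path with @lines.
# Returns the name of the backup file.
sub write_with_backup {
    my ($path, @lines) = @_;
    my $backup = $path . '.' . strftime('%Y%m%d-%H%M', localtime);
    copy($path, $backup) or die "Backup of $path failed: $!\n";
    open my $fh, '>', $path or die "Could not write $path: $!\n";
    print {$fh} "$_\n" for @lines;
    close $fh or die "Could not close $path: $!\n";
    return $backup;
}
```

A driver would then read `read_lines($ARGV[0])` and `read_lines($ARGV[1])`, compute the merged lines, and pass them to `write_with_backup($ARGV[1], @merged)`. Note that two runs within the same minute reuse the same backup name.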

Any hints on how to do this are welcome. If needed, I can provide the code I have so far: it's close, but not working yet.

Thanks,

João

PS: here's what I have so far

# Pass the content of the file $FileUpdate to the updateFile array
@updateFile = <UPD>;

# Pass the content of the file $FileOriginal to the originalFile array
@originalFile = <ORG>;

# Remove empty lines from the array contained on the updated file
@updateFile = grep(/\S/, @updateFile);

# Create an array that will contain the modifications and the line
# prior to the first modification.
@modifications = ();

# Counter initialization
$i = 0;


# Loop the array to find out which lines are flagged as new and
# which lines immediately precede those
foreach $linha (@updateFile) {

    # Remove \n characters
    chomp($linha);

    # Find the new lines flagged with #I
    if ($linha =~ m/#I$/) {

        # Verify that the previous line is not flagged as updated.
        # If it is not, it means that the update starts here.
        unless ($updateFile[$i-1] =~ m/#I$/) {
            print "Line where the update starts $updateFile[$i-1]\n";

            # Add that line to the array modifications
            push(@modifications, $updateFile[$i-1]);

        } # END OF unless

        print "$updateFile[$i]\n";

        # Add the lines tagged for insertion into the array
        push(@modifications, $updateFile[$i]);

    } # END OF if ($linha =~ m/#I$/)

    # Increment the counter
    $i = $i + 1;

} # END OF foreach $linha (@updateFile)


foreach $modif (@modifications) {
    unless ($modif =~ m/#I$/) {
        foreach $original (@originalFile) {
            chomp($original);
            if ($original ne $modif) {
                push (@newOriginal, $originalFile[$n]);
            }
            elsif ($original eq $modif) { #&& $modif[$n+1] =~ m/#I$/) {
                push (@newOriginal, $originalFile[$n]);
                last;
            }
            $n = $n + 1;
        }
    }
    if ($modif =~ m/#I$/) {
        push (@newOriginal, $modifications[$m]);
    }
    $m = $m + 1;
}

The result is almost what I want, but not quite.

【Question discussion】:

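  • So you are updating the target A/file from the sources B/file, C/file and D/file. New lines in the source are flagged, and you have to insert them in the target after the line that is identical to the line preceding the flagged new lines in the source. Is that right? This doesn't cater for deleted lines; is that OK? What happens if there are several identical lines in the source, so that you can't tell where to insert the new records?
  • Hi TLP, I've added what I have so far.
  • Hi Borodin, the update flow is the other way round: A/file will update B/file, C/file and D/file. In principle there won't be several identical lines, but I haven't really thought about that yet. Maybe insert after the first one.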

Tags: perl file-comparison insertion-order


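【Solution 1】:

I was finally able to get back to this problem, and it looks like I've managed to solve it. Probably not the best or "prettiest" solution, but one that does what I need :).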

# Open the file

# First parameter is the file containing the update
my ($FileUpdate) = $ARGV[0];

# Second parameter is the file to be updated
my ($FileOriginal) = $ARGV[1];


# \s whitespace characters

# Open both files and give them handles to be referred to further ahead
open(UPD, '<', $FileUpdate)   || die("Could not open file $FileUpdate: $!");
open(ORG, '<', $FileOriginal) || die("Could not open file $FileOriginal: $!");

# ------------------------------------------------ #
# ---------------- ARRAY CREATION ---------------- #
# ------------------------------------------------ #

# Pass the content of the file $FileUpdate to the updateFile array
@updateFile = <UPD>;

# Pass the content of the file $FileOriginal to the originalFile array
@originalFile = <ORG>;

# Remove empty lines from the array contained on the updated file
@updateFile = grep(/\S/, @updateFile);

# Create an array that will contain the modifications and the line
# prior to the first modification.
@modifications = ();

# Counter initialization
$i = 0;


# ------------------------------------------------ #
# ----- LOOP TO IDENTIFY LINES FOR INSERTION ----- #
# ------------------------------------------------ #

# Loop the array to find out which lines are flagged as new and
# which lines immediately precede those
foreach $linha (@updateFile) {

    # Remove \n characters
    chomp($linha);

    # Find the new lines flagged with #I
    if ($linha =~ m/#I$/) {

        # Verify that the previous line is not flagged as updated.
        # If it is not, it means that the update starts here.
        unless ($updateFile[$i-1] =~ m/#I$/) {

            # Add that line to the array modifications
            push(@modifications, $updateFile[$i-1]);

        } # END OF unless

        # Add the lines tagged for insertion into the array
        push(@modifications, $updateFile[$i]);

    } # END OF if ($linha =~ m/#I$/)

    # Increment the counter
    $i = $i + 1;

} # END OF foreach $linha (@updateFile)


# ------------------------------------------------ #
# ------------- PRINT MODIFICATIONS -------------- #
# ------------------------------------------------ #
foreach $valor (@modifications) {
    print "$valor\n";
}

# ------------------------------------------------ #
# -------------------- BACKUP -------------------- #
# ------------------------------------------------ #

# Make a backup copy from the original file   
# in case something goes wrong when updating it

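# Obtain the current time as YYYYmmdd-HHMM (no trailing newline,
# otherwise it would end up in the backup file name)
use POSIX qw(strftime);
use File::Copy qw(copy);
$tt = strftime("%Y%m%d-%H%M", localtime);

copy($FileOriginal, "$FileOriginal.$tt") or die("Could not back up $FileOriginal: $!");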

# ------------------------------------------------ #
# ------------- INSERT THE NEW LINES ------------- #
# ------------------------------------------------ #

# Counter initialization
$m = 0;

# New file array
@newOriginal = ();

# Go through the original file, writing each line not yet written and,
# after each line that matches an anchor, the #I lines that follow it.

foreach $original (@originalFile) {
    # Initialize counter
    $n = 0;

    # Remove the trailing newline
    chomp($original);

    # Check if the value already exists on the array.
    # If it doesn't, add it.
    unless (grep { $_ eq $original } @newOriginal) {
        push(@newOriginal, $originalFile[$m]);
    }

    # Iterate over the array containing the modifications.
    # These new lines shall be added to the final file.
    foreach $modif (@modifications) {
        # Remove the trailing newline
        chomp($modif);

        #print "Original: $original, Modif: $modif\n";

        # Compare the current value from the original file with
        # the elements that exist on the modifications array.
        # If they are equal, push the flagged lines that follow it
        # in order to add them to the results file.
        if ($original eq $modif) {

            # Index of the first modification after the common line
            $k = $n + 1;

            # Iterate the array with the modifications
            # in order to insert all lines that end with #I
            # immediately after the common line between files.
            foreach my $igual ($k .. $#modifications) {

                # Stop at the first line that is not flagged with #I
                last unless $modifications[$igual] =~ m/#I$/;

                # Add the flagged line unless it is already in the final file
                push(@newOriginal, $modifications[$igual])
                    unless grep { $_ eq $modifications[$igual] } @newOriginal;
            }
        }

        # Increment the counter
        $n = $n + 1;
    }
    # Increment the counter
    $m = $m + 1;
}

# ------------------------------------------------ #
# ------------- RESULTS PRESENTATION ------------- #
# ------------------------------------------------ #
print "--------------------\n";
foreach $vl (@newOriginal) {
    print "newOriginal: $vl\n";
}
print "--------------------\n";

# ------------------------------------------------ #
# ------------- CREATE UPDATED FILE -------------- #
# ------------------------------------------------ #
# Create the new name for the file - only for testing purposes now, it will be the original name afterwards
$NewFileToWriteTo = $FileOriginal;
# Retrieve the extension of the file to be updated (empty string if it has none)
my ($ext) = $FileOriginal =~ /(\.[^.\/]+)$/;
$ext = '' unless defined $ext;
# Remove the extension - just for testing purposes because I want to change the file name now.
# \Q...\E treats the dot literally and $ anchors the match at the end of the name.
$NewFileToWriteTo =~ s/\Q$ext\E$// if length $ext;
# Create the new file name by adding the suffix _tst and the correct extension to it.
$NewFileToWriteTo = $NewFileToWriteTo . '_tst' . $ext;


# Create the new file or die in case it is not possible to open it
open(DAT, '>', $NewFileToWriteTo) or die("Could not open file $NewFileToWriteTo: $!");


# Write to the new file. This will be the UPDATED version of the ORIGINAL file.
foreach $vl (@newOriginal) {
    print DAT "$vl\n";
}

# Close all files
close(DAT);
close(UPD);
close(ORG);
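
The `grep` that checks whether an original line is already in `@newOriginal` rescans the whole output array for every line, which gets slow on large files. A sketch of the same keep-the-first-occurrence rule using a hash lookup; note that, like the original check, it also drops lines that legitimately repeat (for example a second `fi` in a shell script), which may not be what you want for scripts. `uniq_lines` is a hypothetical name.

```perl
use strict;
use warnings;

# Keep only the first occurrence of each line, preserving order, using a
# hash lookup instead of scanning the output array for every line.
sub uniq_lines {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

print join(' | ', uniq_lines('a', 'b', 'a', 'c', 'b')), "\n";   # a | b | c
```
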

【Discussion】:

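    【Solution 2】:

    OK, I think I understand what you need, and the program below implements a solution.

    I'm not entirely clear what the source (B, C, D) files look like, but I've assumed they are the same as the updated target (A) file shown in your question.

    Another corner case I came across: what if the first line of a source (B, C, D) file is flagged #I? I've assumed it should be inserted at the start of the output.

    I also chose to die if the line preceding the new lines in the source file can't be found in the target file.

    Let us know whether this is right.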

    use strict;
    use warnings;
    
    open my $fa, '<', 'A.txt' or die $!;
    
    open my $fb, '<', 'B.txt' or die $!;
    
    my $keyline;
    my $inserting;
    
    while (<$fb>) {
    
      if (/#I$/) {
    
        if ($keyline) {             # We have to search for a match
    
          while (1) {
    
            my $source = <$fa>;     # read from the target
    
            if (defined $source) {  # copy to output. stop reading if key is found
              print $source;
              last if $source eq $keyline;
            }
            else {                  # die if key nowhere in target
              chomp $keyline;
              die qq(Key Line "$keyline" not found);
            }
          }
    
          undef $keyline;           # don't have to search next time
        }
    
        print;                      # insert the new line
      }
      else {
        $keyline = $_;              # remember the line to search for
      }
    }
    
    # Copy whatever is left of the target after the last insertion point
    print while <$fa>;
    

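    【Discussion】:

    • Hi Borodin. Thanks for your reply. I've tried it, replacing A.txt with OriginalFile.txt and B.txt with UpdatedFile.txt. When I run it, it prints the contents of the original file without adding the new lines inserted in UpdatedFile.txt to the output. UpdatedFile.txt will be the source for all the other files. As for the first-line question, from what I've seen the first line won't change, since all the files seem to have a header starting with # -------- #. It could happen, but so far I haven't seen anywhere it might.
    • @JoaoVilla-Lobos: Please clarify which file is which. Of your original folders A, B, C and D, which one contains the file with the lines flagged #I, and which files do OriginalFile.txt and UpdatedFile.txt refer to? (My code expects A.txt to be updated with insertions from B.txt.)
    • Sorry for not being clear. Although any of them could, at a given time, hold the file used as the source while the others need updating, let's say the file containing the lines ending in #I is in folder A. That file is the one I named UpdatedFile.txt. The file to be updated is (badly named) OriginalFile.txt.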