How to find longest common substring with less memory usage? (Answers)

【Question title】: How to find longest common substring with less memory usage?
【Posted】: 2014-01-17 05:34:47
【Question description】:

I need to find the common substring that maximizes (substring length * count).

For example, given the string:

hi, hello world ... hello world ... hi, hello world

The answer is hello world, because (11 * 3) > (15 * 2).
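
The arithmetic in the example can be checked with a short Python sketch. The helper name `score` is hypothetical, and it assumes that occurrences are counted without overlap, which is what `str.count` does:

```python
def score(text: str, sub: str) -> int:
    """Return len(sub) times the number of non-overlapping occurrences of sub."""
    if not sub:
        # str.count("") would return len(text) + 1, so treat the empty string as 0.
        return 0
    return len(sub) * text.count(sub)


text = "hi, hello world ... hello world ... hi, hello world"
print(score(text, "hello world"))      # 11 * 3 = 33
print(score(text, "hi, hello world"))  # 15 * 2 = 30
```

This only scores a candidate that is already known. Finding the best candidate without enumerating every substring is the actual problem.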

I found a related discussion in this question, but using it in my case is impractical because of its high memory usage.

Is there a better way to do this?

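【Question discussion】:

Would these be suitable? stackoverflow.com/questions/19158025/…
The Wikipedia article has a pseudocode solution that uses dynamic programming.
One difference: he is looking for the longest substring. I am looking for the substring that takes up the most space in the string (maximum of substring length * count).
"Using it is impractical because of its high memory usage." I disagree. You insert every string into a trie, which holds fewer characters than the sum of the characters of all the strings. If there are too many strings to fit in memory, you can divide the problem.
Google "generalized suffix tree".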
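
To make the trie suggestion concrete, here is a minimal Python sketch of an uncompressed suffix trie. It is an illustration, not the commenters' code: the names `Node` and `best_by_suffix_trie` are hypothetical, occurrences are allowed to overlap, and a substring is assumed to need at least two occurrences to count as "common". Each node stores how many suffixes pass through it, which equals the number of occurrences of the substring spelled by the path to that node.

```python
class Node:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # next character -> Node
        self.count = 0      # occurrences of the substring ending at this node


def best_by_suffix_trie(text: str, min_count: int = 2):
    """Return (score, substring) maximizing length * occurrences.

    Occurrences may overlap. Substrings seen fewer than min_count
    times are ignored (assumption, see the text above).
    """
    root = Node()
    best_score, best_sub = 0, ""
    for start in range(len(text)):            # insert every suffix
        node = root
        for end in range(start, len(text)):
            child = node.children.get(text[end])
            if child is None:
                child = node.children[text[end]] = Node()
            child.count += 1                  # one more text[start:end + 1]
            node = child
            length = end - start + 1
            if child.count >= min_count and length * child.count > best_score:
                best_score = length * child.count
                best_sub = text[start:end + 1]
    return best_score, best_sub


print(best_by_suffix_trie("banana"))
```

Counts only grow, so checking the score at each increment is enough to catch every node's final value. Note that this naive trie can need a number of nodes quadratic in the text length, which is the memory problem raised in the question; a compressed suffix tree stores the same information in linear space.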

Tags: algorithm

【Solution 1】:

Here is a solution in Perl. It is probably not memory-efficient, but it works for the test string shown.

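use warnings;
use strict;

my $s = "hi, hello world ... hello world ... hi, hello world";
my %h;

# Collect candidates: for each start offset, the longest prefix of the
# remaining text that occurs again later without overlapping itself.
for my $i (0 .. length($s) - 1) {
    my $x = substr($s, $i);
    # Requiring at least one character avoids an empty candidate, which
    # would turn the counting regex below into an empty pattern.
    my ($m) = $x =~ /^(.+).*\1/s or next;
    $h{$m} = 0;
}

# Count the non-overlapping occurrences of each candidate in the whole string.
for my $f (keys %h) { $h{$f} = () = $s =~ /\Q$f\E/g; }

# Sort the candidates by length * count, best first.
my @ord = sort { length($b) * $h{$b} <=> length($a) * $h{$a} } keys %h;

# The first entry has the highest score (nothing is printed if nothing repeats).
print "$ord[0]\n" if @ord;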
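
For larger inputs, a possible lower-memory alternative is a suffix array with an LCP array, followed by the "largest rectangle in a histogram" stack scan over the LCP values. This swaps in a different technique from the regex approach above and is only a sketch under stated assumptions: the function names are hypothetical, occurrences may overlap (unlike the regex count), and a substring must occur at least twice. The idea is that a run of `w` consecutive LCP values that are all at least `m` corresponds to a substring of length `m` occurring `w + 1` times.

```python
def suffix_array(text):
    """Sorted suffix start positions, built by prefix doubling."""
    n = len(text)
    sa = list(range(n))
    rank = [ord(c) for c in text]
    k = 1
    while n > 1:
        # Compare suffixes by their first 2k characters via two ranks.
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:   # all ranks distinct: fully sorted
            break
        k *= 2
    return sa


def lcp_array(text, sa):
    """lcp[p] = common prefix length of suffixes sa[p - 1] and sa[p] (Kasai)."""
    n = len(text)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def best_repeated_substring(text):
    """Return (score, substring) maximizing length * (overlapping) occurrences."""
    n = len(text)
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    best_score, best_sub = 0, ""
    stack = []                          # (leftmost index, height), heights increasing
    for p in range(1, n + 1):
        h = lcp[p] if p < n else 0      # trailing 0 flushes the stack
        left = p
        while stack and stack[-1][1] >= h:
            left, height = stack.pop()
            count = p - left + 1        # lcp[left..p-1] >= height covers this many suffixes
            if height * count > best_score:
                best_score = height * count
                best_sub = text[sa[left]:sa[left] + height]
        if h > 0:
            stack.append((left, h))
    return best_score, best_sub


print(best_repeated_substring("banana"))
```

Apart from the text itself, this keeps only a few integer arrays of length n, rather than a hash of candidate substrings. When several substrings tie for the best score, which one is returned depends on the scan order.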

【Discussion】: