字符串列表的最长公共前缀 (LCP)答案

【问题标题】：Longest common prefix (LCP) of a list of strings字符串列表的最长公共前缀 (LCP)
【发布时间】：2018-05-07 20:01:51
【问题描述】：

lcs([ H|L1],[ H|L2],[H|Lcs]) :-
    !,
    lcs(L1,L2,Lcs).
lcs([H1|L1],[H2|L2],Lcs):-
    lcs(    L1 ,[H2|L2],Lcs1),
    lcs([H1|L1],    L2 ,Lcs2),
    longest(Lcs1,Lcs2,Lcs),
    !.
lcs(_,_,[]).

longest(L1,L2,Longest) :-
    length(L1,Length1),
    length(L2,Length2),
    (  Length1 > Length2
    -> Longest = L1
    ;  Longest = L2
    ).

到目前为止，这是我的代码。我该如何优化它以便打印前缀，例如：

["interview", "interrupt", "integrate", "intermediate"]

应该返回"inte"

Prolog 有点生疏，好久没做 :)

【问题讨论】：

代码有问题吗？它提供解决方案吗？还是不正确的解决方案？让它“打印”当前没有的前缀是一种功能添加，而不是“优化”。

标签： list prolog prolog-dif prolog-cut logical-purity

【解决方案1】：

首先，让我们从一些相关的东西开始，但要简单得多。

:- set_prolog_flag(double_quotes, chars).  % "abc" = [a,b,c]

prefix_of(Prefix, List) :-
   append(Prefix, _, List).

commonprefix(Prefix, Lists) :-
   maplist(prefix_of(Prefix), Lists).

?- commonprefix(Prefix, ["interview", "integrate", "intermediate"]).
   Prefix = []
;  Prefix = "i"
;  Prefix = "in"
;  Prefix = "int"
;  Prefix = "inte"
;  false.

（见this的回答，如何打印带双引号的字符列表。）

这是在 Prolog 中相当容易的部分。唯一的缺点是它没有给我们最大值，而是所有可能的解决方案，包括最大值。请注意，不需要知道所有字符串，例如：

?- commonprefix(Prefix, ["interview", "integrate", Xs]).
   Prefix = []
;  Prefix = "i", Xs = [i|_A]
;  Prefix = "in", Xs = [i, n|_A]
;  Prefix = "int", Xs = [i, n, t|_A]
;  Prefix = "inte", Xs = [i, n, t, e|_A]
;  false.

所以我们得到了最后一个未知单词的部分描述作为响应。现在想象一下，后来我们意识到Xs = "induce"。 Prolog 没问题：

?- commonprefix(Prefix, ["interview", "integrate", Xs]), Xs = "induce".
   Prefix = [], Xs = "induce"
;  Prefix = "i", Xs = "induce"
;  Prefix = "in", Xs = "induce"
;  false.

事实上，我们在事后或在实际查询之前陈述这一点并没有什么区别：

?- Xs = "induce", commonprefix(Prefix, ["interview", "integrate", Xs]).
   Xs = "induce", Prefix = []
;  Xs = "induce", Prefix = "i"
;  Xs = "induce", Prefix = "in"
;  false.

我们现在可以根据这个制定最大值吗？请注意，这实际上需要某种形式的额外 quantor，我们在 Prolog 中没有任何直接规定。出于这个原因，我们必须将我们限制在我们知道是安全的某些情况下。最简单的方法是坚持单词列表不包含任何变量。为此，我将使用iwhen/2。

maxprefix(Prefix, Lists) :-
   iwhen(ground(Lists), maxprefix_g(Prefix, Lists)).

maxprefix_g(Prefix, Lists_g) :-
   setof(N-IPrefix, ( commonprefix(IPrefix, Lists_g), length(IPrefix, N ) ), Ns),
   append(_,[N-Prefix], Ns).   % the longest one

这种方法的缺点是如果不知道单词列表，我们会遇到实例化错误。

请注意，我们做了一些假设（我希望确实如此）。特别是我们假设只有一个最大值。在这种情况下，这成立，但一般来说，Prefix 可能有几个独立的值。此外，我们假设IPrefix 将始终接地。我们也可以检查一下，只是为了确定。或者：

maxprefix_g(Prefix, Lists_g) :-
   setof(N, IPrefix^ ( commonprefix(IPrefix, Lists_g), length(IPrefix, N ) ), Ns),
   append(_,[N], Ns),
   length(Prefix, N),
   commonprefix(Prefix, Lists_g).

在这里，前缀不必是一个单一的前缀（在我们的情况下就是这样）。

然而，最好的版本是更纯粹的版本，根本不需要诉诸实例化错误。

【讨论】：

【解决方案2】：

这是@CapelliC 提出（并随后撤回）的代码的纯化变体：

:- set_prolog_flag(double_quotes, chars).

:- use_module(library(reif)).

lists_lcp([], []).
lists_lcp([Es|Ess], Ls) :-
   if_((maplist_t(list_first_rest_t, [Es|Ess], [X|Xs], Ess0),
        maplist_t(=(X), Xs))
       , (Ls = [X|Ls0], lists_lcp(Ess0, Ls0))
       , Ls = []).

list_first_rest_t([], _, _, false).
list_first_rest_t([X|Xs], X, Xs, true).

meta-predicate 以上maplist_t/3 是maplist/2 的变体这适用于术语相等/不等式具体化—maplist_t/5 与更高的arity 相同：

maplist_t(P_2, Xs, T) :-
   i_maplist_t(Xs, P_2, T).

i_maplist_t([], _P_2, true).
i_maplist_t([X|Xs], P_2, T) :-
   if_(call(P_2, X), i_maplist_t(Xs, P_2, T), T = false).

maplist_t(P_4, Xs, Ys, Zs, T) :-
   i_maplist_t(Xs, Ys, Zs, P_4, T).

i_maplist_t([], [], [], _P_4, true).
i_maplist_t([X|Xs], [Y|Ys], [Z|Zs], P_4, T) :-
   if_(call(P_4, X, Y, Z), i_maplist_t(Xs, Ys, Zs, P_4, T), T = false).

首先是地面查询：

?-lists_lcp(["a","ab"], [])。 false. % 失败（如预期）

以下是@Fatalize's fine answer 中提出的查询。

?- lists_lcp(["interview",X,"intermediate"], "inte").
   X = [i,n,t,e]
;  X = [i,n,t,e,_A|_B], dif(_A,r)
;  false.

?- lists_lcp(["interview","integrate",X], Z).
   X = Z, Z = []
;  X = Z, Z = [i]
;  X = Z, Z = [i,n]
;  X = Z, Z = [i,n,t]
;  X = Z, Z = [i,n,t,e]
;  X = [i,n,t,e,_A|_B], Z = [i,n,t,e]
;  X = [i,n,t,_A|_B]  , Z = [i,n,t]  , dif(_A,e)
;  X = [i,n,_A|_B]    , Z = [i,n]    , dif(_A,t)
;  X = [i,_A|_B]      , Z = [i]      , dif(_A,n)
;  X = [_A|_B]        , Z = []       , dif(_A,i).

?- lists_lcp([X,Y], "abc").
   X = [a,b,c]      , Y = [a,b,c|_A]
;  X = [a,b,c,_A|_B], Y = [a,b,c]
;  X = [a,b,c,_A|_B], Y = [a,b,c,_C|_D], dif(_A,_C)
;  false.

?- lists_lcp(L, "abc").
   L = [[a,b,c]]
;  L = [[a,b,c],[a,b,c|_A]]
;  L = [[a,b,c,_A|_B],[a,b,c]]
;  L = [[a,b,c,_A|_B],[a,b,c,_C|_D]], dif(_A,_C)
;  L = [[a,b,c],[a,b,c|_A],[a,b,c|_B]]
;  L = [[a,b,c,_A|_B],[a,b,c],[a,b,c|_C]]
;  L = [[a,b,c,_A|_B],[a,b,c,_C|_D],[a,b,c]]
;  L = [[a,b,c,_A|_B],[a,b,c,_C|_D],[a,b,c,_E|_F]], dif(_A,_E) 
…

最后，这是显示改进确定性的查询：

?- lists_lcp(["interview","integrate","intermediate"], Z).
Z = [i,n,t,e].                              % succeeds deterministically

【讨论】：

很好，但似乎我们可以更进一步！
@false。进一步？更多（和更好的）查询表明答案不“重叠”？
是的，确实如此。不同的做法！想象一下你有lcp/3。也就是两个列表的 lcp。现在...
我的推理有缺陷。它不仅适用于[X,Y]，还适用于更大的列表！
（而且...看来(',')/3 的扩展也有缺陷。急需一些元谓词问题...'elp！）

【解决方案3】：

这是我将如何实现的：

:- set_prolog_flag(double_quotes, chars).

longest_common_prefix([], []).
longest_common_prefix([H], H).
longest_common_prefix([H1,H2|T], P) :-
    maplist(append(P), L, [H1,H2|T]),
    (   one_empty_head(L)
    ;   maplist(head, L, Hs),
        not_all_equal(Hs)
    ).

one_empty_head([[]|_]).
one_empty_head([[_|_]|T]) :-
    one_empty_head(T).

head([H|_], H).

not_all_equal([E|Es]) :-
    some_dif(Es, E).

some_dif([X|Xs], E) :-
    if_(diffirst(X,E), true, some_dif(Xs,E)).

diffirst(X, Y, T) :-
    (   X == Y -> T = false
    ;   X \= Y -> T = true
    ;   T = true,  dif(X, Y)
    ;   T = false, X = Y
    ).

not_all_equal/1的实现来自this answer by @repeat（你可以在编辑历史中找到我的实现）。

我们使用append 和maplist 将列表中的字符串拆分为前缀和后缀，并且所有字符串的前缀都相同。为了使这个前缀最长，我们需要声明至少两个后缀的第一个字符不同。

这就是我们使用head/2、one_empty_head/1 和not_all_equal/1 的原因。 head/2 用于检索字符串的第一个字符； one_empty_head/1 用于声明如果后缀之一为空，则自动这是最长的前缀。 not_all_equal/1 用于然后检查或声明至少两个字符不同。

示例

?- longest_common_prefix(["interview", "integrate", "intermediate"], Z).
Z = [i, n, t, e] ;
false.

?- longest_common_prefix(["interview", X, "intermediate"], "inte").
X = [i, n, t, e] ;
X = [i, n, t, e, _156|_158],
dif(_156, r) ;
false.

?- longest_common_prefix(["interview", "integrate", X], Z).
X = Z, Z = [] ;
X = [_246|_248],
Z = [],
dif(_246, i) ;
X = Z, Z = [i] ;
X = [i, _260|_262],
Z = [i],
dif(_260, n) ;
X = Z, Z = [i, n] ;
X = [i, n, _272|_274],
Z = [i, n],
dif(_272, t) ;
X = Z, Z = [i, n, t] ;
X = [i, n, t, _284|_286],
Z = [i, n, t],
dif(_284, e) ;
X = Z, Z = [i, n, t, e] ;
X = [i, n, t, e, _216|_224],
Z = [i, n, t, e] ;
false.

?- longest_common_prefix([X,Y], "abc").
X = [a, b, c],
Y = [a, b, c|_60] ;
X = [a, b, c, _84|_86],
Y = [a, b, c] ;
X = [a, b, c, _218|_220],
Y = [a, b, c, _242|_244],
dif(_218, _242) ;
false.

?- longest_common_prefix(L, "abc").
L = [[a, b, c]] ;
L = [[a, b, c], [a, b, c|_88]] ;
L = [[a, b, c, _112|_114], [a, b, c]] ;
L = [[a, b, c, _248|_250], [a, b, c, _278|_280]],
dif(_248, _278) ;
L = [[a, b, c], [a, b, c|_76], [a, b, c|_100]] ;
L = [[a, b, c, _130|_132], [a, b, c], [a, b, c|_100]];
…

【讨论】：

longest_common_prefix([[A],[B],[b]], []), A=a,B=b. 给出了两个相同的解决方案？
@false 我的实现引入了多余的dif 约束，我不知道如何避免。
直到not_all_equal_/1，这是一种高度 Prologish 的方法！
@false 我正在写一个关于not_all_equal 实现的问题，因为它似乎是一个有用的谓词，但很难正确实现......
请注意，如上所述设置了 Prolog 标志，[i,n,t,e] = "inte"!所以他们是一样的。看来我的答案是如何写出如上所示的“inte”！

【解决方案4】：

This previous answer 提出了一个基于if_/3 的实现。

:- use_module(library(reif))。

这里有一些不同的看法：

lists_lcp([], []).
lists_lcp([Es|Ess], Xs) :-
   foldl(list_list_lcp, Ess, Es, Xs).                % foldl/4

list_list_lcp([], _, []).
list_list_lcp([X|Xs], Ys0, Zs0) :-
   if_(list_first_rest_t(Ys0, Y, Ys)                 % if_/3
      , ( Zs0 = [X|Zs], list_list_lcp(Xs, Ys, Zs) )
      ,   Zs0 = []
      ).

list_first_rest_t([], _, _, false).
list_first_rest_t([X|Xs], Y, Xs, T) :-
   =(X, Y, T).                                       % =/3

几乎我之前的回答中的所有查询都给出了相同的答案，所以我不在这里展示它们。

lists_lcp([X,Y], "abc") 但是，新代码不再普遍终止。

【讨论】：

由于某种原因，当我运行它崩溃的代码时，它说“stack.pl:6: Singleton variables: [Y]”@repeat
@blazing。您正在运行哪个查询？这个stack.pl-message 对我来说没有多大意义。请提供更多数据说明您遇到的问题。
代码无法编译，我尝试在终端的 swipl 上运行它，但它不仅给了我那个错误@repeat
@blazing。下载并安装library(reif)。我在答案中添加了一个链接。
@blazing。具体说明你得到的错误！ “那个错误”并不能帮助我定位问题所在。

【解决方案5】：

一个简单的版本：

:- set_prolog_flag(double_quotes, chars).
pref([],_,[]).
pref(_,[],[]).
pref([H|T1],[H|T2],[H|Tr]):-
    pref(T1,T2,Tr).
pref([H|_],[H|_],[]).
pref([H1|_],[H2|_],[]):-
    dif(H1,H2).

lcf([],[]).
lcf([W],R):-
    pref(W,W,R).
lcf([W1,W2|L],R):-
    pref(W1,W2,R),
    lcf([W2|L],R).

例子：

pref("interview","integrate",R).
R = [i, n, t, e] ;
R = [i, n, t] ;
R = [i, n] ;
R = [i] ;
R = [] ;
False.

lcf(["interview", "interrupt", "integrate", "intermediate"],R).
R = [i, n, t, e]

lcf(["interview", "interrupt", X, "intermediate"],R).
R = X, X = [i, n, t, e, r]

【讨论】：

【解决方案6】：

我最近不得不为两个列表实现这个，这是我想出的代码。它假定两个输入列表已充分实例化。

longest_common_prefix([X|Xs], [X|Ys], [X|Common]) :- !,
    longest_common_prefix(Xs, Ys, Common).
longest_common_prefix(_, _, []).

这很容易扩展到多个列表：

lcs([], []).
lcs([L1|Ls], Prefix) :-
    foldl(longest_common_prefix, Ls, L1, Prefix).

如果你不喜欢使用foldl：

lcs([], []).
lcs([L1|Ls], Prefix) :-
    lcs(Ls, L1, Prefix).

lcs([], Prefix, Prefix).
lcs([L1|Ls], Prefix0, Prefix) :-
    longest_common_prefix(L1, Prefix0, Prefix1),
    lcs(Ls, Prefix1, Prefix).

【讨论】：