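Function which verifies whether a string is included in another string in Lisp

【Question title】: Function which verifies whether a string is included in another string in Lisp
【Posted】: 2016-01-05 20:08:08
【Question description】:

I am trying to write a function that verifies whether a string is included in another string in Lisp, but I cannot get it to work.

For example: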

(string-include 'abd 'abbbe) => nil

(string-include 'ghf 'dghfd) => ghf

Here is my function:

(defun string-include (string1 string2)
  (cond
    ((not string1) 0)
    ((not string2) 0)
    ((.... (string1) (string2)) (string1 (string-include string1 (cdr string2))))
    ((string-include  string1 (cdr string2)) ) )

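【Question discussion】:

Your function returns the symbol whose name is contained in the other symbol's name, or nil otherwise. What do you do if the first symbol is nil? For example, if you evaluate (string-include 'nil 'vanilla), you get nil, but you cannot tell whether that is because nil is in vanilla or because it is not.

Tags: string lisp common-lisp

【Solution 1】:

Return an index or a substring, not a symbol

In your question, you used these examples: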

(string-include 'abd 'abbbe) => nil
(string-include 'ghf 'dghfd) => ghf

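Assuming that you want to return the symbols nil and ghf, you run into a problem if you want to check whether a string contains the substring NIL. For example, with this approach you would have: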

(string-include 'nil 'vanilla) => nil

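Is nil returned because "NIL" is in "VANILLA", or because it is not? It is ambiguous, and you cannot tell. Instead, you could return the actual string, since the string "NIL" is a true value. Even better, if you return the index of the string, then you also find out where in the other string the first string occurs. That is how the built-in function search behaves, for example.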
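
To make the difference concrete, here is a small sketch that calls the built-in search on the names of the symbols involved. It assumes the standard reader settings, under which symbol names are upcased. Because the result is an index or NIL, the two cases can be told apart:

```lisp
;; With the standard reader settings, (string 'nil) is "NIL"
;; and (string 'vanilla) is "VANILLA".
(search (string 'nil) (string 'vanilla)) ; => 2, "NIL" starts at index 2
(search (string 'nil) (string 'foo))     ; => NIL, no occurrence
```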

Directly, using SEARCH

You can implement this in terms of search:

(defun substringp (needle haystack &key (test 'char=))
  "Returns the index of the first occurrence of the string designated
by NEEDLE within the string designated by HAYSTACK, or NIL if it does
not occur.  Characters within the string are compared by TEST, which
defaults to CHAR= (for case-sensitive comparison)."
  (search (string needle)
          (string haystack)
          :test test))

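Note the use of the string function to convert string designators (characters, strings, and symbols) into the strings they designate. Remember that with the standard settings, the reader upcases symbol names, so the symbol cat designates the string "CAT". Finally, since this returns the result of search, it does double duty for you: it returns the index of the first occurrence if there is one, and nil otherwise. Remember that everything except nil is a true value (even 0), so you can use the result as a boolean or as an index (as long as you check that it is not nil). Here are some examples: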

CL-USER> (substringp "cat" "concatenate")
3

CL-USER> (substringp "dog" "concatenate")
NIL

;; Default upcasing of symbol names means that the 
;; result of 'cat is a symbol named "CAT", which is not 
;; in "concatenate". 
CL-USER> (substringp 'cat "concatenate")
NIL

;; You can test the characters with CHAR-EQUAL, which
;; is case insensitive, in which case "CAT" is in 
;; "concatenate".
CL-USER> (substringp 'cat "concatenate" :test 'char-equal)
3
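
The two points above (string designators, and 0 being a true value) can be illustrated with a short sketch that uses only the standard string and search functions. The variable name index is just an illustrative choice:

```lisp
;; STRING converts string designators to the strings they designate.
(string "cat")  ; => "cat"
(string 'cat)   ; => "CAT" (the standard reader upcases symbol names)
(string #\c)    ; => "c"

;; 0 is a true value in Common Lisp, so a match at index 0 still
;; counts as success when the result is used as a boolean.
(let ((index (search "con" "concatenate")))
  (if index
      (format nil "found at index ~D" index)
      "not found"))
;; => "found at index 0"
```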

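Using recursion

Your code, as well as the code that uselpa showed in another answer, is more recursive in nature. That in itself is not a problem, but recursive string processing in Common Lisp is prone to a few pitfalls. Making lots of new strings with subseq is inefficient, so many sequence functions in Common Lisp take :start and :end arguments or, for functions that take two sequences, :start1, :end1, :start2, and :end2 arguments. By using these, you can recurse and change the indices into the strings, rather than creating entirely new strings. For example, string= lets you compare two strings, optionally restricted to the subranges given by these arguments.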

;; "toc" is in both "octocat" and "toccata"
CL-USER> (string= "octocat" "toccata" :start1 2 :end1 5 :end2 3)
T
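
The same index-moving idea applies to search, which also accepts :start2. As a rough sketch, the helper below (all-occurrences is a hypothetical name, not part of the answer) collects every match position by advancing the start index rather than copying the remainder of the string:

```lisp
(defun all-occurrences (needle haystack)
  "Hypothetical helper: returns a list of every index at which NEEDLE
occurs in HAYSTACK, advancing :START2 instead of copying with SUBSEQ."
  (loop with limit = (length haystack)
        ;; After each match, resume one character past it.
        for start = 0 then (1+ index)
        ;; Guard the bound so :START2 never exceeds the string length.
        for index = (and (<= start limit)
                         (search needle haystack :start2 start))
        while index
        collect index))

(all-occurrences "an" "banana") ; => (1 3)
```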

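Using these kinds of functions requires a bit of care to make sure that you do not supply any indices that are out of range, but it is not too bad, and you do not end up copying any strings. Here is a version of substringp that accepts these start and end parameters, and uses a local recursive function to do the actual processing.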

(defun substringp (string1 string2
                   &key
                     (start1 0) (end1 nil)
                     (start2 0) (end2 nil))
  "Returns the index of the first occurrence of the substring of
STRING1 bounded by START1 and END1 within the substring of STRING2
bounded by START2 and END2, or NIL if the string does not appear.  The
index is a position within STRING2 as a whole."
  ;; First, compute the actual strings designated by STRING1 and
  ;; STRING2, and the values for END1 and END2, which default to the
  ;; length of the respective strings.  Also get the length of the
  ;; substring in STRING1 that we're looking for. This is done just
  ;; once.  The actual recursive portion is handled by the local
  ;; function %SUBSTRINGP.
  (let* ((string1 (string string1))
         (string2 (string string2))
         (end1 (or end1 (length string1)))
         (end2 (or end2 (length string2)))
         (len1 (- end1 start1)))
    (labels ((%substringp (start2 &aux (end2-curr (+ start2 len1)))
               (cond
                 ;; If end2-curr is past end2, then we're done, and
                 ;; the string was not found.  (A candidate that ends
                 ;; exactly at end2 is still in range.)
                 ((> end2-curr end2) nil)
                 ;; Otherwise, check whether the substrings match.  If
                 ;; they do, return the current start2, which is the
                 ;; index of the substring within string2.
                 ((string= string1 string2
                           :start1 start1 :end1 end1
                           :start2 start2 :end2 end2-curr)
                  start2)
                 ;; If that doesn't match, then recurse, starting one
                 ;; character farther into string2.
                 (t (%substringp (1+ start2))))))
      (%substringp start2))))
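
The Common Lisp standard does not require tail-call optimization, so on very long strings a recursive helper may use stack space proportional to the number of positions tried, depending on the implementation. As a hedged alternative, here is a minimal iterative sketch of the same idea. The name substringp-iterative is hypothetical, and the start and end keyword arguments are left out for brevity:

```lisp
(defun substringp-iterative (string1 string2)
  "Hypothetical iterative counterpart: returns the index of the first
occurrence of STRING1 within STRING2, or NIL if there is none."
  (let* ((string1 (string string1))
         (string2 (string string2))
         (len1 (length string1)))
    ;; The last candidate start is the one where STRING1 still fits.
    (loop for start2 from 0 to (- (length string2) len1)
          when (string= string1 string2
                        :start2 start2 :end2 (+ start2 len1))
            return start2)))

(substringp-iterative "cat" "concatenate") ; => 3
(substringp-iterative "cat" "concat")      ; => 3 (match at the very end)
(substringp-iterative "dog" "concatenate") ; => NIL
```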

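【Discussion】:

So how exactly can I write my function?
Why does (substringp "cat" "concatenate") return 3 in your example?
@yoan15 As I explained in the answer, if the substring appears in the string, then you get back the index (which is always a true value). So 3 is the index of cat in concatenate, which is a true value, because cat is in concatenate.
@yoan15 It looks like your string-include function has nearly the same signature as my substringp; I am not sure what other parts you still need.

【Solution 2】:

Judging from your code, what you are looking for is something like this: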

(defun string-include (string1 string2)
  (cond
   ((zerop (length string1)) nil) ; string1 is empty (no need to test it every time)
   ((> (length string1) (length string2)) nil) ; string1 is longer than string2
   ((string= string1 (subseq string2 0 (length string1))) string1) ; string2 starts with string1
   (t (string-include string1 (subseq string2 1))))) ; otherwise shorten string2 by 1 and start over

This works, but it is inefficient and not idiomatic Common Lisp. Just make sure that you actually pass strings, not symbols as in your example:

? (string-include "abd" "abbbe")
NIL
? (string-include "ghf" "dghfd")
"ghf"

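Of course, Joshua's answer is the recommended solution.

EDIT

Added a version that works with both symbols and strings (but returns a string in any case). I took the opportunity to incorporate one of Joshua's suggestions: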

(defun string-include (string1 string2)
  (let* ((string1 (string string1)) (length1 (length string1)))
    (if (zerop length1)
        nil 
        (labels ((sub (s)
                   (cond
                    ((> length1 (length s)) nil)
                    ((string= string1 s :end2 (length string1)) string1)
                    (t (sub (subseq s 1))))))
          (sub (string string2))))))

Tests:

? (string-include "abd" "abbbe")
NIL
? (string-include "ghf" "dghfd")
"ghf"
? (string-include 'abd  'abbbe) 
NIL
? (string-include 'ghf  'dghfd) 
"GHF"
? (string-include "ghf" '|dghfd|) 
"ghf"
? (string-include '|ghf|  "dghfd") 
"ghf"
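
The remaining subseq call in the version above can also be avoided by passing a start index to string= instead of a shortened copy. The following is only a sketch under that assumption; string-include-indexed is a hypothetical name, and it deliberately keeps the same return convention (the string, or NIL, with an empty first argument yielding NIL):

```lisp
(defun string-include-indexed (string1 string2)
  "Hypothetical variant of STRING-INCLUDE: returns the string designated
by STRING1 if it occurs in the string designated by STRING2, else NIL.
An empty STRING1 yields NIL, as in the version it is based on."
  (let* ((string1 (string string1))
         (string2 (string string2))
         (length1 (length string1))
         (length2 (length string2)))
    (labels ((sub (start)
               (cond
                 ;; STRING1 no longer fits in what remains of STRING2.
                 ((> (+ start length1) length2) nil)
                 ;; Compare in place; no new string is created.
                 ((string= string1 string2
                           :start2 start :end2 (+ start length1))
                  string1)
                 (t (sub (1+ start))))))
      (if (zerop length1)
          nil
          (sub 0)))))

(string-include-indexed "ghf" "dghfd") ; => "ghf"
(string-include-indexed 'ghf 'dghfd)   ; => "GHF"
(string-include-indexed "abd" "abbbe") ; => NIL
```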

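【Discussion】:

Thanks, but the symbols are not a problem. There is a command that converts symbols to strings.
So is this the solution you were looking for?
Many string processing functions, including string=, accept start1, start2, end1, and end2 keyword arguments precisely so that you do not have to do lots of copying with subseq. In this case, it would help to replace (string= string1 (subseq string2 0 (length string1))) with (string= string1 string2 :end2 (length string1)). Adding these keyword arguments to string-include would also help avoid the subseq in the last case.
Also, isn't the first case backwards? If string1 is the empty string, then it is a substring of every string. For example, (string-include '|| 'foo) should yield || rather than nil, right? (In fact, this also raises the question about words that contain nil. For example, (string-include 'nil 'foo) => nil and (string-include 'nil 'vanilla) => nil.)
@uselpa Just for fun, I added a recursive version to my answer that uses the start and end keyword arguments.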