A function that counts how many times one string is contained in another in Lisp: answers

[Question title]: A function which identifies how many times a string is included in another one in lisp
[Posted]: 2016-01-28 09:59:08
[Question]:

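I'm stuck writing a Lisp function that counts how many times one string is contained in another.

I tried this function, which gives me an error:

*** - +: "abc" is not a number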

(defun string-contain (string1 string2)
  (cond
   ((not (length string1)) nil) ; string1 is empty (no need to test this every time)
   ((> (length string1) (length string2)) nil) ; string1 is longer than string2
   ((string= string1 (subseq string2 0 (length string1))) string1) 
   (t (+ 1(string-include string1 (subseq string2 1))))))

Thanks

[Comments]:

Tags: lisp common-lisp

[Answer 1]:

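In general, when you're working with strings, you should try to avoid calling subseq, because it creates a new string, and you don't want to do all that string allocation. Many of the sequence-processing functions in Common Lisp take start and end arguments, so you can specify which part of a sequence you want to look at. The search function looks for an occurrence of one sequence within another and returns the index of the first occurrence. You can call search repeatedly with new :start2 values to search farther and farther into the string. For example: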

(defun search-all (needle haystack &key key (test 'eql)
                                     (start1 0)
                                     (end1 (length needle))
                                     (start2 0)
                                     (end2 nil)
                                     (overlaps nil))
  "Counts the number of times that NEEDLE appears in HAYSTACK. START1
and END1, and START2 and END2, are bounding index designators of
NEEDLE and HAYSTACK, respectively.  If OVERLAPS is true, then
overlapping occurrences will be counted separately."
  (do* ((len1 (- end1 start1))           ; length of needle (constant)
        (upd (if overlaps 1 len1))       ; how much to increment pos
        (occurrences 0 (1+ occurrences)) ; occurrences, increments by 1
        (start2 start2 (+ pos upd))      ; start2, updated to pos+upd
        (pos #1=(search needle haystack  ; pos. of needle, or NIL
                        :start1 start1 :end1 end1
                        :start2 start2 :end2 end2
                        :test test :key key)
             #1#)) 
       ((null pos) occurrences))) ; when pos is NIL, return occurrences

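One part of this may be a bit confusing. The variable bindings in do and do* loops have the form (variable [init-form [update-form]]) (the standard calls the last one the step-form), and we want the init-form and the update-form for pos to be the same, namely the call to search. In Common Lisp code, you can write #n=form and then use #n# later to refer back to the same form. That's why I used #1=(search …) as the init-form and then #1# as the update-form.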
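For readers who find the #1= / #1# labels hard to follow, the same idea can be sketched with loop, where the search call is written only once. This is a minimal sketch, not the answer's code: the name count-occurrences is hypothetical, it drops the key, test and bounding-index arguments, and it assumes an empty needle should count as 0.

```lisp
;; Hypothetical simplification of SEARCH-ALL above (not the original code).
;; Assumption: an empty NEEDLE yields 0 instead of looping forever.
(defun count-occurrences (needle haystack &key (overlaps nil))
  "Count occurrences of NEEDLE in HAYSTACK, optionally counting overlaps."
  (let ((step (if overlaps 1 (length needle)))) ; how far to move past a match
    (if (zerop (length needle))
        0
        (loop with start = 0
              ;; FOR ... = without THEN re-evaluates SEARCH on every pass
              for pos = (search needle haystack :start2 start)
              while pos                        ; stop when SEARCH returns NIL
              do (setf start (+ pos step))     ; resume after (or inside) the match
              count t))))

(count-occurrences "ab" "abcdabcd")               ;=> 2
(count-occurrences "aaa" "baaaaaab")              ;=> 2
(count-occurrences "aaa" "baaaaaab" :overlaps t)  ;=> 4
```

No substring is ever allocated here; only the :start2 index moves through the haystack.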

Here are some examples:

;; Find 'ab' within a 'abcdabcd'
(SEARCH-ALL "ab" "abcdabcd")
;;=> 2

;; Find 'cat' within a 'one cat two cat three cat'
(SEARCH-ALL "concatenate" "one cat two cat three cat" :START1 3 :END1 6)
;;=> 3

;; Find 'cat' within 'one cat two cat'
(SEARCH-ALL "concatenate" "one cat two cat three cat" :START1 3 :END1 6 :START2
            0 :END2 15)
;;=> 2

;; Fail to find 'cat' in 'Cat'
(SEARCH-ALL "cat" "Cat")
;;=> 0

;; Find 'cat' in 'Cat'
(SEARCH-ALL "cat" "Cat" :TEST 'CHAR-EQUAL)
;;=> 1

;; Find 2 'aaa' in 'baaaaaab' (no overlaps)
(SEARCH-ALL "aaa" "baaaaaab" :OVERLAPS NIL)
;;=> 2

;; Find 4 'aaa' in 'baaaaaab' (with overlaps)
(SEARCH-ALL "aaa" "baaaaaab" :OVERLAPS T)
;;=> 4

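[Comments]:

To improve the answer with a very small change, you could add another keyword argument that says whether to count overlapping occurrences (the default), which would control whether the subsequent values of start2 are pos + 1 instead of pos + len1.
@acelent You're right. I've updated my code and added some examples.

[Answer 2]:

Looking at the code, this looks like the source of the error: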

((string= string1 (subseq string2 0 (length string1))) string1)

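This line returns a string when the comparison succeeds. It should instead return 1 plus the result of checking whether string1 is at the start of string2 one character further along.

You probably also want to skip the (+ 1 ...) in the default (no match) case. And in the base cases you definitely want to return 0 rather than nil.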
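Putting these suggestions together, the asker's function might be repaired roughly as follows. This is a sketch under assumptions rather than the answerer's own code: it treats an empty string1 as occurring 0 times, it counts overlapping occurrences because it always advances by one character, and it still allocates with subseq on each recursive call.

```lisp
;; A possible repair of the question's function (an illustration only).
;; Assumptions: empty STRING1 counts as 0; overlapping matches are counted.
(defun string-contain (string1 string2)
  (cond
    ((zerop (length string1)) 0)                 ; empty needle: return a number
    ((> (length string1) (length string2)) 0)    ; needle no longer fits
    ((string= string1 string2 :end2 (length string1))
     ;; match at the start: 1 plus the count one character further along
     (1+ (string-contain string1 (subseq string2 1))))
    ;; no match at the start: just keep looking, without adding 1
    (t (string-contain string1 (subseq string2 1)))))

(string-contain "ab" "cabxabyab")  ;=> 3
(string-contain "abc" "xyz")       ;=> 0
```

Every branch now returns a number, so the + never receives a string or nil.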

[Comments]:

[Answer 3]:

(not (length string)) will always be false or signal a type error. You probably want to compare against 0, using zerop.
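A short illustration of why the test never fires: in Common Lisp only nil is false, so the number 0 counts as true. The variable name *empty* is just for this sketch.

```lisp
;; Illustration only; *EMPTY* is a hypothetical name.
(defparameter *empty* "")

(not (length *empty*))     ;=> NIL, because 0 is not NIL
(zerop (length *empty*))   ;=> T, the intended emptiness test
(zerop (length "abc"))     ;=> NIL
```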

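[Comments]:

[Answer 4]:

Your function has three problems that can be spotted by eye:

As Svante pointed out, (not (length string1)) will always be nil.
Your function returns nil in two branches and a number in the last one. This inconsistency may cause problems later.
There is no function string-include.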

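Here is how I would approach the problem. We want to count how many times a given string is contained in another string. This breaks down into the following cases:

If the first string (the "substring") is longer than the second, the answer must be 0.
If the first string has the same length as the second and the strings are equal, the answer must be 1.
If the first string is shorter than the second but forms its beginning, we have found 1 inclusion, and we also need to check whether the same substring is contained in the rest (the tail) of the second string.
Any other outcome must be 0.

Here is the code that implements it: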

(defun substring-times (substr string)
  (cond ((> (length substr) (length string)) 0)
        ((and (= (length substr) (length string))
              (string= substr string))
         1)
        ((string= substr (subseq string 0 (length substr)))
         (1+ (substring-times substr (subseq string (length substr)))))
        (t 0)))

We can test it:

> (substring-times "ab" "abababababc")
5

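This function doesn't cover the case where "ab" is contained in "cabxabyab". But that change is trivial (and, as they like to say in books, is left as an exercise).

More interesting is that this kind of function is inefficient (it uses recursion instead of iteration) and is not idiomatic Common Lisp. It would be better to rewrite it using iteration: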

(defun substring-times (substr string)
  (let ((sublen (length substr))
        (len (length string))
        (result 0)
        (i 0))
    (loop
       while (<= i (- len sublen))
       if (string= substr string :start2 i :end2 (+ i sublen))
       do (progn
            (incf result)
            (incf i sublen))
       else
       do (incf i)
       end
       finally (return result))))

This function also handles the "cabxabyab" case:

> (substring-times "ab" "cabxabyab")
3

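Edit: following Rainer Joswig's suggestion, I have replaced subseq with the keyword arguments of string=.

[Comments]:

Don't use SUBSEQ. Call STRING= with the right keywords.
@RainerJoswig I added an answer that demonstrates the keyword arguments for index designators.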
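As a small side-by-side sketch of what that replacement means (the function names below are hypothetical and exist only for this illustration): both comparisons test the same two characters, but only the first one copies them into a fresh string.

```lisp
;; Hypothetical helpers for illustration; neither appears in the answer.
(defun match-at-with-subseq (substr string i)
  ;; SUBSEQ allocates a new string holding the compared region
  (string= substr (subseq string i (+ i (length substr)))))

(defun match-at-with-keywords (substr string i)
  ;; :START2 and :END2 bound the comparison inside STRING, no copy is made
  (string= substr string :start2 i :end2 (+ i (length substr))))

(match-at-with-subseq   "ab" "cabx" 1)  ;=> T
(match-at-with-keywords "ab" "cabx" 1)  ;=> T
(match-at-with-keywords "ab" "cabx" 0)  ;=> NIL
```

Both helpers assume that i plus the length of substr does not exceed the length of string, which is what the while condition in the loop above guarantees.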