Partition a seq by a "windowing" predicate in Clojure

【Question title】: Partition a seq by a "windowing" predicate in Clojure
【Posted】: 2014-04-21 23:02:44
【Question】:

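I'd like to "chunk" a seq into subseqs the same way partition-by does, except that the function is applied not to each individual element but to a range of elements.

So, for example: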

(gather (fn [a b] (> (- b a) 2)) 
        [1 4 5 8 9 10 15 20 21])

would result in:

[[1] [4 5] [8 9 10] [15] [20 21]]

Likewise:

(defn f [a b] (> (- b a) 2))
(gather f [1 2 3 4]) ;; => [[1 2 3] [4]]
(gather f [1 2 3 4 5 6 7 8 9]) ;; => [[1 2 3] [4 5 6] [7 8 9]]

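The idea is that I apply the function to the head of the current partition and the next element. If the function returns true, everything collected so far is closed off as one partition and the next element starts a new one.

I've written this: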

(defn gather
  [pred? lst]
  (loop [acc [] cur [] l lst]
    (let [a (first cur)
          b (first l)
          nxt (conj cur b)
          rst (rest l)]
      (cond
       (empty? l) (conj acc cur)
       (empty? cur) (recur acc nxt rst)
       ((complement pred?) a b) (recur acc nxt rst)
       :else (recur (conj acc cur) [b] rst)))))

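It works, but I know there's a simpler way. My question is:

Is there a built-in function that does this and makes my function unnecessary? If not, is there a more idiomatic (or simpler) solution that I have overlooked? Something combining reduce and take-while?

Thanks.

【Question comments】:

With this kind of problem I also think of a long solution as soon as I see it, but I know a Clojure expert will solve it in about 2 lines of code.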

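Tags: clojure

【Solution 1】:

Original interpretation of the question

We (all) seem to have misinterpreted your question as wanting to start a new partition whenever the predicate holds for consecutive elements.

Here is yet another solution, lazy and built on partition-by: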

(defn partition-between [pred? coll] 
  (let [switch (reductions not= true (map pred? coll (rest coll)))] 
    (map (partial map first) (partition-by second (map list coll switch)))))

(partition-between (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ((1) (4 5) (8 9 10) (15) (20 21))
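
To see why the reductions not= step works, it may help to look at the intermediate values. The sketch below only re-evaluates the pieces of partition-between on the sample input; the names breaks and switch-flags are made up for illustration. Each true in breaks flips the running flag, so elements of the same partition share a flag value and partition-by second can then group on it.

```clojure
(def coll [1 4 5 8 9 10 15 20 21])
(def pred? (fn [a b] (> (- b a) 2)))

;; One boolean per adjacent pair: true means "start a new partition here".
(def breaks (map pred? coll (rest coll)))
;; breaks => (true false true false false true true false)

;; Running flag that flips each time a break is seen (not= acts as XOR).
(def switch-flags (reductions not= true breaks))
;; switch-flags => (true false false true true true false true true)

;; Pair every element with its flag; runs of equal flags are the partitions.
(map list coll switch-flags)
```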

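The actual question

The actual question asks us to start a new partition whenever pred? holds for the start of the current partition and the current element. For that, we can simply rip off partition-by with a few tweaks to its source.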

(defn gather [pred? coll]
  (lazy-seq
   (when-let [s (seq coll)]
     (let [fst (first s)
           run (cons fst (take-while #((complement pred?) fst %) (next s)))]
       (cons run (gather pred? (seq (drop (count run) s))))))))

(gather (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ((1) (4 5) (8 9 10) (15) (20 21))

(gather (fn [a b] (> (- b a) 2)) [1 2 3 4])
;=> ((1 2 3) (4))

(gather (fn [a b] (> (- b a) 2)) [1 2 3 4 5 6 7 8 9])
;=> ((1 2 3) (4 5 6) (7 8 9))
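
Since the question also asked about reduce, here is a rough eager sketch of the same "compare against the start of the current partition" rule. The name gather-eager is hypothetical and does not come from the thread. Unlike the lazy version above, it realizes the whole input and returns a vector of vectors.

```clojure
(defn gather-eager
  "Eager sketch: start a new group when (pred? group-start x) is truthy."
  [pred? coll]
  (reduce (fn [acc x]
            (let [cur (peek acc)]            ; current group, nil on the first element
              (if (and cur (not (pred? (first cur) x)))
                (conj (pop acc) (conj cur x)) ; x still belongs to the current group
                (conj acc [x]))))             ; x opens a new group
          []
          coll))

(gather-eager (fn [a b] (> (- b a) 2)) [1 2 3 4 5 6 7 8 9])
;=> [[1 2 3] [4 5 6] [7 8 9]]
```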

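【Comments】:

I like this. All the magic is in (reductions not= true (map f coll (rest coll))). I spent thirty minutes last night trying to remember how to do that.
This is elegant. I noticed one problem: (gather (fn [a b] (> (- b a) 2)) [1 2 3 4]) => ((1 2 3 4)) instead of ((1 2 3) (4))
I've amended the original question to include a case where your solution doesn't work.
@Scott Thanks. It looks like I and everyone else misunderstood your question! We all seem to have read it as creating a new partition whenever consecutive elements differ by more than 2, rather than whenever an element differs from the start of the partition by more than 2.
Yes, sorry, I should have included more expected cases in the original question.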

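【Solution 2】:

Since you need information about the previous or next element, not just the element you are currently deciding on, pairing partition with reduce can do the trick here.

This is what I came up with after a few iterations: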

(defn gather [pred s]
  (->> (partition 2 1 (repeat nil) s) ; partition the sequence and if necessary
                                      ; fill the last partition with nils
    (reduce (fn [acc [x :as s]]
              (let [n   (dec (count acc))
                    acc (update-in acc [n] conj x)]
                (if (apply pred s)
                  (conj acc [])
                  acc)))
            [[]])))

(gather (fn [a b] (when (and a b) (> (- b a) 2)))
        [1 4 5 8 9 10 15 20 21])

;= [[1] [4 5] [8 9 10] [15] [20 21]]

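The basic idea is to partition the sequence into windows as wide as the number of arguments the predicate takes, filling the last window with nils if necessary. The reducing function then handles each window in turn: it adds the first element of the window to the current group and, if the predicate is satisfied for the window, opens a new group. Since the last window may have been padded with nils, the predicate has to be modified to cope with them.

Two possible improvements to this function would be to let the user:

1. Define the value used to pad the last partition, so that the reducing function can check whether any element of the partition is that value.
2. Specify the arity of the predicate, so that the grouping can be decided by looking at the current element and the next n elements.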
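
A rough sketch of what those two improvements could look like together. The name gather-n and its parameters n and pad are assumptions for illustration, not part of the answer. Windows that contain the pad value are never passed to the predicate, so the predicate no longer needs its own nil check.

```clojure
(defn gather-n
  "Sketch: like gather above, but with a window of n elements and a
  caller-supplied pad value. Padded windows skip the predicate."
  [pred n pad s]
  (->> (partition n 1 (repeat pad) s)
       (reduce (fn [acc [x :as window]]
                 ;; always add the window's first element to the last group
                 (let [acc (update-in acc [(dec (count acc))] conj x)]
                   (if (and (not-any? #(= pad %) window)
                            (apply pred window))
                     (conj acc [])   ; predicate held: open a new group
                     acc)))
               [[]])))

(gather-n (fn [a b] (> (- b a) 2)) 2 ::pad [1 4 5 8 9 10 15 20 21])
;=> [[1] [4 5] [8 9 10] [15] [20 21]]
```

The pad value should be something that cannot occur in the data, which is why a namespaced keyword is used in the call above.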

【Comments】:

【Solution 3】:

I wrote this some time ago in the useful library:

(defn partition-between [split? coll]
  (lazy-seq
   (when-let [[x & more] (seq coll)]
     (lazy-loop [items [x], coll more]
       (if-let [[x & more] (seq coll)]
         (if (split? [(peek items) x])
           (cons items (lazy-recur [x] more))
           (lazy-recur (conj items x) more))
         [items])))))

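It uses lazy-loop, which is just a way of writing lazy-seq expressions that look like loop/recur, but I hope it is reasonably clear.

I linked to a historical version of the function because I later realized there is a more general function that can be used to implement partition-between, or partition-by, or indeed many other sequential functions. These days the implementation is much shorter, but if you are not familiar with the more general function, which I called glue, it is less obvious what is going on: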

(defn partition-between [split? coll]
  (glue conj []
        (fn [v x]
          (not (split? [(peek v) x])))
        (constantly false)
        coll))

Note that both of these solutions are lazy, which at the time of writing is not true of any other solution in this thread.
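
For readers who do not have lazy-loop or glue available, here is a minimal sketch of the same idea using only clojure.core. The name partition-between-plain is hypothetical. Like the versions above, it passes split? a two-element vector of the previous and current element. Each group is built eagerly, but the sequence of groups is lazy.

```clojure
(defn partition-between-plain
  "Sketch: lazily split coll between x and y wherever (split? [x y]) is truthy."
  [split? coll]
  (lazy-seq
   (when-let [[x & more] (seq coll)]
     (loop [items [x], remaining more]
       (let [y (first remaining)]
         (if (and (seq remaining) (not (split? [(peek items) y])))
           ;; y stays in the current group
           (recur (conj items y) (rest remaining))
           ;; close the group; the rest is computed on demand
           (cons items (partition-between-plain split? remaining))))))))

(partition-between-plain (fn [[a b]] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ([1] [4 5] [8 9 10] [15] [20 21])
```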

【Comments】:

【Solution 4】:

Here is one approach with the steps kept separate. It can be condensed into fewer statements.

(def l [1 4 5 8 9 10 15 20 21])

(defn reduce_fn [f x y]
  (cond
    (f (last (last x)) y) (conj x [y])
    :else (conj (vec (butlast x)) (conj (last x) y))))

(def reduce_fn1 (partial reduce_fn #(> (- %2 %1) 2)))

(reduce reduce_fn1 [[(first l)]] (rest l))
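
One possible way to condense those steps into a single function, as a sketch only. The name gather-consecutive is made up here. It keeps the behavior of the code above, which compares each element with the previous element rather than with the start of the partition, and it uses peek/pop in place of last/butlast because the accumulator is a vector.

```clojure
(defn gather-consecutive
  "Sketch: start a new group when (f previous-element y) is truthy."
  [f coll]
  (if-let [[x & more] (seq coll)]
    (reduce (fn [acc y]
              (if (f (peek (peek acc)) y)
                (conj acc [y])                          ; open a new group
                (conj (pop acc) (conj (peek acc) y))))  ; extend the last group
            [[x]]
            more)
    []))

(gather-consecutive #(> (- %2 %1) 2) [1 4 5 8 9 10 15 20 21])
;=> [[1] [4 5] [8 9 10] [15] [20 21]]
```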

【Comments】:

【Solution 5】:

keep-indexed is a great function. Given a function f and a vector lst,

(keep-indexed (fn [idx it] (if (apply f it) idx))
       (partition 2 1 lst))

(0 2 5 6)

This returns the indices after which to split. Let's increment them and prepend a 0:

(cons 0 (map inc (.....)))

(0 1 3 6 7)

Partition these to get ranges:

(partition 2 1 nil (....))

((0 1) (1 3) (3 6) (6 7) (7))

Now use these to generate subvecs:

(map (partial apply subvec lst) ....)

([1] [4 5] [8 9 10] [15] [20 21])

Putting it all together:

(defn gather
  [f lst]
  (let [indices (cons 0 (map inc
                   (keep-indexed (fn [idx it]
                                    (if (apply f it) idx))
                       (partition 2 1 lst))))]
    (map (partial apply subvec (vec lst))
       (partition 2 1 nil indices))))

(gather #(> (- %2 %) 2) '(1 4 5 8 9 10 15 20 21))
([1] [4 5] [8 9 10] [15] [20 21])

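【Comments】:

If there is a faster way to get from the indices to the final solution, feel free to edit.
Thanks a lot for keep-indexed. That is a useful function too.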