【Question title】: Function to generate the unique combinations of a list in Haskell
【Posted】: 2023-04-09 09:15:01
【Question description】:

Is there a Haskell function that generates all unique combinations of a given length from a list?

source = [1,2,3]

uniqueCombos 2 source == [[1,2],[1,3],[2,3]]

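I tried looking on Hoogle but could not find a function that does this specifically. Permutations do not give the desired result.

Has anybody used a similar function before?

【Comments on the question】:

  • Can the original list contain duplicates?

Tags: haskell combinations combinatorics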


【Solution 1】:

I don't know of a predefined function either, but it's easy to write your own:

-- Every set contains a unique empty subset.
subsets 0 _ = [[]]

-- Empty sets don't have any (non-empty) subsets.
subsets _ [] = []

-- Otherwise we're dealing with non-empty subsets of a non-empty set.
-- If the first element of the set is x, we can get subsets of size n by either:
--   - getting subsets of size n-1 of the remaining set xs and adding x to each of them
--     (those are all subsets containing x), or
--   - getting subsets of size n of the remaining set xs
--     (those are all subsets not containing x)
subsets n (x : xs) = map (x :) (subsets (n - 1) xs) ++ subsets n xs

【Comments】:

  • Interestingly, this breaks if the equations subsets 0 _ = [[]] and subsets _ [] = [] are swapped: subsets 0 [] would then return [] instead of [[]].
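
The ordering remark can be checked with a small sketch. The name subsetsSwapped is hypothetical and only exists for this comparison; subsets is the function from the answer with a type signature added.

```haskell
-- Equations in the order used by the answer.
subsets :: Int -> [a] -> [[a]]
subsets 0 _        = [[]]
subsets _ []       = []
subsets n (x : xs) = map (x :) (subsets (n - 1) xs) ++ subsets n xs

-- Hypothetical variant with the first two equations swapped.
-- Now "choose 0 from the empty list" matches the first equation and
-- yields no result at all, so every recursion that ends there is lost.
subsetsSwapped :: Int -> [a] -> [[a]]
subsetsSwapped _ []       = []
subsetsSwapped 0 _        = [[]]
subsetsSwapped n (x : xs) =
  map (x :) (subsetsSwapped (n - 1) xs) ++ subsetsSwapped n xs
```

With the swapped order, every combination that uses the last element of the list disappears, because building it ends in a call with 0 and the empty list.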
【Solution 2】:

Using Data.List:

import Data.List
combinations k ns = filter ((k==).length) $ subsequences ns

Reference: 99 Haskell Problems

The reference has many interesting solutions; I just picked a concise one.

【Comments】:

  • Pointfree: combinations = (. subsequences) . filter . (. length) . (==)
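
A quick way to convince oneself that the pointfree form matches the original is to put both side by side. The name combinationsPF is hypothetical; the type signatures are additions.

```haskell
import Data.List (subsequences)

-- The pointful version from the answer.
combinations :: Int -> [a] -> [[a]]
combinations k ns = filter ((k ==) . length) (subsequences ns)

-- The pointfree version from the comment, under a hypothetical name.
-- Reading right to left: (==) k is the test (k ==), (. length) turns it
-- into a test on lists, filter builds the filter, and (. subsequences)
-- feeds it every subsequence of the input.
combinationsPF :: Int -> [a] -> [[a]]
combinationsPF = (. subsequences) . filter . (. length) . (==)
```

Both versions enumerate every subsequence of the input before filtering, so the work grows with 2^n for a list of length n regardless of k.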
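【Solution 3】:

It is not clear to me how much you care about performance.

If it is of any use, back in 2014 somebody posted a sort of performance contest of various Haskell combination-generating algorithms.

For combinations of 13 items out of 26, execution times ranged from 3 to 167 seconds! The fastest entry was provided by Bergi. Here is the non-obvious (at least to me) source code: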

subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
                          in if (n > l) then []
                             else subsequencesBySize xs !! (l-n)
 where
   subsequencesBySize [] = [[[]]]
   subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                               in zipWith (++)
                                    ([]:next)
                                    ( map (map (x:)) next ++ [[]] )                 
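
One way to read the code above is to pull the local helper out to the top level and look at what it returns. The sketch below does that under the hypothetical names bySize and bucketSizes; it is a reading aid, not part of the original entry.

```haskell
-- Top-level copy of the local helper, under a hypothetical name.
-- For a list of length l the result has l + 1 buckets, and bucket i
-- holds the subsequences of length l - i (longest first).
bySize :: [a] -> [[[a]]]
bySize []       = [[[]]]
bySize (x : xs) =
  let next = bySize xs
  in zipWith (++)
       ([] : next)                      -- subsequences that skip x
       (map (map (x :)) next ++ [[]])   -- subsequences that start with x

-- Number of subsequences in each bucket: a row of Pascal's triangle.
bucketSizes :: [a] -> [Int]
bucketSizes = map length . bySize
```

Because bucket i holds the subsequences of length l - i, the original picks the answer with !! (l-n). Note that within a bucket the subsequences that skip x come first, so the output order differs from the recursive subsets solution.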

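More recently, the question has been revisited, in the specific context of picking a few elements from a large list (5 out of 100). In that case you cannot use something like subsequences [1 .. 100], because that refers to a list of length 2^100 ≈ 1.27 * 10^30. I submitted a state-machine-based algorithm which is not as Haskell-idiomatic as I would like, but which is reasonably efficient for that sort of situation, at around 30 clock cycles per output item.

Side note: using multisets to generate combinations?

There is also a Math.Combinatorics.Multiset package available. Here is the documentation. I have only tested it briefly, but it can be used to generate combinations.

For example, the set of all combinations of 3 elements out of 8 is just like the "permutations" of a multiset with two elements (present and absent) with multiplicities 3 and (8-3)=5 respectively.

Let's use this idea to generate all combinations of 3 elements out of 8. There are (8*7*6)/(3*2*1) = 336/6 = 56 of them.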
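
To put numbers on that gap, here is a minimal sketch with two hypothetical helpers, choose and subsequenceCount, that compare how many combinations are wanted with how many subsequences a filter-based approach would have to walk through.

```haskell
-- Hypothetical helper: number of ways to choose k items out of n.
-- The product of k consecutive integers is divisible by k!, so the
-- integer division is exact.
choose :: Integer -> Integer -> Integer
choose n k
  | k < 0 || k > n = 0
  | otherwise      = product [n - k + 1 .. n] `div` product [1 .. k]

-- Hypothetical helper: number of subsequences of a list of length n.
subsequenceCount :: Integer -> Integer
subsequenceCount n = 2 ^ n
```

For 5 out of 100, choose 100 5 is 75287520, while subsequenceCount 100 is 1267650600228229401496703205376, which is why filtering subsequences is not an option there.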

*L M Mb T MS> import qualified Math.Combinatorics.Multiset as MS
*Math.Combinatorics.Multiset L M Mb T MS> pms = MS.permutations
*Math.Combinatorics.Multiset L M Mb T MS> :set prompt "λ> "
λ> 
λ> pms38 = pms $ MS.fromCounts [(True, 3), (False,5)]
λ> 
λ> length pms38
56
λ>
λ> take 3 pms38
[[True,True,True,False,False,False,False,False],[True,True,False,False,False,False,False,True],[True,True,False,False,False,False,True,False]]
λ> 
λ> str = "ABCDEFGH"
λ> combis38 = L.map fn pms38 where fn mask = L.map fst $ L.filter snd (zip str mask)
λ> 
λ> sort combis38
["ABC","ABD","ABE","ABF","ABG","ABH","ACD","ACE","ACF","ACG","ACH","ADE","ADF","ADG","ADH","AEF","AEG","AEH","AFG","AFH","AGH","BCD","BCE","BCF","BCG","BCH","BDE","BDF","BDG","BDH","BEF","BEG","BEH","BFG","BFH","BGH","CDE","CDF","CDG","CDH","CEF","CEG","CEH","CFG","CFH","CGH","DEF","DEG","DEH","DFG","DFH","DGH","EFG","EFH","EGH","FGH"]
λ>
λ> length combis38
56
λ>

So, functionally at least, the idea of using multisets to generate combinations works.

【Comments】:
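
The same present/absent idea can be sketched without the multiset package. The version below is an assumption-laden stand-in: the hand-written masks function replaces MS.permutations, and the names masks and combosByMask are hypothetical.

```haskell
-- All Boolean masks with exactly t True entries and f False entries.
-- Assumes t >= 0 and f >= 0.
masks :: Int -> Int -> [[Bool]]
masks 0 f = [replicate f False]
masks t 0 = [replicate t True]
masks t f = map (True :) (masks (t - 1) f) ++ map (False :) (masks t (f - 1))

-- Keep the elements whose mask position is True.
combosByMask :: Int -> [a] -> [[a]]
combosByMask k xs
  | k < 0 || k > n = []
  | otherwise      = [ [x | (x, True) <- zip xs mask] | mask <- masks k (n - k) ]
  where
    n = length xs
```

Applied to "ABCDEFGH" with k = 3, this yields 56 combinations, matching the count in the session above.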

【Solution 4】:

There is no such operation in the standard libraries, but you can easily implement it yourself:

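import Data.List (tails)

main :: IO ()
main = print $ myOp 2 [1, 2, 3]

-- All combinations of c elements, kept in their original order.
myOp :: Int -> [a] -> [[a]]
myOp 0 _ = [[]]   -- there is exactly one way to choose nothing
myOp 1 l = map (:[]) l
myOp c l = concatMap f $ tails l
    where
        -- Fix x as the first chosen element, then choose the rest from xs.
        f :: [a] -> [[a]]
        f []     = []
        f (x:xs) = map (x:) $ myOp (c - 1) xs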

【Comments】:

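【Solution 5】:

@melpomene's answer is generic and very concise. It is probably the one you see in many places on the Internet where a combinationsOf function is needed.

However, hidden behind the double recursion, it makes a lot of unnecessary recursive calls that can be avoided, which yields much more efficient code. Namely, we do not need to make any calls once the list is shorter than k.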
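
As a minimal sketch of that pruning idea, assuming nothing beyond the subsets recursion from Solution 1: carry the remaining length along and stop as soon as too few elements are left. The name combinationsPruned and the helper go are hypothetical.

```haskell
-- Hypothetical illustration: the subsets recursion plus a length check.
combinationsPruned :: Int -> [a] -> [[a]]
combinationsPruned k xs = go (length xs) k xs
  where
    -- n is always the length of the list passed alongside it.
    go :: Int -> Int -> [b] -> [[b]]
    go n j ys
      | j < 0 || j > n = []      -- too few elements left: prune
      | j == 0         = [[]]    -- nothing more to choose
      | j == n         = [ys]    -- everything that remains must be taken
    go n j (y : ys) = map (y :) (go (n - 1) (j - 1) ys) ++ go (n - 1) j ys
    go _ _ []       = []         -- unreachable, kept for totality
```

When none of the guards in the first equation holds, matching falls through to the recursive equation, so the recursion only continues while 0 < j < n.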

I would suggest a double termination check.

-- Assumes 1 <= k <= length xs; other inputs are not handled.
combinationsOf :: Int -> [a] -> [[a]]
combinationsOf k xs = runner n k xs
                      where
                      n = length xs
                      runner :: Int -> Int -> [a] -> [[a]]
                      runner n' k' xs'@(y:ys) = if k' < n'      -- k' < length of the list
                                                then if k' == 1
                                                     then map pure xs'
                                                     else map (y:) (runner (n'-1) (k'-1) ys) ++ runner (n'-1) k' ys
                                                else pure xs'   -- k' == length of the list.

λ> length $ subsets 10 [0..19] -- taken from https://stackoverflow.com/a/52602906/4543207
184756
(1.32 secs, 615,926,240 bytes)

λ> length $ combinationsOf 10 [0..19]
184756
(0.45 secs, 326,960,528 bytes)
      

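So the above code, while optimized as far as possible, is still inefficient, mainly because of the double recursion at its core. As a rule of thumb, double recursion is best avoided in any algorithm, or used only after very careful examination.

The following algorithm, on the other hand, is very efficient in both speed and memory consumption.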

-- Assumes a non-empty list and k >= 1; other inputs are not handled.
combinationsOf :: Int -> [a] -> [[a]]
combinationsOf k as@(x:xs) | k == 1    = map pure as
                           | k == l    = pure as
                           | k >  l    = []
                           | otherwise = run (l-1) (k-1) as $ combinationsOf (k-1) xs
                             where
                             l = length as

                             run :: Int -> Int -> [a] -> [[a]] -> [[a]]
                             run n k ys cs | n == k    = map (ys ++) cs
                                           | otherwise = map (q:) cs ++ run (n-1) k qs (drop dc cs)
                                           where
                                           (q:qs) = take (n-k+1) ys
                                           dc     = product [(n-k+1)..(n-1)] `div` product [1..(k-1)]

λ> length $ combinationsOf 10 [0..19]
184756
(0.09 secs, 51,126,448 bytes)
      

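【Comments】:

  • With the second code block in GHCi, combinationsOf 3 [1 .. 3] == [1, 2, 3, 2, 3, 3], which is incorrect, and combinationsOf 4 [1 .. 3] froze my machine, I think due to swapping caused by memory exhaustion.
  • @Chai T. Rex Thanks for the heads-up. I have tried to fix it. However, this is not code for combinations with repetition, so the updated code does not allow k > l.
【Solution 6】:

Monadic solution for unique combinations: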

cb _ 0 = [[]]
cb xs n = (nxs >>= (\(nx, x) -> (x:) <$> (cb [z | (i,z) <- nxs, i>nx] (n-1)) ))
  where nxs = zip [1..] xs   -- pair each element with its position
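
For comparison, the same list-monad idea can be sketched in do-notation, using tails in place of explicit positions. This is an alternative formulation rather than the answer's code; the name cbDo is hypothetical, and its arguments are in the opposite order from cb.

```haskell
import Data.List (tails)

-- Hypothetical do-notation variant: pick a first element y, then choose
-- the remaining n - 1 elements from what comes after y.
cbDo :: Int -> [a] -> [[a]]
cbDo 0 _  = [[]]
cbDo n xs = do
  (y : rest) <- tails xs       -- the empty tail fails the pattern and is skipped
  ys <- cbDo (n - 1) rest
  pure (y : ys)
```

In the list monad a failed pattern match in a do-block contributes no results, which is what drops the final empty tail.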
      

【Comments】:
