C data structure to mimic C#'s List<List<int>>? Answers

【Question title】: C data structure to mimic C#'s List<List<int>>?
【Posted】: 2010-09-25 12:17:23
【Question】:

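I am looking to refactor a C# method into a C function in an attempt to gain some speed, and then call the C DLL from C# so that my program can use the functionality.

Currently the C# method takes a list of integers and returns a list of lists of integers. The method calculates the power set of the integers, so an input of 3 integers would produce the following output (at this stage the values of the integers are not important, as they are used as internal weighting values):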

1
2
3
1,2
1,3
2,3
1,2,3

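Each line represents a list of integers. The output indicates the index (with an offset of 1) into the first list, not the value. So 1,2 means that the elements at index 0 and 1 form one element of the power set.

I am unfamiliar with C, so what are my best options for data structures that will let C# access the returned data?

Thanks in advance

Update

Thanks for all the comments so far. Here is a bit of background on the nature of the problem.

The iterative method for calculating the power set of a set is fairly straightforward. Two loops and a bit of bit manipulation is really all there is to it. It just gets called... a lot (in fact billions of times if the size of the set is big enough).

My thinking around using C (or C++, as people have pointed out) is that it gives more scope for performance tuning. A direct port may not offer any increase, but it opens the way for more involved methods that squeeze out more speed. Even a small gain per iteration would add up to a measurable gain overall.

My idea was to port a direct version and then work on making it faster, refactoring it over time (with help from everyone here at SO).

Update 2

Another fair point from jalf: I don't have to use a list or an equivalent. If there is a better way, I am open to suggestions. The only reason for the list was that the result sets are not all the same size.

The code so far...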

public List<List<int>> powerset(List<int> currentGroupList)
{
    _currentGroupList = currentGroupList;
    int max;
    int count;

    //Count the objects in the group
    count = _currentGroupList.Count;
    max = (int)Math.Pow(2, count);

    //outer loop
    for (int i = 0; i < max; i++)
    {
        _currentSetList = new List<int>();

        //inner loop
        for (int j = 0; j < count; j++)
        {              
            if ((i & (1 << j)) == 0)
            {
                _currentSetList.Add(_currentGroupList.ElementAt(j));                          
            }
        }
        outputList.Add(_currentSetList);
    }   
    return outputList;
}

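As you can see, there is not a lot to it. It just goes round and round a lot!

I accept that creating and building lists may not be the most efficient way, but I need some way of handing the results back in a manageable form.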

Update 3

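Thanks for all the input and implementation work. Just to clarify a couple of points raised: I don't need the output to be in "natural order", and I am also not interested in the empty set being returned.

hughdbrown's implementation is interesting, but I think I will need to store the results (or at least a subset of them) at some point. It sounds like memory limitations will apply long before running time becomes a real issue. Partly because of this, I think I can get away with using bytes instead of integers, which gives more potential storage.

The real question, then: have we reached the maximum speed for this calculation in C#? Does unmanaged code offer any more scope? I know that in many respects the answer is moot, since even if we cut the running time, it would only allow for a few extra values in the original set.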

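【Comments】:

int** is a list of lists of int.
No, it is just a pointer to a pointer to an int. That much simplification will cost our dear jxh00u hours of debugging pain.
It depends on what you do with it :)
Interop will kill any performance gain. Post your C# and ask for a better algorithm, or for unsafe pointer manipulation tricks.
Does the interop slowdown apply at the point of the call?

Tags: c# c data-structures refactoring

【Solution 1】: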

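Also, be sure that moving to C/C++ is really what you need to do for speed to begin with. Instrument the original C# method (standalone, executed via unit tests), instrument the new C/C++ method (again, standalone via unit tests), and see what the real-world difference is.

The reason I bring this up is that I fear this may be a hollow victory: using Smokey Bacon's suggestion, you get your list class and you are in "faster" C++, but there is still a cost to calling that DLL. Leaving the runtime through P/Invoke or COM interop carries a fairly substantial performance cost.

Be sure you are getting your money's worth out of that jump before you make it.

Update based on the OP's updates

If you are calling this loop repeatedly, you need to make absolutely sure that the entire loop logic is encapsulated in a single interop call; otherwise the overhead of marshalling (as others here have mentioned) will definitely kill you.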
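As a rough illustration of "one interop call", the C side could expose a single function that fills a caller-supplied buffer with every subset encoded as a bitmask, so that only one flat array of integers crosses the boundary. This is a minimal sketch under stated assumptions: the name `powerset_masks` and its signature are hypothetical, not from the original post, and a Windows DLL would additionally need an export annotation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical export: fills `out` with one bitmask per non-empty subset of
 * a set with `count` elements. Bit j set means "the element at index j is in
 * this subset". Returns the number of masks written, or 0 if the buffer is
 * too small or the arguments are out of range. */
size_t powerset_masks(unsigned count, uint32_t *out, size_t capacity)
{
    if (count == 0 || count > 31 || out == NULL)
        return 0;

    size_t total = ((size_t)1 << count) - 1;   /* 2^count - 1: empty set skipped */
    if (capacity < total)
        return 0;

    /* The masks of all non-empty subsets are simply the integers 1..total. */
    for (size_t i = 0; i < total; i++)
        out[i] = (uint32_t)(i + 1);

    return total;
}
```

The caller allocates the buffer once, makes one call, and decodes the masks on the managed side. On the C# side this would presumably be bound with a single `[DllImport]` declaration that takes a `uint[]`, which is one of the simple types that marshals cheaply.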

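I do think, given the description of the problem, that the issue is not that C#/.NET is "slower" than C, but more likely that the code needs to be optimized. As another poster here mentioned, you can use pointers in C# to seriously boost performance in this kind of loop, without any marshalling. For this scenario I would look into that first, before jumping into the complex world of interop.

【Discussion】:

【Solution 2】:

If you are looking to use C for a performance gain, you are most likely planning to get it through the use of pointers. C# does allow pointers, via the unsafe keyword. Have you considered that?

Also, how will you be calling this code? Will it be called often (e.g. in a loop)? If so, marshalling the data back and forth may more than offset any performance gain.

Follow-up

Take a look at Native code without sacrificing .NET performance for some interop options. There are ways to interop without losing too much performance, but they only work with the simplest of data types.

I still think you should investigate speeding up your code using straight .NET, though.

Follow-up 2

Also, if you have your heart set on mixing native and managed code, may I suggest that you create your library using C++/CLI. Below is a simple example. Note that I am not a C++/CLI guy, and this code doesn't do anything useful... it is only meant to show how easily you can mix native and managed code.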

#include "stdafx.h"

using namespace System;

System::Collections::Generic::List<int> ^MyAlgorithm(System::Collections::Generic::List<int> ^sourceList);


int main(array<System::String ^> ^args)
{
    System::Collections::Generic::List<int> ^intList = gcnew System::Collections::Generic::List<int>();

    intList->Add(1);
    intList->Add(2);
    intList->Add(3);
    intList->Add(4);
    intList->Add(5);

    Console::WriteLine("Before Call");
    for each(int i in intList)
    {
        Console::WriteLine(i);
    }

    System::Collections::Generic::List<int> ^modifiedList = MyAlgorithm(intList);

    Console::WriteLine("After Call");
    for each(int i in modifiedList)
    {
        Console::WriteLine(i);
    }
}


System::Collections::Generic::List<int> ^MyAlgorithm(System::Collections::Generic::List<int> ^sourceList)
{
    int* nativeInts = new int[sourceList->Count];

    int nativeIntArraySize = sourceList->Count;

    //Managed to Native
    for(int i=0; i<sourceList->Count; i++)
    {
        nativeInts[i] = sourceList[i];
    }

    //Do Something to native ints
    for(int i=0; i<nativeIntArraySize; i++)
    {
        nativeInts[i]++;
    }


    //Native to Managed
    System::Collections::Generic::List<int> ^returnList = gcnew System::Collections::Generic::List<int>();
    for(int i=0; i<nativeIntArraySize; i++)
    {
        returnList->Add(nativeInts[i]);
    }


    //Release the native buffer before returning
    delete[] nativeInts;

    return returnList;
}

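【Discussion】:

【Solution 3】:

What makes you think you will gain speed by calling into C code? C isn't magically faster than C#. It can be, of course, but it can also easily be slower (and buggier). Especially once you factor in the P/Invoke calls into native code, it is far from certain that this approach will speed anything up.

In any case, C doesn't have anything like List. It has raw arrays and pointers (and you could argue that int** is more or less equivalent), but you are probably better off using C++, which does have equivalent data structures, std::vector in particular. There is no simple way to expose this data to C#, however, since it will be scattered pretty much randomly in memory (each list is a pointer to some dynamically allocated memory somewhere).

However, I suspect the biggest performance improvement would come from improving the algorithm in C#.

Edit:

I can see several things in your algorithm that seem suboptimal. Constructing a list of lists isn't free. Perhaps you could create a single list and use different offsets to represent each sublist. Or perhaps using 'yield return' and IEnumerable instead of explicitly constructing lists would be faster.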
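The "single list plus offsets" idea translates directly into C and also sidesteps the scattered-memory problem, because everything lives in two contiguous arrays. The following is a minimal sketch, assuming hypothetical names (`IntListList`, `build_powerset`, `free_powerset`) and an arbitrary cap of 20 elements; it stores indices rather than values, as in the question.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical jagged-array layout: all sublists sit back to back in `items`;
 * sublist k occupies items[offsets[k]] .. items[offsets[k + 1] - 1]. */
typedef struct {
    int    *items;    /* every element of every sublist, concatenated */
    size_t *offsets;  /* count + 1 entries; offsets[count] == total items */
    size_t  count;    /* number of sublists */
} IntListList;

/* Builds the non-empty subsets of {0 .. n-1} as index lists.
 * n is limited to 1..20 here (an arbitrary cap for this sketch).
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int build_powerset(unsigned n, IntListList *out)
{
    if (n == 0 || n > 20 || out == NULL)
        return -1;

    size_t count = ((size_t)1 << n) - 1;    /* 2^n - 1 non-empty subsets */
    size_t total = (size_t)n << (n - 1);    /* each index is in 2^(n-1) subsets */

    out->items = malloc(total * sizeof *out->items);
    out->offsets = malloc((count + 1) * sizeof *out->offsets);
    if (out->items == NULL || out->offsets == NULL) {
        free(out->items);
        free(out->offsets);
        return -1;
    }

    size_t pos = 0;
    for (size_t mask = 1; mask <= count; mask++) {
        out->offsets[mask - 1] = pos;       /* where this sublist starts */
        for (unsigned j = 0; j < n; j++)
            if (mask & ((size_t)1 << j))
                out->items[pos++] = (int)j;
    }
    out->offsets[count] = pos;              /* end sentinel */
    out->count = count;
    return 0;
}

void free_powerset(IntListList *p)
{
    free(p->items);
    free(p->offsets);
    p->items = NULL;
    p->offsets = NULL;
    p->count = 0;
}
```

Because both arrays are flat, a managed caller could copy them out in two block copies instead of chasing one pointer per sublist.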

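Have you profiled your code to find out where the time is being spent?

【Discussion】:

【Solution 4】:

This returns one set of the power set at a time. It is based on the Python code here. It works for power sets of more than 32 elements. If you need fewer than 32, you can change long to int. It is pretty fast: faster than my previous algorithm, and faster than P Daddy's code (which I modified into a yield return version).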

static class PowerSet4<T>
{
    static public IEnumerable<IList<T>> powerset(T[] currentGroupList)
    {
        int count = currentGroupList.Length;
        Dictionary<long, T> powerToIndex = new Dictionary<long, T>();
        long mask = 1L;
        for (int i = 0; i < count; i++)
        {
            powerToIndex[mask] = currentGroupList[i];
            mask <<= 1;
        }

        Dictionary<long, T> result = new Dictionary<long, T>();
        yield return result.Values.ToArray();

        long max = 1L << count;
        for (long i = 1L; i < max; i++)
        {
            long key = i & -i;
            if (result.ContainsKey(key))
                result.Remove(key);
            else
                result[key] = powerToIndex[key];
            yield return result.Values.ToArray();
        }
    }
}

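You can download all the fastest versions I tested here.

I really think that using yield return is the change that makes calculating large power sets possible. Allocating large amounts of memory up front increases the run time dramatically and causes algorithms to fail for lack of memory very early on. The original poster should figure out how many sets of the power set he needs at once. Holding all of them is not really an option with more than 24 elements.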
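The same one-at-a-time approach carries over to C, where a callback can stand in for yield return. The sketch below is an assumption-laden illustration rather than a port of the code above: the names `for_each_subset`, `subset_fn` and `count_subsets` are hypothetical, and subsets are passed as bitmasks instead of arrays. It uses the same trick of toggling the lowest set bit of the loop counter, which walks the subsets in Gray-code order so that each step adds or removes exactly one element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: receives the current subset as a bitmask. */
typedef void (*subset_fn)(uint64_t subset, void *user);

/* Visits all 2^count subsets (count <= 63). Exactly one member changes
 * between consecutive visits, so no per-subset allocation is needed. */
void for_each_subset(unsigned count, subset_fn visit, void *user)
{
    assert(count <= 63 && visit != NULL);

    uint64_t subset = 0;
    uint64_t max = (uint64_t)1 << count;

    visit(subset, user);                    /* the empty set comes first */
    for (uint64_t i = 1; i < max; i++) {
        uint64_t lowest = i & (~i + 1);     /* lowest set bit of i (i & -i) */
        subset ^= lowest;                   /* toggle that one element */
        visit(subset, user);
    }
}

/* Example visitor: counts how many subsets were seen. */
static void count_subsets(uint64_t subset, void *user)
{
    (void)subset;
    ++*(size_t *)user;
}
```

A caller that only needs a running total or a filtered handful of subsets can do its work inside the visitor and never hold more than one subset in memory.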

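【Discussion】:

I think you are correct that it is the yield that makes the difference. I am impressed by your tenacity in providing the fastest possible implementation, and by the several very valid points you raised. Kudos to P Daddy too, and to everyone who provided input.
P Daddy had a cool idea that I had not come up with: allocating fixed-size arrays of the correct length. By comparison, resizing lists is expensive. My other code creates its lists at maximum length so that they never reallocate.
If I were doing this, I would write a custom IList class that wraps an array fixed at size currentGroupList.Length. You could easily get better performance than the Dictionary-based implementation here.

【Solution 5】:

I am also going to vote for tuning up your C#, particularly by going to "unsafe" code and shedding what might be a lot of bounds-checking overhead.

Even though it is "unsafe", it is no less "safe" than C/C++, and it is dramatically easier to get right.

【Discussion】:

【Solution 6】:

Below is a C# algorithm that should be much faster (and use less memory) than the algorithm you posted. It doesn't use the neat binary trick yours uses, so the code is a good bit longer. It has a few more for loops than yours, and it may take a pass or two through the debugger to fully understand it. But it is actually a simpler approach, once you understand what it is doing.

As a bonus, the returned sets are in a more "natural" order. It returns the subsets of the set {1 2 3} in the same order you listed them in your question. That was not a goal, just a side effect of the algorithm used.

In my tests, I found this algorithm to be approximately 4 times faster than the one you posted for a large set of 22 items (which was as high as I could go on my machine without excessive disk thrashing skewing the results too much). One run of yours took about 15.5 seconds, and mine took about 3.6 seconds.

For smaller lists, the difference is less pronounced. For a set of only 10 items, yours ran 10,000 times in about 7.8 seconds, and mine took about 3.2 seconds. For sets with 5 or fewer items, they run in about the same time. With many iterations, yours runs a little faster.

Anyway, here is the code. Sorry it is so long; I tried to make sure I commented it well.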

/* 
 * Made it static, because it shouldn't really use or modify state data.
 * Making it static also saves a tiny bit of call time, because it doesn't
 * have to receive an extra "this" pointer.  Also, accessing a local
 * parameter is a tiny bit faster than accessing a class member, because
 * dereferencing the "this" pointer is not free.
 * 
 * Made it generic so that the same code can handle sets of any type.
 */
static IList<IList<T>> PowerSet<T>(IList<T> set){
    if(set == null)
        throw new ArgumentNullException("set");

    /*
     * Caveat:
     * If set.Count > 30, this function pukes all over itself without so
     * much as wiping up afterwards.  Even for 30 elements, though, the
     * result set is about 68 GB (if "set" is comprised of ints).  24 or
     * 25 elements is a practical limit for current hardware.
     */
    int   setSize     = set.Count;
    int   subsetCount = 1 << setSize; // MUCH faster than (int)Math.Pow(2, setSize)
    T[][] rtn         = new T[subsetCount][];
    /* 
     * We don't really need dynamic list allocation.  We can calculate
     * in advance the number of subsets ("subsetCount" above), and
     * the size of each subset (0 through setSize).  The performance
     * of List<> is pretty horrible when the initial size is not
     * guessed well.
     */

    int subsetIndex = 0;
    for(int subsetSize = 0; subsetSize <= setSize; subsetSize++){
        /*
         * The "indices" array below is part of how we implement the
         * "natural" ordering of the subsets.  For a subset of size 3,
         * for example, we initialize the indices array with {0, 1, 2};
         * Later, we'll increment each index until we reach setSize,
         * then carry over to the next index.  So, assuming a set size
         * of 5, the second iteration will have indices {0, 1, 3}, the
         * third will have {0, 1, 4}, and the fourth will involve a carry,
         * so we'll have {0, 2, 3}.
         */
        int[] indices = new int[subsetSize];
        for(int i = 1; i < subsetSize; i++)
            indices[i] = i;

        /*
         * Now we'll iterate over all the subsets we need to make for the
         * current subset size.  The number of subsets of a given size
         * is easily determined with combination (nCr).  In other words,
         * if I have 5 items in my set and I want all subsets of size 3,
         * I need 5-pick-3, or 5C3 = 5! / 3!(5 - 3)! = 10.
         */
        for(int i = Combination(setSize, subsetSize); i > 0; i--){
            /*
             * Copy the items from the input set according to the
             * indices we've already set up.  Alternatively, if you
             * just wanted the indices in your output, you could
             * just dup the index array here (but make sure you dup!
             * Otherwise the setup step at the bottom of this for
             * loop will mess up your output list!  You'll also want
             * to change the function's return type to
             * IList<IList<int>> in that case.)
             */
            T[] subset = new T[subsetSize];
            for(int j = 0; j < subsetSize; j++)
                subset[j] = set[indices[j]];

            /* Add the subset to the return */
            rtn[subsetIndex++] = subset;

            /*
             * Set up indices for next subset.  This looks a lot
             * messier than it is.  It simply increments the
             * right-most index until it overflows, then carries
             * over left as far as it needs to.  I've made the
             * logic as fast as I could, which is why it's hairy-
             * looking.  Note that the inner for loop won't
             * actually run as long as a carry isn't required,
             * and will run at most once in any case.  The outer
             * loop will go through as few iterations as required.
             * 
             * You may notice that this logic doesn't check the
             * end case (when the left-most digit overflows).  It
             * doesn't need to, since the loop up above won't
             * execute again in that case, anyway.  There's no
             * reason to waste time checking that here.
             */
            for(int j = subsetSize - 1; j >= 0; j--)
                if(++indices[j] <= setSize - subsetSize + j){
                    for(int k = j + 1; k < subsetSize; k++)
                        indices[k] = indices[k - 1] + 1;
                    break;
                }
        }
    }
    return rtn;
}

static int Combination(int n, int r){
    if(r == 0 || r == n)
        return 1;

    /*
     * The formula for combination is:
     *
     *       n!
     *   ----------
     *   r!(n - r)!
     *
     * We'll actually use a slightly modified version here.  The above
     * formula forces us to calculate (n - r)! twice.  Instead, we only
     * multiply for the numerator the factors of n! that aren't canceled
     * out by (n - r)! in the denominator.
     */

    /*
     * nCr == nC(n - r)
     * We can use this fact to reduce the number of multiplications we
     * perform, as well as the incidence of overflow, where r > n / 2
     */
    if(r > n / 2) /* We DO want integer truncation here (7 / 2 = 3) */
        r = n - r;

    /*
     * I originally used all integer math below, with some complicated
     * logic and another function to handle cases where the intermediate
     * results overflowed a 32-bit int.  It was pretty ugly.  In later
     * testing, I found that the more generalized double-precision
     * floating-point approach was actually *faster*, so there was no
     * need for the ugly code.  But if you want to see a giant WTF, look
     * at the edit history for this post!
     */

    double denominator = Factorial(r);
    double numerator   = n;
    while(--r > 0)
        numerator *= --n;

    return (int)(numerator / denominator + 0.1/* Deal with rounding errors. */);
}

/*
 * The archetypical factorial implementation is recursive, and is perhaps
 * the most often used demonstration of recursion in text books and other
 * materials.  It's unfortunate, however, that few texts point out that
 * it's nearly as simple to write an iterative factorial function that
 * will perform better (although tail-end recursion, if implemented by
 * the compiler, will help to close the gap).
 */
static double Factorial(int x){
    /*
     * An all-purpose factorial function would handle negative numbers
     * correctly - the result should be Sign(x) * Factorial(Abs(x)) -
     * but since we don't need that functionality, we're better off
     * saving the few extra clock cycles it would take.
     */

    /*
     * I originally used all integer math below, but found that the
     * double-precision floating-point version is not only more
     * general, but also *faster*!
     */

    if(x < 2)
        return 1;

    double rtn = x;
    while(--x > 1)
        rtn *= x;

    return rtn;
}

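【Discussion】:

OK, that is some pretty nice code. Here is what I don't get: when you increase the problem size from n to n+1, the amount of data should double and so should the run time. That is not what I see: 18 takes 0.16 seconds, 24 takes 0.97 seconds. 64 times the data takes 6 times as long.
The data more than doubles. The number of subsets doubles, and the total number of members in those sets grows by (n + 1) * 2 / n, which means it quadruples as n goes from 1 to 2, triples as n goes from 2 to 3, and approaches a factor of 2 as n goes to infinity. (continued...)
Crap! I found a bug... it looks like the numerator in my Combination is overflowing. I will update in a moment.
OK, fixed that bug. I now get about an 80x speed difference between 16 and 22 (24 is too much for my machine), which is about the same as the difference in data size. Good catch, by the way.
Uh, P Daddy, your code is still wrong. You haven't tested your Factorial() function since the modification. Here are the results for 10! through 13!: 10: 3628800, 11: 39916800, 12: 479001600, 13: 1932053504. Note that 13! should have at least as many trailing zeros as 12!.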
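The growth factor quoted in the comments above can be checked with a short derivation. Each of the n elements appears in exactly half of the 2^n subsets, so the total number of members across all subsets, written here as M(n) (a symbol introduced only for this note), follows directly:

```latex
M(n) = n \cdot 2^{\,n-1}
\qquad\Longrightarrow\qquad
\frac{M(n+1)}{M(n)} = \frac{(n+1)\,2^{n}}{n\,2^{\,n-1}} = \frac{2(n+1)}{n}

\frac{M(2)}{M(1)} = 4, \qquad
\frac{M(3)}{M(2)} = 3, \qquad
\lim_{n\to\infty}\frac{M(n+1)}{M(n)} = 2, \qquad
\frac{M(22)}{M(16)} = \frac{22}{16}\cdot 2^{6} = 88
```

The last ratio is in line with the roughly 80x run-time difference reported between 16 and 22 elements.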

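【Solution 7】:

Your result list does not match the results your code would produce. In particular, you do not show the empty set being generated.

If I were producing power sets that could have a few billion subsets, I would generate each subset separately rather than all at once; that could cut down on memory requirements and so improve the speed of the code. How about this: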

static class PowerSet<T>
{
    static long[] mask = { 1L << 0, 1L << 1, 1L << 2, 1L << 3, 
                           1L << 4, 1L << 5, 1L << 6, 1L << 7, 
                           1L << 8, 1L << 9, 1L << 10, 1L << 11, 
                           1L << 12, 1L << 13, 1L << 14, 1L << 15, 
                           1L << 16, 1L << 17, 1L << 18, 1L << 19, 
                           1L << 20, 1L << 21, 1L << 22, 1L << 23, 
                           1L << 24, 1L << 25, 1L << 26, 1L << 27, 
                           1L << 28, 1L << 29, 1L << 30, 1L << 31};
    static public IEnumerable<IList<T>> powerset(T[] currentGroupList)
    {
        int count = currentGroupList.Length;
        long max = 1L << count;
        for (long iter = 0; iter < max; iter++)
        {
            T[] list = new T[count];
            int k = 0, m = -1;
            for (long i = iter; i != 0; i &= (i - 1))
            {
                while ((mask[++m] & i) == 0)
                    ;
                list[k++] = currentGroupList[m];
            }
            yield return list;
        }
    }
}

Then your client code looks like this:

    static void Main(string[] args)
    {
        int[] intList = { 1, 2, 3, 4 };
        foreach (IList<int> set in PowerSet<int>.powerset(intList))
        {
            foreach (int i in set)
                Console.Write("{0} ", i);
            Console.WriteLine();
        }
    }

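I will even throw in a bit-twiddling algorithm with templated arguments at no charge. For added speed, you can wrap the powerset() inner loop in an unsafe block. It doesn't make much difference.

On my machine, this code is slightly slower than the OP's code until the sets have 16 or more elements. However, all the times up to 16 elements are under 0.15 seconds. At 23 elements, it runs in 64% of the time. The original algorithm does not run on my machine for 24 or more elements: it runs out of memory.

This code takes 12 seconds to generate the power set of the numbers 1 to 24, omitting screen I/O time. That is about 16 million subsets in 12 seconds, or roughly 1400K per second. For a billion (which is what you quoted earlier), that would be about 760 seconds. How long do you think this should take?

【Discussion】:

This solution gets my vote, though I would suggest a few modifications. Move the Hamming weight calculation to the start of the inner loop, so that you allocate T[] list = new T[weight] rather than T[count]. Also, your i & (i - 1) fiddling is a pessimization; just do it naively.
How so with i & (i - 1)? Do you mean using this at the bottom of the loop: i ^= mask? That would work. It is not clear what your "Hamming weight" is. I don't know how many bits are set until the whole int has been examined. It is easier to allocate the maximum memory needed. Faster, too. There is no memory penalty, since the arrays are created one at a time.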
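For readers unsure what "Hamming weight" refers to in this exchange: it is the number of set bits in the mask, which is exactly the size of the subset the mask describes. Below is a minimal C sketch of the commenter's suggestion, counting the bits first and then allocating an exactly sized array. The function names `hamming_weight` and `extract_subset` are hypothetical, and the sketch assumes the caller passes a mask whose highest set bit is within the bounds of `set`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Counts set bits by clearing the lowest one on each pass. */
static unsigned hamming_weight(uint32_t mask)
{
    unsigned n = 0;
    for (; mask != 0; mask &= mask - 1)
        n++;
    return n;
}

/* Copies the elements of `set` selected by `mask` into a freshly allocated
 * array of exactly the right length and stores that length in *out_len.
 * The caller frees the result. Returns NULL if mask == 0 or if the
 * allocation fails. */
int *extract_subset(const int *set, uint32_t mask, unsigned *out_len)
{
    unsigned len = hamming_weight(mask);
    *out_len = len;
    if (len == 0)
        return NULL;

    int *subset = malloc(len * sizeof *subset);
    if (subset == NULL)
        return NULL;

    /* The "naive" scan: test one bit per step, lowest index first. */
    unsigned k = 0;
    for (unsigned j = 0; mask != 0; j++, mask >>= 1)
        if (mask & 1u)
            subset[k++] = set[j];

    return subset;
}
```

Whether the extra counting pass pays for itself compared with always allocating the maximum length is the point under dispute in the comments, and it would need measuring.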

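【Solution 8】:

Does it have to be C, or is C++ an option too? If C++ is allowed, you can just use the list type from the STL. Otherwise, you will have to implement your own list: look up linked lists or dynamically sized arrays for pointers on how to do this.

【Discussion】:

Don't use the C++ list. It is something completely different: in C++, list is a linked list. The equivalent of C#'s List is std::vector.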
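For plain C, the "dynamically sized array" mentioned in the answer is the closest analogue to List<int> or std::vector<int>: contiguous storage that grows by doubling. The following is a minimal sketch; the type `IntVec` and the `intvec_*` functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable int array: contiguous storage that doubles its
 * capacity whenever it fills up. */
typedef struct {
    int    *data;
    size_t  len;   /* elements in use */
    size_t  cap;   /* elements allocated */
} IntVec;

void intvec_init(IntVec *v)
{
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
}

/* Appends one value. Returns 0 on success, -1 if memory runs out
 * (the existing contents stay valid in that case). */
int intvec_push(IntVec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

void intvec_free(IntVec *v)
{
    free(v->data);
    intvec_init(v);
}
```

A list of lists would then be an array of `IntVec`, which has the drawback noted in Solution 3: every inner array is a separate heap allocation, so the data is scattered and awkward to hand back to C#.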

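【Solution 9】:

I agree with the "optimize .NET first" opinion. It is the most painless option. I imagine that if you wrote some unmanaged .NET code using C# pointers, it would perform the same as C, apart from the VM overhead.

【Discussion】:

【Solution 10】:

P Daddy:

You could change your Combination() code to this: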

    static long Combination(long n, long r)
    {
        r = (r > n - r) ? (n - r) : r;
        if (r == 0)
            return 1;
        long result = 1;
        long k = 1;
        while (r-- > 0)
        {
            result *= n--;
            result /= k++;
        }

        return result;
    }

This will reduce the multiplications and the chance of overflow to a minimum.
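The reason the interleaved multiply and divide stays exact is that after k steps the running value equals C(n, k), which is always an integer. A minimal C sketch of the same loop follows, assuming 64-bit unsigned arithmetic and the hypothetical name `combination`; it is an illustration of the technique, not a claim about which variant is fastest.

```c
#include <assert.h>
#include <stdint.h>

/* n choose r using the multiply-then-divide loop. After step k the running
 * value equals C(n, k), so every division is exact. Assumes r <= n and that
 * the intermediate product fits in 64 bits, which holds comfortably for the
 * n <= 31 range discussed in this thread. */
uint64_t combination(unsigned n, unsigned r)
{
    assert(r <= n);

    if (r > n - r)
        r = n - r;                  /* C(n, r) == C(n, n - r) */

    uint64_t result = 1;
    for (unsigned k = 1; k <= r; k++) {
        result *= n - k + 1;        /* factors n, n-1, ..., n-r+1 */
        result /= k;                /* exact: result is now C(n, k) */
    }
    return result;
}
```

For example, `combination(5, 3)` evaluates to 10, matching the 5C3 example in Solution 6.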

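【Discussion】:

This is an interesting approach. It sort of misses the point, though. Performance is the primary objective, not generality, and the performance of this one is pretty bad, I am sorry to say. It also suffers from the same overflow problem with large inputs (it shows up at n = 31 and r = 13). (continued...)
In testing, though, I found that a simpler implementation that uses doubles to store the intermediate results (which I knew would be more general) is also faster! I am changing it again.
Your latest version is much better, but it is still 3-4 times slower than the floating-point version in my post. You have traded multiplications for divisions, but division (even integer division) is much slower than multiplication.
OK, generate a table of combinations and put it in your code for lookup at run time. See the Python code at iwebthereforeiam.com/files/combination_table.py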