Convert a recursive method group in LINQ Select to an iterative method

【Question title】: Convert Recursive Method Group in LINQ Select to iterative method
【Posted】: 2017-08-01 22:26:17
【Question】:

I have a class that looks like this:

public class SourceObject
{
    public string Id { get; set; }
    public List<SourceObject> Children { get; set; }

    public SourceObject()
    {
        Children = new List<SourceObject>();
    }
}

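As you can see, it has a property that holds a list of further instances of the same class. With the data I am processing, the number of children is not known until runtime, and neither is the overall "depth" of the resulting object graph.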

I need to create a "mapping" from a graph of SourceObject instances to a similarly shaped graph of DestinationObject instances (similar to how AutoMapper maps from one object to another).
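The question never shows DestinationObject. A minimal sketch, assuming it simply mirrors SourceObject (this shape is an assumption, not the asker's actual class):

```csharp
using System.Collections.Generic;

// Assumed shape: the original post does not show this class.
public class DestinationObject
{
    public string Id { get; set; }
    public List<DestinationObject> Children { get; set; }

    public DestinationObject()
    {
        // Start with an empty list so callers can Add children directly.
        Children = new List<DestinationObject>();
    }
}
```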

I have a method that maps my source graph to my destination graph; however, it uses recursion:

// Recursive way of mapping each Source object to Destination
public static DestinationObject MapSourceToDestination(SourceObject source)
{
    var result = new DestinationObject();
    result.Id = source.Id;
    result.Children = source.Children.Select(MapSourceToDestination).ToList();
    return result;
}

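This works fine when the source object graph is not too large or too deep. When the source object graph is very large, however, this method throws a StackOverflowException.

I have already managed to create an alternative version of this function that removes the recursion and replaces it with a queue/stack (using a technique similar to the one described in this answer). However, I have noticed that the queue/stack can itself become very large, and I am not sure my implementation is the most efficient.

Is it possible to convert my recursive function into one that purely uses iteration over the source object graph (i.e. removing the recursion and, ideally, the use of a queue/stack)?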

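【Comments on the question】:

Yes, you can, but you have to rescan the whole thing multiple times. For example, if you have 1, 2, 3 where 1 depends on 2, which depends on 3, you can do: 1 (has unresolved dependencies, skip for now), 2 (has unresolved dependencies, skip for now), 3 (resolved), then back to 1 (still has unresolved dependencies, skip for now), 2 (resolved), back to 1 (resolved).
Something like this?
@IvanStoev That uses a Stack, so it goes against "ideally, [removing] the use of a queue/stack".
@IvanStoev The OP asked for a solution without a stack.
@Evk I know (although he says "ideally"). But unlike Eric's implementation, in that solution the stack size is purely the depth of the tree, which IMO is optimal. Anyway, good luck to the OP with a pure iterative solution :)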

Tags: c# linq recursion iteration

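【Solution 1】:

Is it possible to convert my recursive function into one that purely uses iteration over the source object graph (i.e. removing the recursion and, ideally, the use of a queue/stack)?

With a stack or a queue, the recursive call made through LINQ Select can be replaced by an explicit stack. I used a Tuple to keep track of the parent node.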

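Running time: O(n). Space complexity: O(number of nodes in one level).

If we use only a queue, the space complexity becomes O(min(h*b, n)), where h is the height of the tree and b is the maximum number of children of a node. Consider this code: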

public DestinationObject MapSourceToDestination(SourceObject root)
{
    // Each entry pairs a source node with the destination node already created for it.
    var stack = new Stack<Tuple<SourceObject, DestinationObject>>();

    var result = new DestinationObject();
    result.Id = root.Id;
    result.Children = new List<DestinationObject>();
    stack.Push(Tuple.Create(root, result));

    while (stack.Count > 0)
    {
        Tuple<SourceObject, DestinationObject> currentTuple = stack.Pop();
        SourceObject currentSource = currentTuple.Item1;
        DestinationObject currentDestination = currentTuple.Item2;

        // Map the children of the node just popped, not of the root.
        foreach (SourceObject sourceChild in currentSource.Children)
        {
            var currentChild = new DestinationObject();
            currentChild.Id = sourceChild.Id;
            currentChild.Children = new List<DestinationObject>();
            currentDestination.Children.Add(currentChild);
            stack.Push(Tuple.Create(sourceChild, currentChild));
        }
    }

    return result;
}
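The queue-only variant mentioned above could look roughly like the following. This is a sketch, not the answerer's code: the class name QueueMapper is hypothetical, and DestinationObject is assumed to mirror SourceObject because the thread never shows it.

```csharp
using System;
using System.Collections.Generic;

public class SourceObject
{
    public string Id { get; set; }
    public List<SourceObject> Children { get; set; } = new List<SourceObject>();
}

// Assumed shape: the thread does not show this class.
public class DestinationObject
{
    public string Id { get; set; }
    public List<DestinationObject> Children { get; set; } = new List<DestinationObject>();
}

// Hypothetical helper name, used only for this sketch.
public static class QueueMapper
{
    // Breadth-first mapping: the queue holds nodes whose children are not mapped yet.
    public static DestinationObject Map(SourceObject root)
    {
        var result = new DestinationObject { Id = root.Id };
        var pending = new Queue<(SourceObject Source, DestinationObject Target)>();
        pending.Enqueue((root, result));

        while (pending.Count > 0)
        {
            var (source, target) = pending.Dequeue();
            foreach (var sourceChild in source.Children)
            {
                var targetChild = new DestinationObject { Id = sourceChild.Id };
                target.Children.Add(targetChild); // children keep their source order
                pending.Enqueue((sourceChild, targetChild));
            }
        }

        return result;
    }
}
```

Because nothing is recursive here, a very deep graph no longer overflows the call stack; the cost moves to the heap-allocated queue instead.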

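【Discussion】:

【Solution 2】:

Dare I say it: you have conflicting requirements, so your problem lies in the requirements/design rather than in the code? Two points from your question:

You say the number of children of a SourceObject is not known until runtime. In that case, the possibility of an overflow is unavoidable. That is what happens when the size of the data is unknown and, at runtime, it turns out to be larger than the space available on the machine.
Also, like it or not, a stack or a queue is the right data structure for this kind of processing if you want to avoid recursion. You either have to recurse, or you have to store your SourceObjects in some data structure so that you can keep track of which ones still need to be visited as processing continues.

I would use the Stack/Queue approach rather than recursion for graph exploration or graph traversal, while keeping in mind that if the graph is large enough, the Stack/Queue will consume all system memory and cause an overflow.

To avoid that, either increase the memory on your machine (i.e. scale up), or increase the number of machines working for you while parallelizing your algorithm (i.e. scale out).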

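【Discussion】:

【Solution 3】:

I don't think a function that purely uses iteration is better in itself, but I would implement it with a couple of extension methods: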

public static SourceObject GetAtList(this SourceObject s, List<int> cycleRef)
{
    var ret = s;
    for (int i = 0; i < cycleRef.Count; i++)
    {
        ret = ret.Children[cycleRef[i]];
    }
    return ret;
}
public static void SetAtList(this DestinationObject d, List<int> cycleRef, SourceObject s)
{
    var ret = d;
    for (int i = 0; i < cycleRef.Count - 1; i++)
    {
        ret = ret.Children[cycleRef[i]];
    }
    ret.Children.Add ( new DestinationObject() { Id = s.Id } );
}

and a list of iterators:

public static DestinationObject MapSourceToDestinationIter(SourceObject source)
{
    var result = new DestinationObject();
    result.Id = source.Id;
    if (source.Children.Count == 0)
    {
        return result;
    }
    List<int> cycleTot = new List<int>();
    List<int> cycleRef = new List<int>();
    cycleRef.Add(0);
    cycleTot.Add(source.Children.Count-1);
    do
    {
        var curr = source.GetAtList(cycleRef);
        result.SetAtList(cycleRef, curr);
        if (curr.Children.Count == 0)
        {
            cycleRef[cycleRef.Count - 1]++;
            while (cycleRef[cycleRef.Count - 1]> cycleTot[cycleTot.Count-1])
            {
                cycleRef.RemoveAt(cycleRef.Count - 1);
                cycleTot.RemoveAt(cycleTot.Count - 1);
                if (cycleRef.Count == 0)
                {
                    break;
                }
                cycleRef[cycleRef.Count - 1]++;
            } 
        } else
        {
            cycleRef.Add(0);
            cycleTot.Add(curr.Children.Count - 1);
        }
    } while (cycleTot.Count>0);
    return result;
}

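I wouldn't necessarily recommend going this way, but it may be faster than the LINQ alternative...

In any case, explicitly using a Stack (as in Ivan Stoev's answer) would be the optimal solution.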
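The linked answer is not reproduced on this page, so the following is only a sketch of what an explicit stack bounded by the tree depth might look like. The class name DepthStackMapper is hypothetical, and DestinationObject is assumed to mirror SourceObject.

```csharp
using System;
using System.Collections.Generic;

public class SourceObject
{
    public string Id { get; set; }
    public List<SourceObject> Children { get; set; } = new List<SourceObject>();
}

// Assumed shape: the thread does not show this class.
public class DestinationObject
{
    public string Id { get; set; }
    public List<DestinationObject> Children { get; set; } = new List<DestinationObject>();
}

// Hypothetical helper name, used only for this sketch.
public static class DepthStackMapper
{
    public static DestinationObject Map(SourceObject root)
    {
        var result = new DestinationObject { Id = root.Id };

        // The stack holds only the nodes on the path from the root to the
        // node currently being processed, so it never exceeds depth + 1.
        var path = new Stack<(SourceObject Source, DestinationObject Target)>();
        path.Push((root, result));

        while (path.Count > 0)
        {
            var (source, target) = path.Peek();
            if (target.Children.Count == source.Children.Count)
            {
                path.Pop(); // every child is mapped: move one level up
                continue;
            }

            // The number of children mapped so far is the index of the next one.
            var sourceChild = source.Children[target.Children.Count];
            var targetChild = new DestinationObject { Id = sourceChild.Id };
            target.Children.Add(targetChild);
            path.Push((sourceChild, targetChild)); // move one level down
        }

        return result;
    }
}
```

Siblings are never on the stack at the same time, which is what keeps its size tied to the depth rather than to the width of the tree.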

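【Discussion】:

【Solution 4】:

I still believe that a stack whose size is the maximum depth of the tree is the optimal general solution.

Interestingly, though, the data structure and the concrete process contain all the information needed to implement the conversion without an explicit stack, based on Children.Count alone. Let's see what we need:

(1) Whether there are more source children to process: source.Children.Count != target.Children.Count

(2) Which source child to process next: source.Children[target.Children.Count]

(3) The index of the child currently being processed: target.Children.Count - 1

Note that the rules above apply at any level during processing.

Here is the implementation: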

public static DestinationObject MapSourceToDestination(SourceObject source)
{
    // Map everything except children
    Func<SourceObject, DestinationObject> primaryMap = s => new DestinationObject
    {
        Id = s.Id,
        // ...
        Children = new List<DestinationObject>(s.Children.Count) // Empty list with specified capacity
    };

    var target = primaryMap(source);

    var currentSource = source;
    var currentTarget = target;
    int depth = 0;
    while (true)
    {
        if (currentTarget.Children.Count != currentSource.Children.Count)
        {
            // Process next child
            var sourceChild = currentSource.Children[currentTarget.Children.Count];
            var targetChild = primaryMap(sourceChild);
            currentTarget.Children.Add(targetChild);
            if (sourceChild.Children.Count > 0)
            {
                // Move one level down
                currentSource = sourceChild;
                currentTarget = targetChild;
                depth++;
            }
        }
        else
        {
            // Move one level up
            if (depth == 0) break;
            depth--;
            currentSource = source;
            currentTarget = target;
            for (int i = 0; i < depth; i++)
            {
                int index = currentTarget.Children.Count - 1;
                currentSource = currentSource.Children[index];
                currentTarget = currentTarget.Children[index];
            }
        }
    }

    return target;
}

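The only tricky (and partially inefficient) part is the move-up step, which is why a general solution needs a stack. If the objects had a Parent property, it would be simple: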

currentSource = currentSource.Parent;
currentTarget = currentTarget.Parent;

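Since there is no such property, to find the parents of the current source and target items we start from the root items and move down through the indices currently being processed (see (3)) until we reach the required depth.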
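For comparison, here is a sketch of the same loop under the assumption that both node types do carry a Parent reference. SourceNode, DestinationNode, AddChild and ParentPointerMapper are hypothetical names introduced for illustration; the classes in the question have no such property.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of SourceObject with a Parent reference.
public class SourceNode
{
    public string Id { get; set; }
    public SourceNode Parent { get; set; }
    public List<SourceNode> Children { get; } = new List<SourceNode>();

    // Convenience helper that keeps Parent and Children consistent.
    public SourceNode AddChild(string id)
    {
        var child = new SourceNode { Id = id, Parent = this };
        Children.Add(child);
        return child;
    }
}

// Hypothetical variant of DestinationObject with a Parent reference.
public class DestinationNode
{
    public string Id { get; set; }
    public DestinationNode Parent { get; set; }
    public List<DestinationNode> Children { get; } = new List<DestinationNode>();
}

public static class ParentPointerMapper
{
    public static DestinationNode Map(SourceNode source)
    {
        var target = new DestinationNode { Id = source.Id };
        var currentSource = source;
        var currentTarget = target;

        while (true)
        {
            if (currentTarget.Children.Count != currentSource.Children.Count)
            {
                // Map the next child and descend into it.
                var sourceChild = currentSource.Children[currentTarget.Children.Count];
                var targetChild = new DestinationNode { Id = sourceChild.Id, Parent = currentTarget };
                currentTarget.Children.Add(targetChild);
                currentSource = sourceChild;
                currentTarget = targetChild;
            }
            else
            {
                // All children mapped: stop at the mapped root, otherwise move up.
                if (currentTarget.Parent == null) break;
                currentSource = currentSource.Parent;
                currentTarget = currentTarget.Parent;
            }
        }

        return target;
    }
}
```

With parent links, moving up is a constant-time step, so no stack and no re-walk from the root is needed; the trade-off is one extra reference per node.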

【Discussion】: