What's the best way to perform DFS on a very large tree? Answers

【Question title】: What's the best way to perform DFS on a very large tree?
【Posted】: 2011-06-20 16:42:42
【Question description】:

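Here is the situation:

The application world consists of several hundred thousand states.
Given a state, I can compute a set of 3 or 4 other reachable states. A simple recursion can build a tree of states, which gets very large very quickly.
I need to perform a DFS from the root state down to a specific depth in this tree, to search for the subtree that contains the "minimal" state (how a node's value is computed is irrelevant to the question).

Performing the DFS with a single thread works, but it is very slow. Covering 15 levels down can take several minutes, and I need to improve this poor performance. Trying to assign a thread to each subtree created too many threads and caused an OutOfMemoryError. Using a ThreadPoolExecutor was not much better.

My question: what is the most efficient way to traverse this large tree?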

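【Question comments】:

If you treat it as a distinct possibility that you asked a reasonable question, and follow up on all the answers given, you can't go far wrong. ;)

Tags: java tree depth-first-search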

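【Solution 1】:

I don't believe navigating the tree is your problem, since your tree has only about 36 million nodes. It is much more likely that what you do with each node is the expensive part.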

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class Main {
    public static final int TOP_LEVELS = 2;

    enum BuySell {}

    private static final AtomicLong called = new AtomicLong();

    public static void main(String... args) throws InterruptedException {
        int maxLevels = 15;
        long start = System.nanoTime();
        method(maxLevels);
        long time = System.nanoTime() - start;
        System.out.printf("Took %.3f second to navigate %,d levels called %,d times%n", time / 1e9, maxLevels, called.longValue());
    }

    public static void method(int maxLevels) throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            int result = method(service, 0, maxLevels - 1, new int[maxLevels]).call();
        } catch (Exception e) {
            e.printStackTrace();
        }
        service.shutdown();
        service.awaitTermination(10, TimeUnit.MINUTES);
    }

    // Walk the top levels of the tree in the calling thread.
    private static Callable<Integer> method(final ExecutorService service, final int level, final int maxLevel, final int[] options) {
        int choices = level % 2 == 0 ? 3 : 4;
        final List<Callable<Integer>> callables = new ArrayList<Callable<Integer>>(choices);
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            Callable<Integer> callable = level < TOP_LEVELS ?
                    method(service, level + 1, maxLevel, options) :
                    method1(service, level + 1, maxLevel, options);
            callables.add(callable);
        }
        return new Callable<Integer>() {
            @Override
            public Integer call() throws Exception {
                Integer min = Integer.MAX_VALUE;
                for (Callable<Integer> result : callables) {
                    Integer num = result.call();
                    if (min > num)
                        min = num;
                }
                return min;
            }
        };
    }

    // At this level, submit each branch to the thread pool as a separate task.
    private static Callable<Integer> method1(final ExecutorService service, final int level, final int maxLevel, final int[] options) {
        int choices = level % 2 == 0 ? 3 : 4;
        final List<Future<Integer>> futures = new ArrayList<Future<Integer>>(choices);
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            final int[] optionsCopy = options.clone();
            Future<Integer> future = service.submit(new Callable<Integer>() {
                @Override
                public Integer call() {
                    return method2(level + 1, maxLevel, optionsCopy);
                }
            });
            futures.add(future);
        }
        return new Callable<Integer>() {
            @Override
            public Integer call() throws Exception {
                Integer min = Integer.MAX_VALUE;
                for (Future<Integer> result : futures) {
                    Integer num = result.get();
                    if (min > num)
                        min = num;
                }
                return min;
            }
        };
    }

    // Below that level, each task walks its whole subtree recursively in its own thread.
    private static int method2(int level, int maxLevel, int[] options) {
        if (level == maxLevel) {
            return process(options);
        }
        int choices = level % 2 == 0 ? 3 : 4;
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            int n = method2(level + 1, maxLevel, options);
            if (min > n)
                min = n;
        }

        return min;
    }

    private static int process(final int[] options) {
        int min = options[0];
        for (int i : options)
            if (min > i)
                min = i;
        called.incrementAndGet();
        return min;
    }
}

prints

Took 1.273 second to navigate 15 levels called 35,831,808 times
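
As a sanity check on that call count, the number of leaves can be worked out from the branching pattern alone. The sketch below assumes the same shape as the benchmark above (3 choices on even levels, 4 on odd levels, with the last level reserved for process()); the class name LeafCount is made up for illustration.

```java
// Counts the leaves of the benchmark tree: levels alternate between
// 3 and 4 choices, mirroring the `choices` expression in the code above.
class LeafCount {
    static long leaves(int maxLevels) {
        long count = 1;
        // Levels 0 .. maxLevels - 2 branch; the final level only calls process().
        for (int level = 0; level < maxLevels - 1; level++) {
            count *= (level % 2 == 0) ? 3 : 4;
        }
        return count;
    }

    public static void main(String[] args) {
        // For 15 levels this is 3^7 * 4^7 = 12^7 leaves.
        System.out.println(leaves(15));
    }
}
```

For 15 levels this gives 12^7 = 35831808, which agrees with the "called" counter in the output.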

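I suggest you limit the number of threads and use separate threads only for the highest levels of the tree. How many cores do you have? Once you have enough threads to keep every core busy, you don't need to create more, because extra threads only add overhead.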
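
The same idea (parallel at the top of the tree, plain recursion below it) can also be expressed with the fork/join framework, which is a different mechanism from the ExecutorService code above. This is only a sketch, assuming Java 7 or later; the class name MinSearch, the PARALLEL_LEVELS cutoff, and the leafValue placeholder are all hypothetical and stand in for the real state evaluation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch: find the minimum leaf value, forking only in the top levels.
class MinSearch extends RecursiveTask<Integer> {
    // Assumption: forking in the top 4 levels yields 144 subtasks, plenty for a few cores.
    static final int PARALLEL_LEVELS = 4;

    private final int level;
    private final int maxLevel;
    private final int[] options;

    MinSearch(int level, int maxLevel, int[] options) {
        this.level = level;
        this.maxLevel = maxLevel;
        this.options = options;
    }

    @Override
    protected Integer compute() {
        // Below the cutoff, stop creating tasks and recurse in the current thread.
        if (level >= PARALLEL_LEVELS || level == maxLevel) {
            return sequential(level, maxLevel, options);
        }
        int choices = level % 2 == 0 ? 3 : 4;
        List<MinSearch> subtasks = new ArrayList<>(choices);
        for (int i = 0; i < choices; i++) {
            int[] copy = options.clone(); // each subtask owns its own path array
            copy[level] = i;
            subtasks.add(new MinSearch(level + 1, maxLevel, copy));
        }
        int min = Integer.MAX_VALUE;
        for (MinSearch task : invokeAll(subtasks)) {
            min = Math.min(min, task.join());
        }
        return min;
    }

    // Plain recursive DFS that returns the minimum value found in the subtree.
    static int sequential(int level, int maxLevel, int[] options) {
        if (level == maxLevel) {
            return leafValue(options);
        }
        int choices = level % 2 == 0 ? 3 : 4;
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < choices; i++) {
            options[level] = i;
            min = Math.min(min, sequential(level + 1, maxLevel, options));
        }
        return min;
    }

    // Placeholder for the real per-state evaluation (hypothetical).
    static int leafValue(int[] options) {
        int v = 17;
        for (int o : options) {
            v = 31 * v + o;
        }
        return v & 0x7fffffff;
    }

    public static void main(String[] args) {
        int maxLevel = 10;
        int min = new ForkJoinPool().invoke(new MinSearch(0, maxLevel, new int[maxLevel + 1]));
        System.out.println("minimum leaf value: " + min);
    }
}
```

By default the pool sizes itself to the number of available processors, so the thread count stays bounded no matter how many subtasks are created.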

Java has a built-in Stack class which is thread safe, but I would use an ArrayList instead, as it is more efficient.
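
A minimal sketch of using an ArrayList as a stack, for reference: push with add, pop by removing the last element. The class and method names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ListStack {
    // Pushes the given values, then pops them all and returns the pop order.
    static List<Integer> pushThenPopAll(int... values) {
        List<Integer> stack = new ArrayList<>();
        for (int v : values) {
            stack.add(v);                                 // push onto the end
        }
        List<Integer> popped = new ArrayList<>();
        while (!stack.isEmpty()) {
            popped.add(stack.remove(stack.size() - 1));   // pop from the end
        }
        return popped;
    }

    public static void main(String[] args) {
        System.out.println(pushThenPopAll(1, 2, 3));      // last in, first out
    }
}
```

Removing the last element of an ArrayList does not shift any other elements, and none of its methods are synchronized, which is where the saving over java.util.Stack comes from when only one thread touches the stack. java.util.ArrayDeque is another unsynchronized option.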

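【Comments】:

Your machine is better than mine... your multithreaded code took over 31 seconds to run here, no improvement! :-(
My subtree traversal needs to return the minimum value in the subtree, so my "method" has to return an int, and I need to use a Future to get the results and pick the minimum. That is quite different from your fire-and-forget traversal pattern. Could you edit your answer to reflect this difference?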

【Solution 2】:

You definitely have to use an iterative approach. The simplest one is a stack-based DFS, with pseudocode along these lines:

STACK.push(root)
while (STACK.nonempty) 
   current = STACK.pop
   if (current.done) continue
   // ... do something with node ...
   current.done = true
   FOREACH (neighbor n of current) 
       if (! n.done )
           STACK.push(n)

The time complexity is O(n+m), where n (m) denotes the number of nodes (edges) in the graph. Since you have a tree, this is O(n), and it should easily run fast for n > 1,000,000...
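
The pseudocode above could be sketched in Java roughly as follows, for a tree whose children are computed on demand rather than stored. Because it is a tree, the "done" flag is not needed. The Node class, the successors and value functions, and the 3-or-4 branching are placeholders for the real state logic, not part of the original question.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IterativeDfs {
    // Hypothetical node: a state plus its depth in the tree.
    static final class Node {
        final long state;
        final int depth;
        Node(long state, int depth) { this.state = state; this.depth = depth; }
    }

    static long leavesVisited; // demonstration counter, not needed for the search

    // Placeholder successor function: 3 children on even depths, 4 on odd ones.
    static long[] successors(Node n) {
        int choices = n.depth % 2 == 0 ? 3 : 4;
        long[] next = new long[choices];
        for (int i = 0; i < choices; i++) {
            next[i] = n.state * 4 + i;
        }
        return next;
    }

    // Placeholder evaluation of a state at the target depth.
    static long value(long state) {
        return state % 1000;
    }

    // Depth-limited DFS with an explicit stack; returns the minimum leaf value.
    static long minAtDepth(long root, int maxDepth) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(new Node(root, 0));
        long min = Long.MAX_VALUE;
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            if (current.depth == maxDepth) {
                leavesVisited++;
                min = Math.min(min, value(current.state));
                continue;
            }
            // Children are generated only when their parent is popped.
            for (long s : successors(current)) {
                stack.push(new Node(s, current.depth + 1));
            }
        }
        return min;
    }

    public static void main(String[] args) {
        long min = minAtDepth(1, 6);
        System.out.println("min = " + min + ", leaves visited = " + leavesVisited);
    }
}
```

Only the nodes waiting on the stack are held in memory at any time, which is roughly the depth multiplied by the branching factor, so the whole tree never has to exist at once.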

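【Comments】:

This would be reasonable if I had the whole tree in memory. However, it is a tree of states, and each level is computed on the fly from the nodes above it. In any case, how would an iterative approach improve performance?
"It is a tree of states, and each level is computed on the fly from the nodes above it": that's no problem. Just use that code to determine "neighbor n of current". It will probably be faster, because the recursive calls copy many objects on the stack. Trust me: on a standard modern computer, merely traversing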