在加权网络中找到“渗透”阈值的算法答案

【问题标题】：Algorithm to find the "percolation" threshold in a weighed network在加权网络中找到“渗透”阈值的算法
【发布时间】：2012-11-16 10:03:40
【问题描述】：

我有一些由转移概率链接的状态（嵌入在转移矩阵中，如在马尔可夫链中）。我想通过只考虑足够高的概率来总结这个转换矩阵，它们允许从一个状态（〜节点）到另一个状态（我的转换矩阵中的第一个和最后一个）。一个阈值，如果我只考虑更高的概率，我的转换矩阵永远不允许从第一个状态（或节点）移动到最后一个状态。

两个问题：

是否有一些众所周知的库（最好是 python 语言）实现了这种方法？我的天真/经验/原型方法将是一个降低阈值的循环，然后检查我是否可以从第一个状态流过转换矩阵到最后一个状态（距离矩阵中的最佳路径算法？）。但这可能需要非常长的计算时间？

Example 1: 
DistMatrix = [[       'a',   'b',     'c',    'd'], 
             ['a',  0,      0.3,    0.4,    0.1],
             ['b',   0.3,    0,      0.9,    0.2],
             ['c',   0.4,    0.9,    0,      0.7],
             ['d',   0.1,    0.2,    0.7,   0]]  
    states are a,b,c,d. I want to find the value (threshold) that allow to go from a to d (no matter if other states are walked)  
    Naive approach:
    - first loop: threshold 0.9, I get rid of lesser probabilities: I can only connect c and b 
    - second loop: threshold 0.7, I get rid of lesser probabilities: I can only connect c, b, d
    - third loop: threshold 0.4, I get rid of lesser probabilities: I can connect a,c, b, d: here is my threshold: 0.4

--> 一旦我的转换矩阵有数千个状态，应该会非常复杂吗？ --> 算法建议？

Example 2:
DistMatrix =
[       'a',   'b',     'c',    'd'],
['a',   0,      0.3,    0.4,    0.7],
['b',   0.3,    0,      0.9,    0.2],
['c',   0.4,    0.9,    0,      0.1],
['d',   0.7,    0.2,    0.1,    0] ] 
states are a,b,c,d. I want to find the value (threshold) that allow to go from a to d (no matter if other states are walked) 
Naive approach:
-first loop: threshold 0.9, I get rid of all others: I can only connect c and b
-second loop: threshold 0.7, I get rid of lesser connexion: I connect b and c, and a and d but because a and d are connected, I have my threshold!

【问题讨论】：

标签： python analysis networkx

【解决方案1】：

为了扩展 E 先生的建议，这里有两个版本的算法，可以在具有几千个节点的图上正常工作。两个版本都使用Numpy，第二个版本也使用NetworkX。

您需要去掉“a”、“b”、“c”和“d”标识符才能使用 Numpy 数组。这很容易通过将节点名称转换为 0 到 len(nodes) 之间的整数来完成。您的数组应如下所示

DistMatrix1 = np.array([[0,      0.3,    0.4,    0.1],
                        [0.3,    0,      0.9,    0.2],
                        [0.4,    0.9,    0,      0.7],
                        [0.1,    0.2,    0.7,   0]])

DistMatrix2 = np.array([[0,      0.3,    0.4,    0.7],
                        [0.3,    0,      0.9,    0.2],
                        [0.4,    0.9,    0,      0.1],
                        [0.7,    0.2,    0.1,    0]])

使用numpy.unique 获取距离矩阵中所有概率的排序数组。然后，按照 E 先生的建议执行标准二分搜索。在二分搜索的每一步，如果矩阵中的条目低于当前概率，则将它们替换为 0。在图上运行广度优先搜索，从第一个节点开始，看看是否到达最后一个节点。如果这样做，则阈值较高，否则，阈值较低。 bfs代码其实是从NetworkX版本改编而来的。

import numpy as np

def find_threshold_bfs(array):
    first_node = 0
    last_node = len(array) - 1
    probabilities = np.unique(array.ravel())
    low = 0
    high = len(probabilities)

    while high - low > 1:
        i = (high + low) // 2
        prob = probabilities[i]
        copied_array = np.array(array)
        copied_array[copied_array < prob] = 0.0
        if bfs(copied_array, first_node, last_node):
            low = i
        else:
            high = i

    return probabilities[low]


def bfs(graph, source, dest):
    """Perform breadth-first search starting at source. If dest is reached,
    return True, otherwise, return False."""
    # Based on http://www.ics.uci.edu/~eppstein/PADS/BFS.py
    # by D. Eppstein, July 2004.
    visited = set([source])
    nodes = np.arange(0, len(graph))
    stack = [(source, nodes[graph[source] > 0])]
    while stack:
        parent, children = stack[0]
        for child in children:
            if child == dest:
                return True
            if child not in visited:
                visited.add(child)
                stack.append((child, nodes[graph[child] > 0]))
        stack.pop(0)
    return False

较慢但较短的版本使用 NetworkX。在二分查找中，不运行 bfs，而是将矩阵转换为 NetworkX 图，并检查是否存在从第一个节点到最后一个节点的路径。如果有路径，则阈值较高，如果没有路径，则阈值较低。这很慢，因为 NetworkX 中的所有图形数据结构都比 Numpy 数组效率低得多。但是，它的优势在于可以访问许多其他有用的算法。

import networkx as nx
import numpy as np

def find_threshold_nx(array):
    """Return the threshold value for adjacency matrix in array."""
    first_node = 0
    last_node = len(array) - 1
    probabilities = np.unique(array.ravel())
    low = 0
    high = len(probabilities)

    while high - low > 1:
        i = (high + low) // 2
        prob = probabilities[i]
        copied_array = np.array(array)
        copied_array[copied_array < prob] = 0.0
        graph = nx.from_numpy_matrix(copied_array)
        if nx.has_path(graph, first_node, last_node):
            low = i
        else:
            high = i

    return probabilities[low]

NetworkX 版本在超过一千个节点左右的图上崩溃（在我的笔记本电脑上）。 bfs 版本可以轻松找到几千个节点的图的阈值。

代码运行示例如下。

In [5]: from percolation import *

In [6]: print('Threshold is {}'.format(find_threshold_bfs(DistMatrix1)))
Threshold is 0.4

In [7]: print('Threshold is {}'.format(find_threshold_bfs(DistMatrix2)))
Threshold is 0.7

In [10]: big = np.random.random((6000, 6000))

In [11]: print('Threshold is {}'.format(find_threshold_bfs(big)))
Threshold is 0.999766933071

对于时间安排，我得到（在半新的 Macbook Pro 上）：

In [5]: smaller = np.random.random((100, 100))

In [6]: larger = np.random.random((800, 800))

In [7]: %timeit find_threshold_bfs(smaller)
100 loops, best of 3: 11.3 ms per loop

In [8]: %timeit find_threshold_nx(smaller)
10 loops, best of 3: 94.9 ms per loop

In [9]: %timeit find_threshold_bfs(larger)
1 loops, best of 3: 207 ms per loop

In [10]: %timeit find_threshold_nx(larger)
1 loops, best of 3: 6 s per loop

希望这会有所帮助。

更新

我修改了 bfs 代码，使其在到达目标节点时停止。上面的代码和时间已经更新。

【讨论】：

很好 :-) 我的回答很仓促，在它的状态下并没有太大帮助。

【解决方案2】：

但不确定我是否正确解释了您的问题：

假设您有一个候选阈值，并且您想确定a 和d 之间是否存在路径。您可以通过在阈值图上执行简单的深度优先搜索并查看您想要的结束节点 d 是否已被访问，来检查哪些节点可以从 a 访问。

要真正找到阈值，您知道它必须介于零和图表中的最大转换概率（此处为 0.9）之间。因此，您可以对阈值执行二进制搜索，在每个阶段使用深度优先搜索来检查您是否在a 和d 之间有路径。

【讨论】：