Uniform Cost Search in Python: Answers

【Question title】: Uniform Cost Search in Python
【Posted】: 2017-09-07 08:37:34
【Question】:

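I have implemented a simple graph data structure in Python, with the structure shown below. The code is here only to clarify what the functions and variables mean; they are fairly self-explanatory, so you can skip reading it.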

# Node data structure
class Node: 

    def __init__(self, label):        
        self.out_edges = []
        self.label = label
        self.is_goal = False


    def add_edge(self, node, weight = 0):          
        self.out_edges.append(Edge(node, weight))


# Edge data structure
class Edge:

    def __init__(self, node, weight = 0):          
        self.node = node
        self.weight = weight

    def to(self):                                  
        return self.node


# Graph data structure, utilises classes Node and Edge
class Graph:    

    def __init__(self):                             
        self.nodes = []

    # some other functions here populate the graph, and randomly select three goal nodes.

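Now I am trying to implement a uniform-cost search (i.e. a BFS with a priority queue, which guarantees a shortest path) that starts from a given node v and returns a shortest path (as a list) to one of the three goal nodes. A goal node is a node whose is_goal attribute is set to True.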

Here is my implementation:

def ucs(G, v):
    visited = set()                  # set of visited nodes
    visited.add(v)                   # mark the starting vertex as visited
    q = queue.PriorityQueue()        # we store vertices in the (priority) queue as tuples with cumulative cost
    q.put((0, v))                    # add the starting node, this has zero *cumulative* cost   
    goal_node = None                 # this will be set as the goal node if one is found
    parents = {v:None}               # this dictionary contains the parent of each node, necessary for path construction

    while not q.empty():             # while the queue is nonempty
        dequeued_item = q.get()        
        current_node = dequeued_item[1]             # get node at top of queue
        current_node_priority = dequeued_item[0]    # get the cumulative priority for later

        if current_node.is_goal:                    # if the current node is the goal
            path_to_goal = [current_node]           # the path to the goal ends with the current node (obviously)
            prev_node = current_node                # set the previous node to be the current node (this will changed with each iteration)

            while prev_node != v:                   # go back up the path using parents, and add to path
                parent = parents[prev_node]
                path_to_goal.append(parent)   
                prev_node = parent

            path_to_goal.reverse()                  # reverse the path
            return path_to_goal                     # return it

        else:
            for edge in current_node.out_edges:     # otherwise, for each adjacent node
                child = edge.to()                   # (avoid calling .to() in future)

                if child not in visited:            # if it is not visited
                    visited.add(child)              # mark it as visited
                    parents[child] = current_node   # set the current node as the parent of child
                    q.put((current_node_priority + edge.weight, child)) # and enqueue it with *cumulative* priority

After plenty of testing and comparison with other algorithms, this implementation seemed to work well, until I tried it on this graph:

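For whatever reason, ucs(G,v) returned the path H -> I, which costs 0.87, rather than the path H -> F -> I, which costs 0.71 (this path was obtained by running a DFS). The following graph also gives an incorrect path: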

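The algorithm gives G -> F instead of G -> E -> F, which was again obtained by the DFS. The only pattern I can observe among these rare cases is that the chosen goal node always has a loop. I can't figure out what is going wrong. Any tips would be much appreciated.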

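【Comments】:

You consider a node "visited" before you have actually visited it, and before you are sure you have found the cheapest path to it.
... expanding on that: if there are two paths to a node, you only consider one of them, because you mark the node as visited when you find the first path, without checking whether there is another (cheaper) one. This also conflicts with the parents map, where each node has only one parent, which is only correct if it is the parent on the cheapest path.
I see what you mean... but in the first example I gave, why would the algorithm choose the path H -> I if it is meant to choose the cheapest path? Shouldn't the priority queue ranking take care of that? And how would I fix the visited/parents objects?
"Why would the algorithm choose the path H -> I if it is meant to choose the cheapest path": because it is wrong in the way dhke and I just described. Programming would be much easier if things always did what they were supposed to do instead of what you actually wrote.
This is unrelated to your problem, but the classes in the queue module perform thread synchronization that you don't need. For a basic priority queue used in only one thread, use heapq instead (which is what queue.PriorityQueue uses internally for its implementation).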
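
The point made in these comments can be reproduced on a three-node version of the first example. The following is a minimal sketch, not the asker's code: it assumes a plain dict graph instead of the Node/Edge classes, uses heapq as suggested in the last comment, and the function name and the individual edge weights (0.30 and 0.41) are made up so that the totals match the 0.87 and 0.71 from the question.

```python
import heapq

# Hypothetical weights: only the totals 0.87 and 0.71 come from the question.
GRAPH = {
    "H": [("I", 0.87), ("F", 0.30)],
    "F": [("I", 0.41)],
    "I": [],
}

def ucs_mark_on_enqueue(graph, start, goals):
    """Mirror the question's logic: a node is marked visited when it is enqueued."""
    visited = {start}
    parents = {start: None}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in goals:
            path = []
            while node is not None:       # walk back through the parents map
                path.append(node)
                node = parents[node]
            return cost, path[::-1]
        for child, weight in graph[node]:
            if child not in visited:      # the first path found wins, cheap or not
                visited.add(child)
                parents[child] = node
                heapq.heappush(heap, (cost + weight, child))
    return None

cost, path = ucs_mark_on_enqueue(GRAPH, "H", {"I"})
print(path, round(cost, 2))   # ['H', 'I'] 0.87
```

When F is expanded, I is already in visited, so the cheaper entry for I via F is never pushed onto the queue and the priority ordering has nothing to choose from.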

Tags: python algorithm search graph

【Answer 1】:

For search, I usually keep the path to a node as part of its queue entry. This is not really memory efficient, but it is cheaper to implement.

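If you want the parent map, remember that it is only safe to update the parent map when the child is at the top of the queue. Only then has the algorithm determined the shortest path to the current node.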

import queue

def ucs(G, v):
    visited = set()                  # set of visited nodes
    q = queue.PriorityQueue()        # we store vertices in the (priority) queue as tuples 
                                     # (f, n, path), with
                                     # f: the cumulative cost,
                                     # n: the current node,
                                     # path: the path that led to the expansion of the current node
    q.put((0, v, [v]))               # add the starting node, this has zero *cumulative* cost 
                                     # and its path contains only itself.

    while not q.empty():             # while the queue is nonempty
        f, current_node, path = q.get()
        visited.add(current_node)    # mark node visited on expansion,
                                     # only now we know we are on the cheapest path to
                                     # the current node.

        if current_node.is_goal:     # if the current node is a goal
            return path              # return its path
        else:
            for edge in current_node.out_edges:
                child = edge.to()
                if child not in visited:
                    # f is the cumulative cost of reaching current_node
                    q.put((f + edge.weight, child, path + [child]))

Note: I haven't actually tested this, so feel free to comment if it doesn't work right away.

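【Comments】:

This will perform repeated visits, potentially a lot of them, because you don't check whether a node has already been visited before trying to enqueue its children, and you don't do anything to deduplicate queue entries for the same node.
Deduplicating queue entries would probably require some kind of decrease-key functionality, and we don't know whether these priority queues have it, but do add a visited check to make sure you don't re-expand an already visited node when you reach it again.
What if we have cycles?
@user2357112 Rule of thumb: leave deduplication to the queue implementation and don't clutter the search algorithm with it. And yes, you only need to keep the shortest path to each node in the queue. But the point remains that the visited list may only be updated when a node is expanded.
@dhke I think I would rather keep the parents map. If I just change where visited.add(...) happens, then leaving parents[child] = current_node where it is should always give the child the correct parent, right?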
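
On the parent-map question in the last comment: if parents[child] is assigned on every enqueue, a later and more expensive path can overwrite the parent that a cheaper path recorded earlier. One common way around this is to also track the cheapest known cost per node and update the parent only when a strictly cheaper path shows up. The sketch below assumes a dict-based graph with string labels; the names ucs_with_parents and best_cost, and the example graph, are illustrative and not from the thread.

```python
import heapq

def ucs_with_parents(graph, start, goals):
    """graph: dict mapping label -> list of (neighbour, weight) pairs (assumed shape)."""
    best_cost = {start: 0.0}    # cheapest known cost to each discovered node
    parents = {start: None}
    visited = set()             # nodes whose cheapest path is final
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue            # stale entry: node was already expanded more cheaply
        visited.add(node)
        if node in goals:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return cost, path[::-1]
        for child, weight in graph[node]:
            new_cost = cost + weight
            if child not in visited and new_cost < best_cost.get(child, float("inf")):
                best_cost[child] = new_cost
                parents[child] = node   # overwrite only for a strictly cheaper path
                heapq.heappush(heap, (new_cost, child))
    return None

# Hypothetical graph: S reaches X cheaply via A (1 + 4) and expensively via B (2 + 7).
graph = {
    "S": [("A", 1), ("B", 2)],
    "A": [("X", 4)],
    "B": [("X", 7)],
    "X": [("T", 1)],
    "T": [],
}
print(ucs_with_parents(graph, "S", {"T"}))   # (6.0, ['S', 'A', 'X', 'T'])
```

In this example B is expanded after A but before X is dequeued, so an unconditional parents[child] = node would replace A with B as the parent of X and the reconstructed path would no longer match the reported cost.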

【Answer 2】:

A simple check before expanding a node avoids the duplicate visits.

while not q.empty():             # while the queue is nonempty
    f, current_node, path = q.get()
    if current_node not in visited:  # check to avoid duplicate expansions
        visited.add(current_node)    # mark node visited on expansion,
                                     # only now we know we are on the cheapest path to
                                     # the current node.
        if current_node.is_goal:     # if the current node is a goal
            return path              # return its path
        ...
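
For reference, here is one way the fragment above might be completed into a runnable function. It is a sketch under assumptions rather than either answerer's code: it uses heapq instead of queue.PriorityQueue, and it adds a counter as a tie-breaker, because in Python 3 two queue entries with equal cost would otherwise be ordered by comparing Node objects, which raises a TypeError when the class defines no ordering. The name ucs_nodes, the counter, and the individual edge weights are hypothetical; Node and Edge are trimmed copies of the classes in the question.

```python
import heapq
from itertools import count

class Edge:
    def __init__(self, node, weight=0):
        self.node = node
        self.weight = weight

class Node:
    def __init__(self, label):
        self.label = label
        self.out_edges = []
        self.is_goal = False

    def add_edge(self, node, weight=0):
        self.out_edges.append(Edge(node, weight))

def ucs_nodes(start):
    """Return (cost, path) for the cheapest reachable goal node, or None."""
    tie = count()                        # tie-breaker so Node objects are never compared
    heap = [(0, next(tie), start, [start])]
    visited = set()
    while heap:
        f, _, current, path = heapq.heappop(heap)
        if current in visited:
            continue                     # skip duplicate queue entries
        visited.add(current)             # cheapest path to current is now final
        if current.is_goal:
            return f, path
        for edge in current.out_edges:
            if edge.node not in visited:
                entry = (f + edge.weight, next(tie), edge.node, path + [edge.node])
                heapq.heappush(heap, entry)
    return None

# First example from the question; the individual weights are assumed.
h, f_node, i = Node("H"), Node("F"), Node("I")
i.is_goal = True
h.add_edge(i, 0.87)
h.add_edge(f_node, 0.30)
f_node.add_edge(i, 0.41)

cost, path = ucs_nodes(h)
print([n.label for n in path], round(cost, 2))   # ['H', 'F', 'I'] 0.71
```

Node I is pushed twice here, once per incoming path; the cheaper entry is dequeued first, and the more expensive one would be discarded by the visited check if the search continued.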

【Comments】: