优化运行时间：改变 igraph 中边的权重需要很长时间。有没有办法优化它？答案

【问题标题】：Optimize the runtime: change the weight of edges in an igraph takes long time. Is there a way to optimize it?优化运行时间：改变 igraph 中边的权重需要很长时间。有没有办法优化它？
【发布时间】：2020-07-19 06:03:25
【问题描述】：

我正在从 osmar 对象构建的 igraph 中搜索一组边，并希望更改这些边的权重。由于我的图表很大，因此这项任务需要很长时间。由于我在循环中运行此函数，因此运行时会变得更大。

有什么办法可以优化这个吗？

代码如下：

library(osmar)
library(igraph)
library(tidyr)
library(dplyr)

### Get data ----
src <- osmsource_api(url = "https://api.openstreetmap.org/api/0.6/")
muc_bbox <- center_bbox(11.575278, 48.137222, 1000, 1000)
muc <- get_osm(muc_bbox, src)

### Reduce to highways: ----
hways <- subset(muc, way_ids = find(muc, way(tags(k == "highway"))))
hways <- find(hways, way(tags(k == "name")))
hways <- find_down(muc, way(hways))
hways <- subset(muc, ids = hways)

#### Plot data ----
## Plot complete data and highways on top:
plot(muc)
plot_ways(muc, col = "lightgrey")
plot_ways(hways, col = "coral", add = TRUE)

### Define route start and end nodes: ----
id<-find(muc, node(tags(v %agrep% "Sendlinger Tor")))[1]
hway_start_node <-find_nearest_node(muc, id, way(tags(k == "highway"))) 
hway_start <- subset(muc, node(hway_start_node))

id <- find(muc, node(attrs(lon > 11.58 & lat > 48.15)))[1]
hway_end_node <- find_nearest_node(muc, id, way(tags(k == "highway")))
hway_end <- subset(muc, node(hway_end_node))

## Add the route start and and nodes to the plot:
plot_nodes(hway_start, add = TRUE, col = "red", pch = 19, cex = 2)
plot_nodes(hway_end, add = TRUE, col = "red", pch = 19, cex = 2)

### Create street graph ----
gr <- as.undirected(as_igraph(hways))

### Compute shortest route: ----
# Calculate route
route <- function(start_node,end_node) {
  get.shortest.paths(gr,
                     from = as.character(start_node),
                     to = as.character(end_node), 
                     mode = "all")[[1]][[1]]}
# Plot route
plot.route <- function(r,color) {
  r.nodes.names <- as.numeric(V(gr)[r]$name)
  r.ways <- subset(hways, ids = osmar::find_up(hways, node(r.nodes.names)))
  plot_ways(r.ways, add = TRUE, col = color, lwd = 2)
}
nways <-  1
numway <- 1
r <- route(hway_start_node,hway_end_node)

# Plot route

color <- colorRampPalette(c("springgreen","royalblue"))(nways)[numway]
plot.route(r,color)


## Route details ----
# Construct a new osmar object containing only elements 
# related to the nodes defining the route:
route_nodes <- as.numeric(V(gr)[r]$name)
route_ids <- find_up(hways, node(route_nodes))

osmar.route <- subset(hways, ids = route_ids)
osmar.nodes.ids <- osmar.route$nodes$attrs$id

# Extract the nodes’ coordinates,
osmar.nodes.coords <- osmar.route$nodes$attrs[, c("lon", "lat")]
osmar.nodes <- cbind(data.frame(ids = osmar.nodes.ids),
                     data.frame(ids_igraph = as.numeric(V(gr)[r]) ),
                     osmar.nodes.coords) 


## Find edges ids containing points of interest ----
wished.coords <- data.frame(wlon = c(11.57631),
                            wlat = c(48.14016)) 


# Calculate all distances
distances <- crossing(osmar.nodes,wished.coords) %>%
             mutate(dist = geosphere::distHaversine(cbind(lon,lat),cbind(wlon,wlat)))


# Select nodes below maximum distance :
mindist <- 50 #m

wished.nodes <- distances %>% filter(dist < mindist)

# Select edges incident to these nodes :
selected.edges <- unlist(incident_edges(gr,V(gr)[wished.nodes$ids_igraph]))

This is where the slowdown occurs: Weight of selected edges, change it by multiplying it with 10
E(gr)[selected.edges]$weight<-E(gr)[selected.edges]$weight*10

这是减速发生的地方：所选边缘的权重，通过将其乘以 10 来更改它

E(gr)[selected.edges]$weight<-E(gr)[selected.edges]$weight*10

也许我可以使用哈希图？

更新

哈希图

单位：秒

Hashmap:

expr           min       lq     mean   median      uq      max     neval 

Hashmap      3.248543 3.289474 3.472038 3.324417 3.734050 4.188924   100 

Without      3.267549 3.333012 3.557179 3.367015 3.776429 5.643784   100

Sadly it does not seemt to bring a lot of improvement.


library(hashmap) 
#https://github.com/nathan-russell/hashmap
         H <- hashmap(E(gr)[selected.edges],E(gr)[selected.edges]$weight)
         sapply(H$find(E(grr)[selected.edges]), function(x) x * 10)

更新： 根据 igraph doc，igraph 是线程安全的，所以我可以使用并行。

我目前正在尝试这个：

no_cores <- detectCores(logical = FALSE) 
 data <- split(selected.edges,factor(sort(rank(selected.edges)%%no_cores)))
 c_result <- mclapply(1:no_cores, function(x) {
 E(gr)[unlist(data[[x]])]$weight * 1000 / mean_value }, mc.cores = no_cores) 
   
     E(gr)[unlist(data)]$weight<-unlist(c_result)

我想知道为什么我必须在并行循环之外执行“编写步骤”。当我试图在循环中将权重写回 igraph 时，它不起作用，即权重没有得到更新。

提前感谢您！ BR

【问题讨论】：

标签： r dplyr igraph tidyr osmar

【解决方案1】：

正如Advanced R 中所展示的，R 中的实现性能可能因语法而异。

E(gr)[selected.edges]$weight<-E(gr)[selected.edges]$weight*10

是一个有效的语法，但也可以用其他方式表述：

set.edge.attribute(graph=gr,name="weight",index=selected.edges,value=10*get.edge.attribute(graph=gr,name="weight",index=selected.edges))

让我们比较一下这两种解决方案：

microbenchmark::microbenchmark(
  ref={E(gr)[selected.edges]$weight<-E(gr)[selected.edges]$weight*10},
  new={set.edge.attribute(graph=gr,name="weight",index=selected.edges,value=10*get.edge.attribute(graph=gr,name="weight",index=selected.edges))})

Unit: microseconds
 expr       min        lq       mean    median        uq       max neval cld
  ref 15920.404 16567.788 17793.4412 17111.583 18491.685 25867.477   100   b
  new   246.974   266.462   296.5088   278.769   292.718   662.974   100  a

@Andreas，如果这可以解决您的问题，您能否检查一个更大的数据集？

【讨论】：

很抱歉，我不久前没有提出这个解决方案...... ;)
我确实在我的数据集上进行了检查，它将运行时间缩短了一半。
瓦尔迪没问题！我非常感谢你和你已经做出的所有努力！
Waldi 您是如何得出答案的？这个调用在哪里指定？（对不起，也许是一个愚蠢的问题）
没有愚蠢的问题：igraph.org/r/doc/set_edge_attr.html