从矩阵和距离计算中随机采样的循环答案

【问题标题】：Loops with random sampling from a matrix and distance calculation从矩阵和距离计算中随机采样的循环
【发布时间】：2018-06-13 17:42:04
【问题描述】：

我有一个节点列表，我需要将“p”个集线器随机分配给“n”个客户端。

我得到以下数据，第一行显示：

节点总数。
请求的集线器数量。
每个枢纽的总供应能力。

以下几行显示：

第一列是节点号。
第二列是“x”坐标。
第三个“y”坐标。

下面我将显示原始数据，添加colnames() 看起来像这样：

total_nodes hubs_required  total_capacity
    50           5              120

node number x_coordinate y_coordinate  node_demand
   1            2           62            3
   2           80           25           14
   3           36           88            1
   4           57           23           14
   .            .            .            .
   .            .            .            .
   .            .            .            .
   50           1           58            2

提供了 x 和 y 值，以便我们计算欧几里得距离。

nodes:

 50 5 120
 1 2 62 3
 2 80 25 14
 3 36 88 1
 4 57 23 14
 5 33 17 19
 6 76 43 2
 7 77 85 14
 8 94 6 6
 9 89 11 7
 10 59 72 6
 11 39 82 10
 12 87 24 18
 13 44 76 3
 14 2 83 6
 15 19 43 20
 16 5 27 4
 17 58 72 14
 18 14 50 11
 19 43 18 19
 20 87 7 15
 21 11 56 15
 22 31 16 4
 23 51 94 13
 24 55 13 13
 25 84 57 5
 26 12 2 16
 27 53 33 3
 28 53 10 7
 29 33 32 14
 30 69 67 17
 31 43 5 3
 32 10 75 3
 33 8 26 12
 34 3 1 14
 35 96 22 20
 36 6 48 13
 37 59 22 10
 38 66 69 9
 39 22 50 6
 40 75 21 18
 41 4 81 7
 42 41 97 20
 43 92 34 9
 44 12 64 1
 45 60 84 8
 46 35 100 5
 47 38 2 1
 48 9 9 7
 49 54 59 9
 50 1 58 2

我从第一行提取了信息。

nodes <- as.matrix(read.table(data))
header<-colnames(nodes)
clean_header <-gsub('X','',header)
requested_hubs <- as.numeric(clean_header[2])
max_supply_capacity <- as.numeric(clean_header[3])

我需要随机选择 5 个节点，它们将充当集线器

set.seed(37)
node_to_hub <-nodes[sample(nrow(nodes),requested_hubs,replace = FALSE),]

然后随机我需要为每个集线器分配节点计算集线器与每个节点之间的距离，当超过max_supply_capacity(120) 时选择以下集线器并重复过程。

在最后一次迭代之后，我需要返回所有集线器的累积距离总和。

我需要重复这个过程 100 次，并返回距离累积和的 min() 值。

这是我完全陷入困境的地方，因为我不确定如何循环遍历矩阵，更不用说当我必须随机选择元素时。

我得到了以下元素：

capacity <- c(numeric()) # needs to be <= to 120
distance_sum <- c(numeric())
global_hub_distance <- c(numeric())

欧几里得距离（四舍五入）的公式如下，但我不确定在分配节点时如何反映随机选择。

distance <-round(sqrt(((node_to_hub[i,2]-nodes[i,2]))^2+(node_to_hub[random,3]-nodes[random,3])^2))

我认为我需要的循环的想法如下，但正如我之前提到的，我不知道如何处理示例客户端选择以及随机客户端的距离计算。

    for(i in 1:100){
    node_to_hub
    for(i in 1:nrow(node_to_hub){
#Should I randomly sample the clients here???
    while(capacity < 120){ 
    node_demand <- nodes[**random**,3] 
    distance <-round(sqrt(((node_to_hub[i,2]-nodes[i,2]))^2+(node_to_hub[**random**,3]-nodes[**random**,3])^2))
    capacity <-c(capacity, node_demand)
    distance_sum <- c(distance_sum,distance)
}
global_hub_distance <- c(global_hub_distance,distance_sum)
capacity <- 0 
distance_sum <- 0 
}
min(global_hub_distance)
}

【问题讨论】：

你能为我们重做你的示例数据格式吗？也许将数据粘贴为dput(data) 的结果？我无法理解您的数据（尤其是 50 的第一行）
嗨，Evan，我已经修改了我的帖子，希望它会更清晰一些。
数据比较清晰，但是你的目标不是很清楚。简洁但更清晰。您有节点，您随机选择 5 个并将它们称为集线器。然后......对于每个集线器，您可以获得到每个节点的距离吗？直到您点击累积总数超过 120 的节点？阅读您想要的内容并不清楚。可以细化吗？
随机选择 5 个不同的节点，这些节点成为容量为 120 的集线器。为每个节点随机分配“客户端”节点，直到需求超过容量。计算从每个集线器到每个“客户端”节点的距离并将它们相加。将此值保存在一个变量中，该变量跟踪每次迭代的距离。重复此过程 100 次，然后从跟踪距离的变量中选择最小值。我希望这更清楚，如果需要更多说明，请告诉我，我们没有得到更多信息，所以我花了一段时间才弄清楚需要什么！
这样就清楚多了。我的答案翻转了容量和距离，所以我必须重新设计它。请问这是为了什么？我想这是一些英国电力/天然气分析师的工作？

标签： r loops matrix random sample

【解决方案1】：

不确定您要查找的内容，但此代码可能会对您有所帮助。它不是非常快，因为它不是在达到总容量后使用一段时间停止，而是在完整节点列表上执行一个 cumsum 并找到超过 120 的位置。

nodes <- structure(list(node_number = 1:50, 
               x = c(2L, 80L, 36L, 57L, 33L, 76L, 77L, 94L, 
                     89L, 59L, 39L, 87L, 44L, 2L, 19L, 5L, 
                     58L, 14L, 43L, 87L, 11L, 31L, 51L, 55L, 
                     84L, 12L, 53L, 53L, 33L, 69L, 43L, 10L, 
                     8L, 3L, 96L, 6L, 59L, 66L, 22L, 75L, 4L, 
                     41L, 92L, 12L, 60L, 35L, 38L, 9L, 54L, 1L), 
               y = c(62L, 25L, 88L, 23L, 17L, 43L, 85L, 6L, 11L, 
                     72L, 82L, 24L, 76L, 83L, 43L, 27L, 72L, 50L, 
                     18L, 7L, 56L, 16L, 94L, 13L, 57L, 2L, 33L, 10L, 
                     32L, 67L, 5L, 75L, 26L, 1L, 22L, 48L, 22L, 69L,
                     50L, 21L, 81L, 97L, 34L, 64L, 84L, 100L, 2L, 9L, 59L, 58L), 
               node_demand = c(3L, 14L, 1L, 14L, 19L, 2L, 14L, 6L, 
                               7L, 6L, 10L, 18L, 3L, 6L, 20L, 4L, 
                               14L, 11L, 19L,  15L, 15L, 4L, 13L, 
                               13L, 5L, 16L, 3L, 7L, 14L, 17L, 
                               3L, 3L, 12L, 14L, 20L, 13L, 10L, 
                               9L, 6L, 18L, 7L, 20L, 9L, 1L, 8L, 
                               5L, 1L, 7L, 9L, 2L)), 
          .Names = c("node_number", "x", "y", "node_demand"), 
          class = "data.frame", row.names = c(NA, -50L))

total_nodes = nrow(nodes)
hubs_required = 5
total_capacity = 120
iterations <- 100
track_sums <- matrix(NA, nrow = iterations, ncol = hubs_required)
colnames(track_sums) <- paste0("demand_at_hub",1:hubs_required)

然后我更喜欢使用距离函数，在这种情况下，A 和 B 是两个独立的向量，分别是 c(x,y) 和 c(x,y)。

euc.dist <- function(A, B) round(sqrt(sum((A - B) ^ 2))) # distances

循环：

for(i in 1:iterations){
  # random hub selection
  hubs <- nodes[sample(1:total_nodes, hubs_required, replace = FALSE),]
  for(h in 1:hubs_required){
    # sample the nodes into a random order
    random_nodes <- nodes[sample(1:nrow(nodes), size = nrow(nodes), replace = FALSE),]
    # cumulative sum their demand, and get which number passes 120, 
    # and subtract 1 to get the node before that
    last <- which(cumsum(random_nodes$node_demand) > total_capacity) [1] - 1
    # get sum of all distances to those nodes (1 though the last)
     all_distances <- apply(random_nodes[1:last,], 1, function(rn) {
      euc.dist(A = hubs[h,c("x","y")], 
               B = rn[c("x","y")]) 
    })
    track_sums[i,h] <- sum(all_distances)
  }
}

min(rowSums(track_sums))

编辑

作为一个函数：

hubnode <- function(nodes, hubs_required = 5, total_capacity = 120, iterations = 10){
  # initialize results matrices
  track_sums <- node_count <- matrix(NA, nrow = iterations, ncol = hubs_required)
  colnames(track_sums) <- paste0("demand_at_hub",1:hubs_required)
  colnames(node_count) <- paste0("nodes_at_hub",1:hubs_required)
  # user defined distance function (only exists wihtin hubnode() function)
  euc.dist <- function(A, B) round(sqrt(sum((A - B) ^ 2))) 

  for(i in 1:iterations){
    # random hub selection
    assigned_hubs <- sample(1:nrow(nodes), hubs_required, replace = FALSE)
    hubs <- nodes[assigned_hubs,]
    assigned_nodes <- NULL
    for(h in 1:hubs_required){
      # sample the nodes into a random order
      assigned_nodes <- sample((1:nrow(nodes))[-assigned_hubs], replace = FALSE)
      random_nodes <- nodes[assigned_nodes,]
      # cumulative sum their demand, and get which number passes 120, 
      # and subtract 1 to get the node before that
      last <- which(cumsum(random_nodes$node_demand) > total_capacity) [1] - 1
      # if there are none
      if(is.na(last)) last = nrow(random_nodes)
      node_count[i,h] <- last
      # get sum of all distances to those nodes (1 though the last)
      all_distances <- apply(random_nodes[1:last,], 1, function(rn) {
        euc.dist(A = hubs[h,c("x","y")], 
                 B = rn[c("x","y")]) 
      })
      track_sums[i,h] <- sum(all_distances)
    }
  }
  return(list(track_sums = track_sums, node_count = node_count))
}

output <- hubnode(nodes, iterations = 100)

node_count <- output$node_count
track_sums <- output$track_sums

plot(rowSums(node_count),  
     rowSums(track_sums), xlab = "Node Count", ylab = "Total Demand", main = paste("Result of", 100, "iterations"))

min(rowSums(track_sums))

【讨论】：

哎呀，我总结的是距离而不是容量，给我一点时间
太棒了，非常感谢。我实际上需要使用禁忌搜索为这个练习的下一个变体创建一个函数，这样结构现在更清晰了。