【Question title】: R: recursive algorithm for querying a binary tree
【Posted】: 2020-08-22 00:04:48
【Question】:

I have a tree called mytree (shown as an image in the original post).

In R, I store it as a list:

mytree <- list(left = structure(list(y = -10, x = 10, grad = -10.5, sim_score = 110.25, 
    value = -10.5, criterion = "x < 15"), row.names = 1L, class = "data.frame"), 
    right = list(left = list(left = structure(list(y = 7, x = 20, 
        grad = 6.5, sim_score = 42.25, value = 6.5, criterion = "x < 22.5"), row.names = 2L, class = "data.frame"), 
        right = structure(list(y = 8, x = 25, grad = 7.5, sim_score = 56.25, 
            value = 7.5, criterion = "x >= 22.5"), row.names = 3L, class = "data.frame"), 
        root = list(root = structure(list(y = c(7, 8), x = c(20, 
        25), grad = c(6.5, 7.5), sim_score = c(98, 98), value = c(7, 
        7), criterion = c("x < 30", "x < 30")), row.names = 2:3, class = "data.frame"), 
            gain = 0.5)), right = structure(list(y = -7, x = 35, 
        grad = -7.5, sim_score = 56.25, value = -7.5, criterion = "x >= 30"), row.names = 4L, class = "data.frame"), 
        root = list(root = structure(list(y = c(7, 8, -7), x = c(20, 
        25, 35), grad = c(6.5, 7.5, -7.5), sim_score = c(14.0833333333333, 
        14.0833333333333, 14.0833333333333), value = c(2.16666666666667, 
        2.16666666666667, 2.16666666666667), criterion = c("x >= 15", 
        "x >= 15", "x >= 15")), row.names = 2:4, class = "data.frame"), 
            gain = 140.166666666667)), root = list(root = structure(list(
        y = c(-10, 7, 8, -7), x = c(10, 20, 25, 35), grad = c(-10.5, 
        6.5, 7.5, -7.5), sim_score = c(4, 4, 4, 4)), row.names = c(NA, 
    -4L), class = "data.frame"), gain = 120.333333333333))

It prints like this:

$left
    y  x  grad sim_score value criterion
1 -10 10 -10.5    110.25 -10.5    x < 15

$right
$right$left
$right$left$left
  y  x grad sim_score value criterion
2 7 20  6.5     42.25   6.5  x < 22.5

$right$left$right
  y  x grad sim_score value criterion
3 8 25  7.5     56.25   7.5 x >= 22.5

$right$left$root
$right$left$root$root
  y  x grad sim_score value criterion
2 7 20  6.5        98     7    x < 30
3 8 25  7.5        98     7    x < 30

$right$left$root$gain
[1] 0.5



$right$right
   y  x grad sim_score value criterion
4 -7 35 -7.5     56.25  -7.5   x >= 30

$right$root
$right$root$root
   y  x grad sim_score    value criterion
2  7 20  6.5  14.08333 2.166667   x >= 15
3  8 25  7.5  14.08333 2.166667   x >= 15
4 -7 35 -7.5  14.08333 2.166667   x >= 15

$right$root$gain
[1] 140.1667



$root
$root$root
    y  x  grad sim_score
1 -10 10 -10.5         4
2   7 20   6.5         4
3   8 25   7.5         4
4  -7 35  -7.5         4

$root$gain
[1] 120.3333

The splits are stored under criterion, and the leaf values are stored under value.
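
Because each split is stored as a string, it has to be evaluated against a concrete x before it can route a data point. A minimal sketch of that step, assuming the string is a valid R expression in x (`meets_criterion` is a hypothetical helper name, not part of the original post):

```r
# Hypothetical helper: evaluate a criterion string such as "x < 15"
# with x bound to the supplied value(s).
meets_criterion <- function(criterion, x) {
  eval(parse(text = criterion), list(x = x))
}

meets_criterion("x < 15", 5)             # TRUE
meets_criterion("x < 15", 25)            # FALSE
meets_criterion("x >= 22.5", c(20, 25))  # FALSE TRUE
```

Binding x through the second argument of `eval` avoids depending on a global variable named x.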

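Given a new data point x = 5, I want to query mytree and see which leaf node that instance falls into. For x = 5, my function should output the value -10.5, because 5 < 15. Similarly, for x = 25 it should end up with the value 7.5. Here are more examples of what I would like the pred_tree function to output: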

newdata <- data.frame(x = c(5, 19, 18, 30))
> pred_tree(tree = mytree, newdata = newdata)
[1] -10.5
[2] 6.5
[3] 6.5
[4] -7.5

Here is what I have so far:

pred_tree <- function(tree, newdata){
  for(i in length(tree)){
    # Check if this is a leaf
    if(length(tree[[i]]) == 1){
      # Check criterion
      if(eval(parse(text=tree[[i]]$criterion))){
        # Return value of leaf
        return(tree[[i]]$value[1])
      }
    }else if(length(tree[[i]]) > 1){
      for(j in 1:length(tree[[i]])){
        if(length(tree[[i]][[j]]) == 1){
          # Check criterion
          if(eval(parse(text=tree[[i]][[j]]$criterion))){
            # Return value of leaf
            return(tree[[i]][[j]]$value[1])
          }
        }
      }
    }
  }
}

pred_tree(tree, newdata = newdata)

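Unfortunately, this function does not return the correct output. It is also rather clunky and could be very slow if I have many queries to run. I suspect a recursive algorithm makes more sense than nested for loops. Can anyone point me in the right direction?

@@@@@@@@@@@@@ EDIT @@@@@@@@@@@@@@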

mytree3 <- list(left = list(left = structure(list(y = -10, x = 10, grad = 0, 
    sim_score = 0, value = 0, criterion = "x < 15"), row.names = 1L, class = "data.frame"), 
    right = structure(list(y = 7, x = 20, grad = -0.5, sim_score = 0.25, 
        value = -0.5, criterion = "x >= 15"), row.names = 2L, class = "data.frame"), 
    root = list(root = structure(list(y = c(-10, 7), x = c(10, 
    20), grad = c(0, -0.5), sim_score = c(0.125, 0.125), value = c(-0.25, 
    -0.25), criterion = c("x < 22.5", "x < 22.5")), row.names = 1:2, class = "data.frame"), 
        gain = 0.125)), right = list(left = structure(list(y = 8, 
    x = 25, grad = 0.5, sim_score = 0.25, value = 0.5, criterion = "x < 30"), row.names = 3L, class = "data.frame"), 
    right = structure(list(y = -7, x = 35, grad = 0, sim_score = 0, 
        value = 0, criterion = "x >= 30"), row.names = 4L, class = "data.frame"), 
    root = list(root = structure(list(y = c(8, -7), x = c(25, 
    35), grad = c(0.5, 0), sim_score = c(0.125, 0.125), value = c(0.25, 
    0.25), criterion = c("x >= 22.5", "x >= 22.5")), row.names = 3:4, class = "data.frame"), 
        gain = 0.125)), root = list(root = structure(list(y = c(-10, 
7, 8, -7), x = c(10, 20, 25, 35), grad = c(0, -0.5, 0.5, 0), 
    sim_score = c(0, 0, 0, 0), value = c(0, 0, 0, 0)), row.names = c(NA, 
-4L), class = "data.frame"), gain = 0.25))

Running the following does not give the correct output:

pred_tree(tree = mytree3, newdata = newdata)

【Question discussion】:

    Tags: r algorithm recursion binary-tree


    【Solution 1】:

    A simple recursion you could use is:

    .pred <- function(x, tree)
    {
      # A bare data frame is a leaf: wrap it in a list so tree[[1]] is the leaf
      if (is.data.frame(tree)) tree <- list(tree)
      # If the first child is a subtree, swap the two children so the leaf comes first
      if (!is.data.frame(tree[[1]])) tree <- tree[c(2, 1, 3)]
      # Evaluate the leaf's criterion for this x: return its value if it holds,
      # otherwise descend into the other child
      if (eval(parse(text = tree[[1]][["criterion"]]), list(x = x))) tree[[1]][["value"]][1]
      else .pred(x, tree[[2]])
    }
    pred_tree <- function(tree, newdata)
    {
      # Apply .pred to every x and attach the predictions as a new column
      cbind(newdata, pred = Vectorize(.pred, "x")(x = newdata$x, tree))
    }
    

    Now you can call your function:

    pred_tree(mytree,data.frame(x=c(1,10,15,18,19,22,23,25,29,30,33,35,100)))
         x  pred
    1    1 -10.5
    2   10 -10.5
    3   15   6.5
    4   18   6.5
    5   19   6.5
    6   22   6.5
    7   23   7.5
    8   25   7.5
    9   29   7.5
    10  30  -7.5
    11  33  -7.5
    12  35  -7.5
    13 100  -7.5
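
The reordering step in `.pred` may be easier to see in isolation. The sketch below uses a toy branch (`branch` and its contents are hypothetical stand-ins, not data from the post) whose first child is a subtree and whose second child is a leaf. After the swap, the leaf sits in position 1, so its criterion can be tested before descending into the subtree:

```r
# Toy branch: "left" is a subtree (a plain list), "right" is a leaf (a data frame).
branch <- list(
  left  = list(note = "subtree placeholder"),
  right = data.frame(value = -7.5, criterion = "x >= 30", stringsAsFactors = FALSE),
  root  = list(note = "split summary placeholder")
)

# Same test as in .pred: if the first child is not a leaf, swap children 1 and 2.
reordered <- if (!is.data.frame(branch[[1]])) branch[c(2, 1, 3)] else branch

names(reordered)               # "right" "left" "root"
is.data.frame(reordered[[1]])  # TRUE
```

This only helps when every branch has at least one leaf child, which holds for mytree but not for the top level of mytree3, where both children are subtrees.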
    

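    【Discussion】:

    • Thanks. Could you explain why you chose to reorder the list here: if(!is.data.frame(tree[[1]])) tree <- tree[c(2, 1, 3)]?
    • @YQW In the example you gave, every branch has a terminal node.
    • Thanks for the reply. I edited my original post with mytree2. I tried using pred_fun on it, but the output does not look right. Could you take a look?
    • Thanks. I have updated my post again with a different tree called mytree3; running pred_tree on it does not give me the correct output. Could you take a look? Sorry for the trouble.
    • @YQW You will notice that you have started to deviate from the original question. You should consider asking a new question. Also, the logic I used came from the image, not from the list. Please provide an edit with an image of the tree.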
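
For trees shaped like mytree3, where a branch can have two subtree children, one possible approach is to route on each child's own entry condition instead of assuming a leaf is present. The sketch below is not the answerer's method. It assumes that a leaf stores its entry condition in its criterion column and that a subtree stores it in root$root$criterion, which is how mytree and mytree3 appear to be laid out. The names `leaf`, `toy_tree`, `node_criterion`, `query_tree` and `pred_tree2` are hypothetical, and `toy_tree` is a reduced copy of mytree3 that keeps only the columns the traversal reads.

```r
# Reduced copy of mytree3: only value and criterion are kept.
leaf <- function(value, criterion) {
  data.frame(value = value, criterion = criterion, stringsAsFactors = FALSE)
}
toy_tree <- list(
  left = list(
    left  = leaf(0, "x < 15"),
    right = leaf(-0.5, "x >= 15"),
    root  = list(root = leaf(-0.25, "x < 22.5"), gain = 0.125)
  ),
  right = list(
    left  = leaf(0.5, "x < 30"),
    right = leaf(0, "x >= 30"),
    root  = list(root = leaf(0.25, "x >= 22.5"), gain = 0.125)
  ),
  root = list(root = data.frame(x = c(10, 20, 25, 35)), gain = 0.25)
)

# Entry condition of a node: stored on the leaf itself, or under root$root
# for a subtree (an assumption about the layout, see above).
node_criterion <- function(node) {
  src <- if (is.data.frame(node)) node else node[["root"]][["root"]]
  as.character(src[["criterion"]][1])
}

# Recursive query: a data frame is a leaf, otherwise descend into the
# first child whose entry condition holds for x.
query_tree <- function(x, node) {
  if (is.data.frame(node)) return(node[["value"]][1])
  for (side in c("left", "right")) {
    child <- node[[side]]
    if (eval(parse(text = node_criterion(child)), list(x = x))) {
      return(query_tree(x, child))
    }
  }
  NA_real_  # no child matched
}

pred_tree2 <- function(tree, newdata) {
  vapply(newdata$x, query_tree, numeric(1), node = tree)
}

pred_tree2(toy_tree, data.frame(x = c(5, 19, 25, 30)))  # 0, -0.5, 0.5, 0
```

Because the traversal never inspects the position of a leaf within a branch, the same function should also work on mytree under the stated layout assumption.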