【Question title】: Aggregating values on a data tree with R
【Posted】: 2017-07-20 21:41:11
【Question】:

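I'm trying to total up hours in a data tree structure. I can add up the hours directly beneath a parent node, but I can't get the total to include the hours assigned to the parent nodes themselves. Any suggestions would be great.

This is what I get: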

           levelName hours totalhours
1 Ned                   NA          1
2  °--John               1          3
3      °--Kate           1          3
4          ¦--Dan        1          1
5          ¦--Ron        1          1
6          °--Sienna     1          1

This is what I'm looking for:

           levelName hours totalHours
1 Ned                   NA          5
2  °--John               1          5
3      °--Kate           1          4
4          ¦--Dan        1          1
5          ¦--Ron        1          1
6          °--Sienna     1          1

Here is my code:

# Install package
install.packages('data.tree')
library(data.tree)

# Create data frame
to <- c("Ned", "John", "Kate", "Kate", "Kate")
from <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours <- c(1,1,1,1,1)
df <- data.frame(from,to,hours)

# Create data tree
tree <- FromDataFrameNetwork(df)
print(tree, "hours")

# Get running total of hours that includes all nodes and children values.
tree$Do(function(x) x$total <- Aggregate(x, "hours", sum), traversal = "post-order")
print(tree, "hours", runningtotal = tree$Get(Aggregate, "total", sum))

【Comments】:

    Tags: r tree nodes aggregate


    【Solution 1】:

    You can simply use a recursive function:

    myApply <- function(node) {
      node$totalHours <- 
        sum(c(node$hours, purrr::map_dbl(node$children, myApply)), na.rm = TRUE)
    }
    myApply(tree)
    print(tree, "hours", "totalHours")
    

    Result:

               levelName hours totalHours
    1 Ned                   NA          5
    2  °--John               1          5
    3      °--Kate           1          4
    4          ¦--Dan        1          1
    5          ¦--Ron        1          1
    6          °--Sienna     1          1
    

    Edit: populating two fields at once:

    # Create data frame
    to <- c("Ned", "John", "Kate", "Kate", "Kate")
    from <- c("John", "Kate", "Dan", "Ron", "Sienna")
    hours <- c(1,1,1,1,1)
    hours2 <- 5:1
    df <- data.frame(from,to,hours, hours2)
    
    # Create data tree
    tree <- FromDataFrameNetwork(df)
    print(tree, "hours", "hours2")
    
    myApply <- function(node) {
      res.ch <- purrr::map(node$children, myApply)
      a <- node$totalHours <- 
        sum(c(node$hours,  purrr::map_dbl(res.ch, 1)), na.rm = TRUE)
      b <- node$totalHours2 <- 
        sum(c(node$hours2, purrr::map_dbl(res.ch, 2)), na.rm = TRUE)
      list(a, b)
    }
    myApply(tree)
    print(tree, "hours", "totalHours", "hours2", "totalHours2")
    

    Result:

               levelName hours totalHours hours2 totalHours2
    1 Ned                   NA          5     NA          15
    2  °--John               1          5      5          15
    3      °--Kate           1          4      4          10
    4          ¦--Dan        1          1      3           3
    5          ¦--Ron        1          1      2           2
    6          °--Sienna     1          1      1           1
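
The two-field version above can be extended to any number of numeric fields by passing the field names in as a character vector. The following is only a sketch under assumptions: `sumFields` and the `total_` prefix are hypothetical names chosen for illustration, not part of data.tree, and base R `lapply`/`vapply` stand in for the purrr calls.

```r
library(data.tree)

# Same example data as in the answer above
to     <- c("Ned", "John", "Kate", "Kate", "Kate")
from   <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours  <- c(1, 1, 1, 1, 1)
hours2 <- 5:1
tree   <- FromDataFrameNetwork(data.frame(from, to, hours, hours2))

# sumFields is a hypothetical helper: for every name in `fields` it stores
# the node's own value plus the totals of all its children in a new
# field called "total_<field>", and returns those totals to the caller.
sumFields <- function(node, fields) {
  # Recurse first, so each child hands back a named vector of its totals
  childTotals <- lapply(node$children, sumFields, fields = fields)
  totals <- vapply(fields, function(f) {
    fromChildren <- vapply(childTotals, function(ct) ct[[f]], numeric(1))
    # node[[f]] is NULL on the root (no hours), which sum() simply ignores
    sum(node[[f]], fromChildren, na.rm = TRUE)
  }, numeric(1))
  for (f in fields) node[[paste0("total_", f)]] <- totals[[f]]
  totals
}

invisible(sumFields(tree, c("hours", "hours2")))
print(tree, "hours", "total_hours", "hours2", "total_hours2")
```

Because the recursion returns one named vector per node, adding a further column only means adding its name to the `fields` argument.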
    

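    【Comments】:

    • This is cool (and more general). I have a question. If we have several columns of numeric data and want to create a corresponding column of aggregated values for each, do we have to write one "apply" function per column (which is what I did), or can all the columns be created with a single recursive function (I didn't manage to do that)?
    • @Brani I think you can populate several variables inside the function and return a list containing all of them, possibly using map2 or pmap instead of map. Do you have an example?
    • Adding another variable to df (like "hours", but with different numbers) and using the same example would be enough.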
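    【Solution 2】:

    The caching of Aggregate values during Do seems to work only when the same field is used, so copy hours into totalHours first and then aggregate totalHours: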

    tree$Do(function(node) node$totalHours = node$hours)
    
    tree$Do(function(node) node$totalHours = sum(if(!node$isLeaf) node$totalHours else 0,
                                                 Aggregate(node, "totalHours", sum)),
            traversal = "post-order")
    print(tree, "hours", "totalHours")
    #           levelName hours totalHours
    #1 Ned                   NA          5
    #2  °--John               1          5
    #3      °--Kate           1          4
    #4          ¦--Dan        1          1
    #5          ¦--Ron        1          1
    #6          °--Sienna     1          1
    

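    【Comments】:

      【Solution 3】:

      If you want to sum up child values recursively, the Aggregate function of the data.tree package is especially useful. In your case, you want to do two things:

      1. Sum the children's values plus the node's own value
      2. Store the sum in a separate variable

      One way to do that is: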

      library(data.tree)
      
      # Create data frame
      to <- c("Ned", "John", "Kate", "Kate", "Kate")
      from <- c("John", "Kate", "Dan", "Ron", "Sienna")
      hours <- c(1,1,1,1,1)
      df <- data.frame(from,to,hours)
      
      # Create data tree
      tree <- FromDataFrameNetwork(df)
      print(tree, "hours")
      
      # Get running total of hours that includes all nodes and children values.
      tree$Do(function(x) x$total <- ifelse(is.null(x$hours), 0, x$hours) + sum(Get(x$children, "total")), traversal = "post-order")
      print(tree, "hours", "total")
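
As a cross-check that does not depend on data.tree, the same rule (a node's total is its own hours plus the totals of its children) can be applied directly to the from/to data frame in base R. This is a minimal sketch; `subtreeTotal` is a hypothetical helper name, and it assumes every node appears at most once in `from`.

```r
to    <- c("Ned", "John", "Kate", "Kate", "Kate")
from  <- c("John", "Kate", "Dan", "Ron", "Sienna")
hours <- c(1, 1, 1, 1, 1)
df <- data.frame(from, to, hours, stringsAsFactors = FALSE)

# subtreeTotal is a hypothetical helper: own hours plus all descendants' hours
subtreeTotal <- function(name, edges) {
  own  <- edges$hours[edges$from == name]  # numeric(0) for the root (Ned)
  kids <- edges$from[edges$to == name]     # names of the direct children
  sum(own, vapply(kids, subtreeTotal, numeric(1), edges = edges))
}

nodes  <- union(df$to, df$from)            # Ned, John, Kate, Dan, Ron, Sienna
totals <- vapply(nodes, subtreeTotal, numeric(1), edges = df)
print(totals)
```

The named vector `totals` can then be compared with the `total` column printed from the tree.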
      

      【Comments】:
