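【Question title】: r - hierarchical data frame from child/parent relations
【Posted】: 2016-01-09 05:41:48
【Question】:

I have a child-parent data.frame that I want to convert into a complete hierarchical list containing all levels and a level number. The sample data below has three levels, but there could be more. What function can I use to transform the data?

Source: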

data.frame(name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
  "airplane", "helicopter", "Ford", "BMW", "Airbus"), parent = c(NA, NA, NA, 
  "land", "land", "water", "air", "air", "air", "car", "car", "airplane"))

         name   parent
1        land     <NA>
2       water     <NA>
3         air     <NA>
4         car     land
5     bicycle     land
6        boat    water
7     balloon      air
8    airplane      air
9  helicopter      air
10       Ford      car
11        BMW      car
12     Airbus airplane

Destination:

data.frame(level1 = c("land", "water", "air", "land", "land", "water", "air", 
  "air", "air", "land", "land", "air"), level2 = c(NA, NA, NA, "car", "bicycle", 
  "boat", "balloon", "airplane", "helicopter", "car", "car", "airplane"),
  level3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, "Ford", "BMW", "Airbus"), 
  level_number = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3))

   level1     level2 level3 level_number
1    land       <NA>   <NA>            1
2   water       <NA>   <NA>            1
3     air       <NA>   <NA>            1
4    land        car   <NA>            2
5    land    bicycle   <NA>            2
6   water       boat   <NA>            2
7     air    balloon   <NA>            2
8     air   airplane   <NA>            2
9     air helicopter   <NA>            2
10   land        car   Ford            3
11   land        car    BMW            3
12    air   airplane Airbus            3

【Comments】:

    Tags: r hierarchy hierarchical-data


    【Solution 1】:

    Using data.table, you can do the following:

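    library(data.table)
    # dat is the source data.frame from the question
    l <- list() # initialize empty list that will collect one table per level
    setDT(dat) # convert dat to a data.table by reference
    setkey(dat, parent) # key dat by parent so joins match on the parent column
    # top-level rows are those without a parent; they form level 1
    current_lvl <- dat[is.na(parent), .(level_number = 1), keyby = .(level1 = name)]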
    

    At this point current_lvl looks as follows (keyed by level1):

       level1 level_number
    1:    air            1
    2:   land            1
    3:  water            1
    

    The trick now is to join dat with current_lvl and modify the result appropriately:

    ind <- 1 # number of the level held in current_lvl before the join
    current_lvl <- current_lvl[dat][      # join the data.tables
      !is.na(level_number)][              # exclude non-child rows
      , level_number := level_number + 1] # increment level_number
    setnames(current_lvl, "name", paste0("level", ind + 1)) # rename column
    setkeyv(current_lvl, paste0("level", ind + 1)) # set key
    

    This gives you (keyed by level2):

       level1 level_number     level2
    1:    air            2   airplane
    2:    air            2    balloon
    3:   land            2    bicycle
    4:  water            2       boat
    5:   land            2        car
    6:    air            2 helicopter
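
To make the join step easier to picture, here is a rough base R sketch of the same idea using merge() instead of data.table. It is only an illustration, not the answer's code; the names lvl1 and lvl2 are made up for this example.

```r
# Source data from the question.
dat <- data.frame(
  name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
           "airplane", "helicopter", "Ford", "BMW", "Airbus"),
  parent = c(NA, NA, NA, "land", "land", "water", "air", "air", "air",
             "car", "car", "airplane"),
  stringsAsFactors = FALSE
)

# Level 1: the rows that have no parent.
lvl1 <- data.frame(level1 = dat$name[is.na(dat$parent)],
                   level_number = 1,
                   stringsAsFactors = FALSE)

# Inner join: match each level-1 name against the parent column of dat.
# Rows of dat whose parent is not a level-1 name drop out.
lvl2 <- merge(lvl1, dat, by.x = "level1", by.y = "parent")

# The matched children are one level deeper; their name becomes level2.
lvl2$level_number <- lvl2$level_number + 1
names(lvl2)[names(lvl2) == "name"] <- "level2"
```

Repeating the same merge with lvl2 in place of lvl1 (joining on level2) would produce the third level, which is what the data.table version automates.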
    

    Put this step in a while loop as follows (start from the level-1 current_lvl, with l still empty):

    while(nrow(current_lvl) > 0){
      ind <- length(l) + 1
      l[[ind]] <- current_lvl
      current_lvl <- current_lvl[dat][!is.na(level_number)][,level_number := level_number + 1]
      if(nrow(current_lvl) == 0L){
        break
      }
      setnames(current_lvl, "name", paste0("level",ind+1))
      setkeyv(current_lvl, paste0("level",ind+1))
    }
    

    You can inspect l to see the per-level results. Combining them with rbindlist gives you what you need:

    res <- rbindlist(l, fill=TRUE)
    setcolorder(res, sort(names(res)))
    res
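
One caveat on the column ordering, since the question says there may be more than three levels: sort() compares the names as strings, so with ten or more levels level10 would land before level2, and where level_number ends up can depend on the locale. A small sketch of ordering by the numeric suffix instead; cols is a hypothetical stand-in for names(res).

```r
# Hypothetical column names, as they might look for a ten-level hierarchy.
cols <- c(paste0("level", 1:10), "level_number")

# Plain string sorting puts "level10" ahead of "level2".
alphabetical <- sort(cols)

# Order the levelN columns by their numeric suffix instead,
# and put level_number first explicitly.
lvl_cols <- grep("^level[0-9]+$", cols, value = TRUE)
lvl_cols <- lvl_cols[order(as.integer(sub("^level", "", lvl_cols)))]
col_order <- c("level_number", lvl_cols)
```

With the real table this would be used as setcolorder(res, col_order), with names(res) in place of cols.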
    

    Printing res gives:

    > res
        level_number level1     level2 level3
     1:            1    air         NA     NA
     2:            1   land         NA     NA
     3:            1  water         NA     NA
     4:            2    air   airplane     NA
     5:            2    air    balloon     NA
     6:            2   land    bicycle     NA
     7:            2  water       boat     NA
     8:            2   land        car     NA
     9:            2    air helicopter     NA
    10:            3    air   airplane Airbus
    11:            3   land        car    BMW
    12:            3   land        car   Ford
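
For comparison, the same table can be built without data.table by walking from each row up to its top-level ancestor. This is a minimal base R sketch, not part of the answer above; expand_levels is a hypothetical helper name, and it assumes a non-empty table, unique names, and no cycles in the parent links.

```r
# Hypothetical helper: expand a child/parent table into level1..levelN
# columns plus level_number, keeping the row order of the input.
expand_levels <- function(dat) {
  name   <- as.character(dat$name)
  parent <- as.character(dat$parent)

  # For every row, collect the path from the top-level ancestor to the row.
  paths <- lapply(seq_along(name), function(i) {
    path <- name[i]
    p <- parent[i]
    while (!is.na(p)) {
      path <- c(p, path)           # prepend the ancestor
      p <- parent[match(p, name)]  # move one step up the hierarchy
    }
    path
  })

  depth <- lengths(paths)
  max_depth <- max(depth)

  # Pad shorter paths with NA so every row has max_depth entries.
  padded <- lapply(paths, function(p) {
    c(p, rep(NA_character_, max_depth - length(p)))
  })
  out <- as.data.frame(matrix(unlist(padded), ncol = max_depth, byrow = TRUE),
                       stringsAsFactors = FALSE)
  names(out) <- paste0("level", seq_len(max_depth))
  out$level_number <- depth
  out
}

# Usage with the data from the question.
dat <- data.frame(
  name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
           "airplane", "helicopter", "Ford", "BMW", "Airbus"),
  parent = c(NA, NA, NA, "land", "land", "water", "air", "air", "air",
             "car", "car", "airplane"),
  stringsAsFactors = FALSE
)
res <- expand_levels(dat)
```

Each row costs one walk up the hierarchy, so this is simpler than the keyed joins but likely slower on large tables.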
    

    【Comments】:

      【Solution 2】:

      Using the data.tree package, you can do the following:

      library(data.tree)
      df <- data.frame(name = c("land", "water", "air", "car", "bicycle", "boat", "balloon", "airplane", "helicopter", "Ford", "BMW", "Airbus"), 
                       parent = c("root", "root", "root", "land", "land", "water", "air", "air", "air", "car", "car", "airplane"))
      

      Note that I replaced NA with "root", which makes the conversion to a data.tree much easier. Namely:

      tree <- FromDataFrameNetwork(df)
      

      Getting the desired format is then trivial, because we can use the hierarchy in the data.tree structure:

      ToDataFrameTree(tree, 
                      level1 = function(x) x$path[2],
                      level2 = function(x) x$path[3],
                      level3 = function(x) x$path[4],
                      level_number = function(x) x$level - 1)[-1,-1]
      

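      【Comments】:

        【Solution 3】:

        Do not use "root" as the parent value for top-level records. The solution using the data.tree package is great; however, in newer versions "root" is a reserved name for nodes. Although it is automatically replaced with "root2", the call to FromDataFrameNetwork(df) does not return the desired tree.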
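
A sketch of one way around this, assuming any placeholder that is neither reserved nor already used as a node name will do: replace the NA parents with a different label. The name "vehicles" is a hypothetical choice made up for this example, and the data.tree part only runs if the package is installed.

```r
# Source data from the question, with NA marking top-level records.
df <- data.frame(
  name = c("land", "water", "air", "car", "bicycle", "boat", "balloon",
           "airplane", "helicopter", "Ford", "BMW", "Airbus"),
  parent = c(NA, NA, NA, "land", "land", "water", "air", "air", "air",
             "car", "car", "airplane"),
  stringsAsFactors = FALSE
)

# "vehicles" is a hypothetical placeholder for the common top node;
# it must not collide with an existing node name.
root_name <- "vehicles"
stopifnot(!root_name %in% df$name)
df$parent[is.na(df$parent)] <- root_name

# Same conversion as in Solution 2, guarded so the script still runs
# when data.tree is not available.
if (requireNamespace("data.tree", quietly = TRUE)) {
  tree <- data.tree::FromDataFrameNetwork(df)
  res <- data.tree::ToDataFrameTree(
    tree,
    level1 = function(x) x$path[2],
    level2 = function(x) x$path[3],
    level3 = function(x) x$path[4],
    level_number = function(x) x$level - 1
  )[-1, -1]
  print(res)
}
```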

        【Comments】:
