【Question title】: R data cube define hierarchy
【Posted】: 2018-10-15 11:47:14
【Question】:

I have a problem with the OLAP cube package data.cube:

install.packages("data.cube", repos = paste0("https://", c(
    "jangorecki.gitlab.io/data.cube",
    "cloud.r-project.org"
)))

Some test data:

 library(data.table)
 set.seed(42)

 dt <- CJ(color = c("green","yellow","red"),
            year = 2011:2015,
            month = 1:12,
            status = c("active","inactive","archived","removed")
 )[sample(600)]

 dt[, "value" := sample(4:7/2, nrow(dt), TRUE)]

Now I want to create a cube and apply a hierarchy on the time dimension. Something like this:

library(data.cube)
dc <- as.data.cube(dt, id.vars = c("color", "year", "month", "status"), 
                   measure.vars = "value", 
                   hierarchies = list(time <- list("year, month")))

If I run this code, I get the error:

Error in as.data.cube.data.table(dt, id.vars = c("color", "year", "month",  : 
  identical(names(hierarchies), id.vars) | identical(names(hierarchies),  .... is not TRUE

If I try something like

hierarchies = list(time <- list("year, month"), color <- list("color"), 
                  status <- list("status"))

I get the same error.

【Discussion】:

    Tags: r data.table olap-cube data.cube


    【Answer 1】:

    Well-written question.
    I see you based your example on the ?as.data.cube examples, so I will use those examples to answer your question as well.

    # Original example goes as follows
    library(data.cube)
    library(data.table)
    set.seed(1L)
    dt = CJ(color = c("green","yellow","red"),
            year = 2011:2015,
            status = c("active","inactive","archived","removed"))[sample(30)]
    dt[, "value" := sample(4:7/2, nrow(dt), TRUE)]
    
    dc = as.data.cube(
      x = dt, id.vars = c("color","year","status"),
      measure.vars = "value",
      hierarchies = sapply(c("color","year","status"),
                           function(x) list(setNames(list(character()), x)),
                           simplify=FALSE)
    )
    str(dc)
    

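    Your error appears to be raised while the validity of the hierarchies is being checked.
    Unfortunately the error message is not very informative; I created issue #18, so it should be improved one day.
    So let's compare the hierarchies from the manual with the ones created in your example.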

    sapply(c("color","year","status"),
           function(x) list(setNames(list(character()), x)),
           simplify=FALSE) -> h
    str(h)
    #List of 3
    # $ color :List of 1
    #  ..$ :List of 1
    #  .. ..$ color: chr(0) 
    # $ year  :List of 1
    #  ..$ :List of 1
    #  .. ..$ year: chr(0) 
    # $ status:List of 1
    #  ..$ :List of 1
    #  .. ..$ status: chr(0)     
    
    hierarchies = list(time <- list("year, month"), color <- list("color"), 
                       status <- list("status"))
    str(hierarchies)
    #List of 3
    # $ :List of 1
    #  ..$ : chr "year, month"
    # $ :List of 1
    #  ..$ : chr "color"
    # $ :List of 1
    #  ..$ : chr "status"
    

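    We can see that the hierarchies from the manual are a list of named elements, whereas yours is a list of unnamed elements.
    I believe you misused <- where = should have been used; <- is not always equivalent to the = operator. You can read more about this case in 3.1.3.1 Assignment <- vs =.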
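
    The difference can be reproduced with base R alone. The sketch below is only an illustration (the object names unnamed and named are made up here): inside a call, <- assigns a variable and passes its value without a name, while = names the list element.

```r
# Inside list(), `<-` performs an assignment and passes the value unnamed;
# `=` names the element instead.
unnamed <- list(time <- list("year", "month"))
named   <- list(time = list("year", "month"))

names(unnamed)   # NULL: the element carries no name
names(named)     # "time"

# Side effect of `<-`: a variable called `time` now exists and holds the list
is.list(time)    # TRUE
```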

    Let's see whether that fix is enough

    hierarchies = list(time = list(c("year, month")), color = list("color"), 
                       status = list("status"))
    
    dc <- as.data.cube(dt, id.vars = c("color", "year", "month", "status"), 
                       measure.vars = "value", 
                       hierarchies = hierarchies)
    

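    We still get the same error, so names are required, but they were not the root cause of the problem. Looking more closely, I now see that you want to build the time dimension without a primary key.
    Note that you cannot pass multiple column names as a single string, so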

    "year, month"
    

    should be written as

    c("year","month")
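
    A quick base R check makes the point concrete. This is a small sketch; cols simply lists the column names from the question's data, and the variable names are illustrative.

```r
cols <- c("color", "year", "month", "status")  # column names from the question's data

one_string <- "year, month"        # a single string, length 1
two_names  <- c("year", "month")   # a character vector holding two names

length(one_string)         # 1
one_string %in% cols       # FALSE: no column is called "year, month"
all(two_names %in% cols)   # TRUE: both names match existing columns
```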
    

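    We still need a primary key for the time dimension as a single field; year and month will be just attributes.
    So let's create a primary key for the time dimension. Because our time dimension has year-month granularity, we will create the key at that granularity.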

    library(data.table)
    set.seed(42)
    
    dt <- CJ(color = c("green","yellow","red"),
             year = 2011:2015,
             month = 1:12,
             status = c("active","inactive","archived","removed")
    )[sample(600)
      ][, yearmonth:=sprintf("%04d%02d", year, month) # zero-pads to four digits for year and two digits for month
        ]
    
    dt[, "value" := sample(4:7/2, nrow(dt), TRUE)]
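
    Before using yearmonth as a key, it can be worth checking that it really identifies one year and one month. The following is a minimal sketch on the time columns only (the table name tm and the helper table dep are made up for this check); note also that sprintf() returns strings, so the key is a character column.

```r
library(data.table)

# Same grid of time attributes as in the example data
tm <- CJ(year = 2011:2015, month = 1:12)
tm[, yearmonth := sprintf("%04d%02d", year, month)]

# One distinct key per (year, month) pair
uniqueN(tm$yearmonth) == nrow(tm)

# Each key value maps to exactly one year and exactly one month
dep <- tm[, .(n_year = uniqueN(year), n_month = uniqueN(month)), by = yearmonth]
all(dep$n_year == 1L & dep$n_month == 1L)

class(tm$yearmonth)  # "character"
```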
    

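    Now let's build the hierarchies; note that year has been replaced by yearmonth. In the hierarchies below, the value vector c("year","month") means that these attributes depend on yearmonth. See ?as.data.cube for more examples covering more complex hierarchies.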

    hierarchies = list(
      color = list(color = list(color = character())),
      yearmonth = list(yearmonth = list(yearmonth = c("year","month"))),
      status = list(status = list(status = character()))
    )
    
    dc = as.data.cube(
      x = dt, id.vars = c("color","yearmonth","status"),
      measure.vars = "value",
      hierarchies = hierarchies
    )
    str(dc)
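
    The error message quoted in the question shows a check that starts with identical(names(hierarchies), id.vars) (the rest is truncated). As a rough sketch, the visible part of that condition can be tested in base R before calling as.data.cube; the attrs helper is made up here and only flattens the nested list to show which attributes hang off each dimension key.

```r
id.vars <- c("color", "yearmonth", "status")

hierarchies <- list(
  color = list(color = list(color = character())),
  yearmonth = list(yearmonth = list(yearmonth = c("year", "month"))),
  status = list(status = list(status = character()))
)

# The visible part of the condition from the error message
identical(names(hierarchies), id.vars)   # TRUE

# Flatten each dimension to list the attributes that depend on its key
attrs <- lapply(hierarchies, function(d) unlist(d, use.names = FALSE))
attrs$yearmonth   # "year" "month"
```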
    

    Our data.cube has been created successfully. Let's try to query it using the yearmonth key

    dc[, .(yearmonth=201105L)] -> d
    as.data.table(d)
    dc[, .(yearmonth=201105L), drop=FALSE] -> d
    as.data.table(d)
    

    Now try to query it using the attributes of the dimension, year and month

    dc[, .(year=2011L)] -> d
    as.data.table(d) # note that the dimension is not dropped because it still has more than one value
    dc[, .(month=5L)] -> d
    as.data.table(d)
    dc[, .(year=2011L, month=5L)] -> d
    as.data.table(d) # here the dimension has been dropped because it had only a single element; you can of course use `drop=FALSE` if needed.
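
    For comparison, here is what the last filter looks like on the source data.table. This sketch assumes that filtering the cube on year and month selects the same facts as filtering the source table on those columns; that is an assumption made for illustration, not something taken from the data.cube documentation. The names by_attrs and by_key are made up.

```r
library(data.table)
set.seed(42)

dt <- CJ(color = c("green", "yellow", "red"),
         year = 2011:2015,
         month = 1:12,
         status = c("active", "inactive", "archived", "removed"))[sample(600)]
dt[, yearmonth := sprintf("%04d%02d", year, month)]

# Filter on the two attributes ...
by_attrs <- dt[year == 2011L & month == 5L]
# ... or on the single key (a string, because sprintf() built it)
by_key <- dt[yearmonth == "201105"]

# Both filters pick the same rows of the source table
fsetequal(by_attrs, by_key)
```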
    

    Hope that helps, good luck!

    【Discussion】:
