【Posted】: 2021-12-31 01:53:08
【Question】:
I want to create a large key-value lookup table, and tried the following:
# actual use case is length ~5 million
key <- do.call(paste0, Map(stringi::stri_rand_strings, n=2e5, length = 16))
val <- sample.int(750, size = 2e5, replace = T)
make_dict <- function(keys, values){
  require(rlang)
  e <- new.env(size = length(keys))
  l <- list2(!!!setNames(values, keys))
  list2env(l, envir = e, hash = T) # problem in here...?
}
d <- make_dict(key, val)
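For comparison, here is a minimal base-R sketch of the same lookup table without the rlang splicing step. The name make_dict_base is hypothetical, and it is an assumption that as.list() on a named vector is an adequate stand-in for list2(!!!...) in this use case. Nothing here is claimed to avoid the error described below.

```r
# Hypothetical base-R variant of make_dict(): no rlang dependency.
make_dict_base <- function(keys, values) {
  stopifnot(length(keys) == length(values))
  # A hashed environment, pre-sized to the number of keys.
  e <- new.env(hash = TRUE, size = length(keys))
  # as.list() on a named vector yields one named element per key;
  # list2env() copies those elements into e and returns e.
  list2env(as.list(setNames(values, keys)), envir = e)
}

# Tiny illustrative input (not the 2e5-element case from the question).
keys_small <- c("alpha", "beta", "gamma")
vals_small <- c(1L, 2L, 3L)
d_small <- make_dict_base(keys_small, vals_small)
d_small[["beta"]]
```

Lookup by key then works with `[[`, exactly as with the environment returned by make_dict.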
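Problem
When make_dict runs, it throws Error: protect(): protection stack overflow. This happens specifically in RStudio, when the input is a vector of length greater than 49991, and it seems very similar to this stackoverflow post.
However, when I run an accessor function to fetch some values, make_dict appears to have worked fine, since I cannot find anything odd in its result: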
`%||%` <- function(x,y) if(is.null(x)) y else x
grab <- function(...){
  vector("integer", length(..2)) |>
    (\(.){. = Vectorize(\(e, x) e[[x]] %||% NA_integer_, list("x"), T, F)(..1, ..2); .})()
}
out <- vector("integer", length(key))
out <- grab(d, sample(key)) # using sample to scramble the keys
anyNA(out) | !lobstr::obj_size(out) == lobstr::obj_size(val)
[1] FALSE
Running the same code in RGui does not raise the error.
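A shorter way to express the same kind of accessor is base R's mget(), which looks up several names in an environment at once. The sketch below uses a hypothetical name, grab_mget, and assumes every stored value is a single integer, as in the example data.

```r
# Hypothetical accessor: fetch many keys at once, NA for keys that are absent.
grab_mget <- function(env, keys) {
  found <- mget(keys, envir = env,
                ifnotfound = list(NA_integer_), # single default, recycled per key
                inherits = FALSE)               # do not search parent environments
  # Assumes each value has length 1, so the result has one entry per key.
  unlist(found, use.names = FALSE)
}

# Tiny illustrative dictionary (not the data from the question).
d_small <- list2env(list(alpha = 1L, beta = 2L, gamma = 3L), envir = new.env())
res <- grab_mget(d_small, c("gamma", "missing", "alpha"))
res
```

Setting inherits = FALSE keeps a key that is absent from the dictionary from being resolved in an enclosing environment by accident.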
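Oddities
- The d environment object does not show up in RStudio's Environment pane when its size is > 5e4.
- The R console returns > quickly (indicating that the function is done), but is unresponsive until the error is thrown.
- The error is thrown both when manually setting options(expressions = 5e5) and when it is left at the default of 5000.
- The time until the error is thrown is proportional to the size of the input vector.
- tryCatch(make_dict(key, val), error = function(e) e) does not catch an error.
- The error also occurs if the code is run from a package (a packaged version is available via remotes::install_github("D-Se/minimal")).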
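If tryCatch() catches nothing while the console still reports an error later, one reading is that the error is raised after the call has already returned. That is an assumption, not something the observations above prove. A minimal sketch for checking the result inside the same tryCatch() call, using a hypothetical small stand-in builder named build_small:

```r
# Hypothetical small stand-in for make_dict(); base R only.
build_small <- function(keys, values) {
  list2env(as.list(setNames(values, keys)), envir = new.env(hash = TRUE))
}

check <- tryCatch(
  {
    d_small <- build_small(c("a", "b", "c"), 1:3)
    # Inspect the result before control returns to the console.
    list(ok = is.environment(d_small), n = length(d_small), error = NULL)
  },
  # Reached only if an error is signalled while the block above is running.
  error = function(e) list(ok = FALSE, n = NA_integer_, error = conditionMessage(e))
)
check$ok
check$n
```

If check$ok is TRUE and check$n matches the number of keys, the construction itself completed, and any later error has to come from something that runs afterwards.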
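Question
What is going on here? How can this kind of error be troubleshot?
options(error = traceback), as suggested here, did not give any results. Inserting browser() after list2env in the make_dict function raises the error long after the browser has opened. A traceback() gives the function .rs.describeObject, which is used to generate the summary in the Environment pane and can be found here.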
traceback()
# .rs.describeObject
(function (env, objName, computeSize = TRUE)
{
    obj <- get(objName, env)
    hasNullPtr <- .Call("rs_hasExternalPointer", obj, TRUE, PACKAGE = "(embedding)")
    if (hasNullPtr) {
        val <- "<Object with null pointer>"
        desc <- "An R object containing a null external pointer"
        size <- 0
        len <- 0
    }
    else {
        val <- "(unknown)"
        desc <- ""
        size <- if (computeSize)
            object.size(obj)
        else 0
        len <- length(obj)
    }
    class <- .rs.getSingleClass(obj)
    contents <- list()
    contents_deferred <- FALSE
    if (is.language(obj) || is.symbol(obj)) {
        val <- deparse(obj)
    }
    else if (!hasNullPtr) {
        if (size > 524288) {
            len_desc <- if (len > 1)
                paste(len, " elements, ", sep = "")
            else ""
            if (is.data.frame(obj)) {
                val <- "NO_VALUE"
                desc <- .rs.valueDescription(obj)
            }
            else {
                val <- paste("Large ", class, " (", len_desc,
                    format(size, units = "auto", standard = "SI"),
                    ")", sep = "")
            }
            contents_deferred <- TRUE
        }
        else {
            val <- .rs.valueAsString(obj)
            desc <- .rs.valueDescription(obj)
            if (class == "data.table" || class == "ore.frame" ||
                class == "cast_df" || class == "xts" || class ==
                "DataFrame" || is.list(obj) || is.data.frame(obj) ||
                isS4(obj)) {
                if (computeSize) {
                    contents <- .rs.valueContents(obj)
                }
                else {
                    val <- "NO_VALUE"
                    contents_deferred <- TRUE
                }
            }
        }
    }
    list(name = .rs.scalar(objName), type = .rs.scalar(class),
        clazz = c(class(obj), typeof(obj)), is_data = .rs.scalar(is.data.frame(obj)),
        value = .rs.scalar(val), description = .rs.scalar(desc),
        size = .rs.scalar(size), length = .rs.scalar(len), contents = contents,
        contents_deferred = .rs.scalar(contents_deferred))
})(<environment>, "d", TRUE)
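To narrow down which step of .rs.describeObject misbehaves, the base-R calls visible in the traceback above (get, object.size, length) can be run by hand on the dictionary. The sketch below is only a diagnostic under stated assumptions: it uses a tiny stand-in environment, the holder environment is a hypothetical substitute for the environment that the pane inspects, and the RStudio-internal .Call("rs_hasExternalPointer", ...) is deliberately left out because it is not available outside RStudio.

```r
# Hypothetical container standing in for the environment the pane inspects.
holder <- new.env()
assign("d", list2env(list(a = 1L, b = 2L, c = 3L), envir = new.env()),
       envir = holder)

# The base-R steps from the traceback, executed one at a time.
obj  <- get("d", envir = holder)  # same lookup as get(objName, env)
size <- object.size(obj)          # size as reported by base R
len  <- length(obj)               # number of bindings in the environment
cls  <- class(obj)                # plain class(); .rs.getSingleClass is RStudio-internal

len
cls
```

If all of these complete without error on the full-size dictionary in RStudio, the remaining candidates would be the RStudio-internal helpers, though that is an inference and not something this sketch can confirm.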
【Discussion】:
Tags: r dictionary error-handling rstudio