【问题标题】:homals package for Nonlinear PCA in R: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extentR 中非线性 PCA 的 homals 包:dimnames(x) <- dn 中的错误:“dimnames”[1] 的长度不等于数组范围
【发布时间】:2021-06-05 09:47:40
【问题描述】:

我正在尝试使用 R 中的 homals 包在数据集上实现 NLPCA(非线性 PCA),但我不断收到以下错误消息:

Error in dimnames(x) &lt;- dn : length of 'dimnames' [1] not equal to array extent

我使用的数据集可以在 UCI ML Repository 中找到,在R 中导入时称为dathttps://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29

这是我的代码(一些代码在数据集下载后提供):

nlpcasouthgerman <- homals(dat, rank=1, level=c('nominal','numerical',rep('nominal',2),
                                                'numerical','nominal',
                                                rep('ordinal',2), rep('nominal',2),
                                                'ordinal','nominal','numerical',
                                                rep('nominal',2), 'ordinal',
                                                'nominal','ordinal',rep('nominal',3)),
                           active=c(FALSE, rep(TRUE, 20)), ndim=3, verbose=1)

我试图预测第一个属性,因此我将其设置为active=FALSE。 输出如下所示(跳过所有迭代消息):

Iteration:   1 Loss Value:  0.000047 
Iteration:   2 Loss Value:  0.000044 
...
Iteration:  37 Loss Value:  0.000043 
Iteration:  38 Loss Value:  0.000043 
Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent

我不明白为什么会出现这个错误。我在其他一些数据集上使用了相同的代码,它运行良好,所以我不明白为什么这个错误仍然存​​在。关于可能出现的问题以及如何解决此问题的任何建议?

谢谢!

【问题讨论】:

    标签: r pca nonlinear-optimization


    【解决方案1】:

    似乎错误来自homals 函数中生成 NA 的代码,特别是针对number_credits 级别的数据,这会导致sort(as.numeric((rownames(clist[[i]])))) 出现问题并尝试捕获错误,因为其中一个级别不给出 NA 值。

    因此,要么您必须修改 homals 函数来处理这种极端情况,要么更改有问题的因素水平。这可能需要作为错误报告提交给包维护者。

    作为您的解决方法,您可以执行以下操作:

    levels(dat$number_credits)[1] <- "_1"
    

    并且该功能应该可以正常运行。

    编辑:

    我认为一种解决方案是更改homals 函数中的一行代码,但不能保证这会按预期工作。最好向包作者/维护者提交错误报告 - 请参阅 https://cran.r-project.org/web/packages/homals/ 获取地址。

    使用 rnames &lt;- as.numeric(rownames(clist[[i]]))[order(as.numeric(rownames(clist[[i]])))] 而不是 rnames &lt;- sort(as.numeric((rownames(clist[[i]])))) 将允许以下代码识别 NA,但我不确定作者为什么不尝试完全保留因子水平。 无论如何,您可以在本地环境中运行修改后的函数,这需要显式调用内部(未导出)homals 函数,如下所示。不一定是最好的方法,但会在紧要关头帮助您。

    homals <- function (data, ndim = 2, rank = ndim, level = "nominal", sets = 0, 
        active = TRUE, eps = 0.000001, itermax = 1000, verbose = 0) {
        dframe <- data
        name <- deparse(substitute(dframe))
        nobj <- nrow(dframe)
        nvar <- ncol(dframe)
        vname <- names(dframe)
        rname <- rownames(dframe)
        for (j in 1:nvar) {
            dframe[, j] <- as.factor(dframe[, j])
            levfreq <- table(dframe[, j])
            if (any(levfreq == 0)) {
                newlev <- levels(dframe[, j])[-which(levfreq == 0)]
            }
            else {
                newlev <- levels(dframe[, j])
            }
            dframe[, j] <- factor(dframe[, j], levels = sort(newlev))
        }
        varcheck <- apply(dframe, 2, function(tl) length(table(tl)))
        if (any(varcheck == 1)) 
            stop("Variable with only 1 value detected! Can't proceed with estimation!")
        active <- homals:::checkPars(active, nvar)
        rank <- homals:::checkPars(rank, nvar)
        level <- homals:::checkPars(level, nvar)
        if (length(sets) == 1) 
            sets <- lapply(1:nvar, "c")
        if (!all(sort(unlist(sets)) == (1:nvar))) {
            print(cat("sets union", sort(unlist(sets)), "\n"))
            stop("inappropriate set structure !")
        }
        nset <- length(sets)
        mis <- rep(0, nobj)
        for (l in 1:nset) {
            lset <- sets[[l]]
            if (all(!active[lset])) 
                (next)()
            jset <- lset[which(active[lset])]
            for (i in 1:nobj) {
                if (any(is.na(dframe[i, jset]))) 
                    dframe[i, jset] <- NA
                else mis[i] <- mis[i] + 1
            }
        }
        for (j in 1:nvar) {
            k <- length(levels(dframe[, j]))
            if (rank[j] > min(ndim, k - 1)) 
                rank[j] <- min(ndim, k - 1)
        }
        x <- cbind(homals:::orthogonalPolynomials(mis, 1:nobj, ndim))
        x <- homals:::normX(homals:::centerX(x, mis), mis)$q
        y <- lapply(1:nvar, function(j) homals:::computeY(dframe[, j], x))
        sold <- homals:::totalLoss(dframe, x, y, active, rank, level, sets)
        iter <- pops <- 0
        repeat {
            iter <- iter + 1
            y <- homals:::updateY(dframe, x, y, active, rank, level, sets, 
                verbose = verbose)
            smid <- homals:::totalLoss(dframe, x, y, active, rank, level, 
                sets)/(nobj * nvar * ndim)
            ssum <- homals:::totalSum(dframe, x, y, active, rank, level, sets)
            qv <- homals:::normX(homals:::centerX((1/mis) * ssum, mis), mis)
            z <- qv$q
            snew <- homals:::totalLoss(dframe, z, y, active, rank, level, 
                sets)/(nobj * nvar * ndim)
            if (verbose > 0) 
                cat("Iteration:", formatC(iter, digits = 3, width = 3), 
                    "Loss Value: ", formatC(c(smid), digits = 6, 
                      width = 6, format = "f"), "\n")
            r <- abs(qv$r)/2
            ops <- sum(r)
            aps <- sum(La.svd(crossprod(x, mis * z), 0, 0)$d)/ndim
            if (iter == itermax) {
                stop("maximum number of iterations reached")
            }
            if (smid > sold) {
                warning(cat("Loss function increases in iteration ", 
                    iter, "\n"))
            }
            if ((ops - pops) < eps) 
                break
            else {
                x <- z
                pops <- ops
                sold <- smid
            }
        }
        ylist <- alist <- clist <- ulist <- NULL
        for (j in 1:nvar) {
            gg <- dframe[, j]
            c <- homals:::computeY(gg, z)
            d <- as.vector(table(gg))
            lst <- homals:::restrictY(d, c, rank[j], level[j])
            y <- lst$y
            a <- lst$a
            u <- lst$z
            ylist <- c(ylist, list(y))
            alist <- c(alist, list(a))
            clist <- c(clist, list(c))
            ulist <- c(ulist, list(u))
        }
        dimlab <- paste("D", 1:ndim, sep = "")
        for (i in 1:nvar) {
            if (ndim == 1) {
                ylist[[i]] <- cbind(ylist[[i]])
                ulist[[i]] <- cbind(ulist[[i]])
                clist[[i]] <- cbind(clist[[i]])
            }
            options(warn = -1)
    # Here is the line that I changed in the code:
            # rnames <- sort(as.numeric((rownames(clist[[i]]))))
            rnames <- as.numeric(rownames(clist[[i]]))[order(as.numeric(rownames(clist[[i]])))]
            options(warn = 0)
            if ((any(is.na(rnames))) || (length(rnames) == 0)) 
                rnames <- rownames(clist[[i]])
            if (!is.matrix(ulist[[i]])) 
                ulist[[i]] <- as.matrix(ulist[[i]])
            rownames(ylist[[i]]) <- rownames(ulist[[i]]) <- rownames(clist[[i]]) <- rnames
            rownames(alist[[i]]) <- paste(1:dim(alist[[i]])[1])
            colnames(clist[[i]]) <- colnames(ylist[[i]]) <- colnames(alist[[i]]) <- dimlab
            colnames(ulist[[i]]) <- paste(1:dim(as.matrix(ulist[[i]]))[2])
        }
        names(ylist) <- names(ulist) <- names(clist) <- names(alist) <- colnames(dframe)
        rownames(z) <- rownames(dframe)
        colnames(z) <- dimlab
        dummymat <- as.matrix(homals:::expandFrame(dframe, zero = FALSE, clean = FALSE))
        dummymat01 <- dummymat
        dummymat[dummymat == 2] <- NA
        dummymat[dummymat == 0] <- Inf
        scoremat <- array(NA, dim = c(dim(dframe), ndim), dimnames = list(rownames(dframe), 
            colnames(dframe), paste("dim", 1:ndim, sep = "")))
        for (i in 1:ndim) {
            catscores.d1 <- do.call(rbind, ylist)[, i]
            dummy.scores <- t(t(dummymat) * catscores.d1)
            freqlist <- apply(dframe, 2, function(dtab) as.list(table(dtab)))
            cat.ind <- sequence(sapply(freqlist, length))
            scoremat[, , i] <- t(apply(dummy.scores, 1, function(ds) {
                ind.infel <- which(ds == Inf)
                ind.minfel <- which(ds == -Inf)
                ind.nan <- which(is.nan(ds))
                ind.nael <- which((is.na(ds) + (cat.ind != 1)) == 
                    2)
                ds[-c(ind.infel, ind.minfel, ind.nael, ind.nan)]
            }))
        }
        disc.mat <- apply(scoremat, 3, function(xx) {
            apply(xx, 2, function(cols) {
                (sum(cols^2, na.rm = TRUE))/nobj
            })
        })
        result <- list(datname = name, catscores = ylist, scoremat = scoremat, 
            objscores = z, cat.centroids = clist, ind.mat = dummymat01, 
            loadings = alist, low.rank = ulist, discrim = disc.mat, 
            ndim = ndim, niter = iter, level = level, eigenvalues = r, 
            loss = smid, rank.vec = rank, active = active, dframe = dframe, 
            call = match.call())
        class(result) <- "homals"
        result
    }
    

    【讨论】:

    • 非常感谢!作为参考,因为我将大量使用homals 函数,我如何才能发现有问题的因素水平是什么以及我的哪些变量?有办法吗?
    • 查看我的编辑以获得可能的解决方法,但最好向包维护者/作者提交错误报告。为了回答您的问题,我认为有问题的情况是 as.numeric((rownames(clist[[i]]))) 至少有一个非 NA 和至少一个 NA 元素,因为 sort 命令会默默地抑制 NA 情况,从而导致错误。
    • 非常感谢 - 我一定会向包维护者报告这个错误。与此同时,我尝试在原始数据上运行您的解决方案并且它有效,给出了几乎相同的结果(对象分数或特征值的差异是无穷小的)所以它似乎很好。顺便说一句,您的观察确实是正确的;我尝试将所有名义变量转换为将单词作为级别或仅整数,并且一切正常,因此问题可能与分类级别有关,包括数字和非数字标签。
    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2018-04-06
    • 2022-12-08
    • 1970-01-01
    • 2016-06-25
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多