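Extracting the Linear Discriminant Equation: Answers

【Question title】: Extracting the Linear Discriminant Equation
【Posted】: 2015-01-16 05:16:24
【Question】:

I have this data, and I want to extract the coefficients of the equation that the fit produces, so that I can plug in a new data point and see which group it is assigned to.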

library(MASS)
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
               Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
table(Iris$Sp[train])
## your answer may differ
##  c  s  v
## 22 23 30
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

I know I can get this:

> z
Call:
lda(Sp ~ ., data = Iris, prior = c(1, 1, 1)/3, subset = train)

Prior probabilities of groups:
    c         s         v 
0.3333333 0.3333333 0.3333333 

Group means:
  Sepal.L. Sepal.W. Petal.L.  Petal.W.
c 5.969231 2.753846 4.311538 1.3384615
s 5.075000 3.541667 1.500000 0.2583333
v 6.700000 2.936000 5.552000 1.9880000

Coefficients of linear discriminants:
                LD1        LD2
Sepal.L. -0.5458866  0.5215937
Sepal.W. -1.5312824  1.7891248
Petal.L.  1.8087255 -1.2637188
Petal.W.  2.8620894  3.2868849

Proportion of trace:
   LD1    LD2 
0.9893 0.0107

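But is there a way to get just the equation, so that I don't have to compute new observations by hand?

【Comments】:

Tags: r statistics

【Solution 1】:

Just turning this into an answer. You need predict(); the help page for the predict.lda method in the MASS package has your exact example: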

tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- lda(train, cl)
predict(z, test)$class
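
Applied to the formula-based fit from the question, the same idea might look like the sketch below. The seed and the values in new_obs are made up for illustration; the only real requirement is that the column names of the new data frame match the predictors used in the fit.

```r
library(MASS)

# Rebuild the data and the fit from the question
set.seed(1)  # assumption: fixed seed, only for reproducibility
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

# A hypothetical new observation; names must match the predictor columns
new_obs <- data.frame(Sepal.L. = 5.9, Sepal.W. = 3.0,
                      Petal.L. = 4.2, Petal.W. = 1.5)

pred <- predict(z, newdata = new_obs)
pred$class      # predicted group
pred$posterior  # posterior probability of each group
pred$x          # scores on LD1 and LD2
```

pred$x holds the values of the linear discriminants for the new point, which is usually what people mean by "plugging it into the equation".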

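【Comments】:

【Solution 2】:

The default method is "plug-in", so here is the code from MASS:::predict.lda. object is the fitted object, and x comes from the newdata argument after conversion to a matrix: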

# snipped preamble and error checking
means <- colSums(prior * object$means)
scaling <- object$scaling
x <- scale(x, center = means, scale = FALSE) %*% scaling
dm <- scale(object$means, center = means, scale = FALSE) %*% 
    scaling
method <- match.arg(method)
dimen <- if (missing(dimen)) 
    length(object$svd)
else min(dimen, length(object$svd))
N <- object$N
if (method == "plug-in") {
    dm <- dm[, 1L:dimen, drop = FALSE]
    dist <- matrix(0.5 * rowSums(dm^2) - log(prior), nrow(x), 
        length(prior), byrow = TRUE) - x[, 1L:dimen, drop = FALSE] %*% 
        t(dm)
    dist <- exp(-(dist - apply(dist, 1L, min, na.rm = TRUE)))
}
# snipped two other methods
posterior <- dist/drop(dist %*% rep(1, ng))

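This is mainly to show why Gregor's answer is the most sensible approach. Trying to pull out an "equation" seems fruitless. (I do remember doing this kind of exercise with linear regression results in my first-year regression class in grad school.)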
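
For anyone who still wants to see the arithmetic, here is a rough sketch that replays the plug-in steps from the code above by hand and compares them with predict(). It assumes the fit from the question; the seed and the names centre, intercept, scores and scores_eq are illustrative and not part of MASS.

```r
library(MASS)

set.seed(1)  # assumption: fixed seed so the comparison is reproducible
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)

# Held-out rows as a numeric matrix of predictors
x_new <- as.matrix(Iris[-train, 1:4])

# Same centring as predict.lda: prior-weighted average of the group means
centre <- colSums(z$prior * z$means)

# Discriminant scores: (x - centre) %*% scaling
scores <- scale(x_new, center = centre, scale = FALSE) %*% z$scaling

# The same thing as an "equation": intercept + coefficients * predictors
intercept <- -drop(centre %*% z$scaling)
scores_eq <- sweep(x_new %*% z$scaling, 2, intercept, "+")

# Plug-in posterior, following the snippet from predict.lda
dm <- scale(z$means, center = centre, scale = FALSE) %*% z$scaling
dist <- matrix(0.5 * rowSums(dm^2) - log(z$prior), nrow(scores),
               length(z$prior), byrow = TRUE) - scores %*% t(dm)
dist <- exp(-(dist - apply(dist, 1L, min, na.rm = TRUE)))
posterior <- dist / rowSums(dist)

# Compare with what predict() returns
p <- predict(z, Iris[-train, ])
all.equal(unname(scores), unname(p$x))
all.equal(unname(posterior), unname(p$posterior))
```

So the coefficients printed by z, together with the intercept computed here, do give the LD1 and LD2 scores, but turning those into a class still takes the distance and prior terms that predict() already handles.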

【Comments】: