【发布时间】:2021-05-02 16:56:35
【问题描述】:
我搜索了很长时间,以获得距离与相关双图的直接解释,以及如何转换 PCA 的标准输出以实现两个双图的解释。我看到的所有堆栈溢出解释 1 2 3 4 数学术语让我大吃一惊。如何使用 R 的 prcomp 的输出创建距离双图和相关双图?
【问题讨论】:
-
糟糕,我知道我看到了自我回答,但不知道这样做的预期方式。我已相应调整,谢谢!
我搜索了很长时间,以获得距离与相关双图的直接解释,以及如何转换 PCA 的标准输出以实现两个双图的解释。我看到的所有堆栈溢出解释 1 2 3 4 数学术语让我大吃一惊。如何使用 R 的 prcomp 的输出创建距离双图和相关双图?
【问题讨论】:
我发现最好的解释是来自蒙特利尔大学生物科学系的 Pierre Legendre (http://biol09.biol.umontreal.ca/PLcourses/Ordination_section_1.1_PCA_Eng.pdf) 的一些演讲幻灯片。然而,虽然这些幻灯片确实展示了手动绘制距离和相关双图的方法,但它们没有展示如何根据 prcomp 的结果绘制距离和相关双图。
所以我完成了一个示例,该示例显示了如何使用 prcomp 的输出使其与上面 pdf 中的示例等效。我将这里留给像我这样想知道如何绘制距离与相关双图以及何时使用它们的未来人(根据 Pierre Legendre)
set.seed(1)
#Run standard PCA
pca_res <- prcomp(mtcars[, 1:7], center = TRUE, scale = TRUE, retx = TRUE)
#To print a distance biplot, simply plot pca_red$x as points and $rotation
#as vectors
library(ggplot2)
arrow_len <- 3 #arbitrary scaling of arrows so they're same mag as PC scores
ggplot(data = as.data.frame(pca_res$x), aes(x = PC1, y = PC2)) +
geom_point() +
geom_segment(data = as.data.frame(pca_res$rotation),
aes(x = 0, y = 0, yend = arrow_len*PC1, xend = arrow_len*PC2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(pca_res$rotation),
mapping = aes(y = arrow_len*PC1, x = arrow_len*PC2,
label = row.names(pca_res$rotation)))
#This is equivalent to the following steps:
Y_centered <- scale(mtcars[, 1:7], center = TRUE, scale = TRUE)
Y_eig <- eigen(cov(Y_centered))
#Note that Y_eig$vectors == pca_res$rotation ("rotations" or "loadings")
# and Y_eig$values (eigenvalues) == pca_res$sdev**2
#For a distance biplot
U_frame <- Y_eig$vectors
#F is your PC scores, achieved by multiplying your original data by the rotations
F_frame <- Y_centered %*% U_frame
#flipping constants if needed bc PC axis direction is arbitrary
x_flip = -1
y_flip = -1
ggplot(data = as.data.frame(F_frame), aes(x = x_flip*V1, y = y_flip*V2)) +
geom_point() +
geom_segment(data = as.data.frame(U_frame),
aes(x = 0, y = 0, yend = y_flip*arrow_len*V1, xend = x_flip*arrow_len*V2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(U_frame),
mapping = aes(y = y_flip*arrow_len*V1, x = x_flip*arrow_len*V2,
label = colnames(Y_centered)))
#To print a correlation biplot, matrix multiply your rotations/loadings
# by the identity matrix times your PCA standard deviations
# (equivalent to the sqrt of your eigen values)
U_frame_scaling2 <- U_frame %*% diag(Y_eig$values^(0.5))
#And divide your PC scores by your PCA standard deviations
# (equivalent to 1/sqrt(eigen values)
F_frame_scaling2 <- F_frame %*% diag(Y_eig$values^(-0.5))
#Plot
arrow_len <- 1.5 #arbitrary scaling of arrows so they're same mag as PC scores
ggplot(data = as.data.frame(pca_res$x %*% diag(1/pca_res$sdev)),
aes(x = V1, y = V2)) +
geom_point() +
geom_segment(data = as.data.frame(pca_res$rotation %*% diag(pca_res$sdev)),
aes(x = 0, y = 0, yend = arrow_len*V1, xend = arrow_len*V2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(pca_res$rotation %*% diag(pca_res$sdev)),
mapping = aes(y = arrow_len*V1, x = arrow_len*V2,
label = row.names(pca_res$rotation)))
ggplot(data = as.data.frame(F_frame_scaling2), aes(x = x_flip*V1, y = y_flip*V2)) +
geom_point() +
geom_segment(data = as.data.frame(U_frame_scaling2),
aes(x = 0, y = 0, yend = y_flip*arrow_len*V1, xend = x_flip*arrow_len*V2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(U_frame_scaling2),
mapping = aes(y = y_flip*arrow_len*V1, x = x_flip*arrow_len*V2,
label = colnames(Y_centered)))
至于两者之间的区别(以防上面的pdf在某些时候不可用):
缩放类型 1:距离双标图,当兴趣在 物体相对于彼此的位置。 ——
缩放类型2:相关双标图,当角度 变量之间的关系是最重要的。 ——
在缩放 1(距离双标图)中,
在缩放 2(相关双标图)中,
在缩放 1(距离双标图)中,
在缩放 2(相关双标图)中,
【讨论】: