How to use a PCA model to predict scores on new data in Stata? (Answer)

【Question title】: How to use PCA model to predict scores on new data in Stata?
【Posted】: 2023-04-10 16:40:01
【Question】:

My question is similar to "R: using predict() on new data with high dimensionality", but for Stata.

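I want to run a principal component model (pca) on one subset of the data (the control group in an experiment) to extract the first component. I then want to apply that PCA model to a separate subset (the treatment group) and obtain scores for those observations. Essentially, I want to use the pca model fitted on dataset_1 to predict scores in a new dataset_2.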

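In R, you fit the model to the control group only, then call predict on the fitted model and pass the full dataset as the newdata argument. That produces predictions for all observations from a model fitted only to the control group. How can I do this in Stata?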

global xlist2a std_agreedisagree1_1_a std_revagreedisagree1_2_a std_revagreedisagree1_3_a std_agreedisagree1_4_a std_revagreedisagree1_10_a std_revagreedisagree1_5_a 
pca $xlist2a
screeplot, yline(1)     
rotate, clear       
pca $xlist2a, com(3) 
rotate, varimax blanks (.30) 
predict pca5_p1b pca5_p2b pca5_p3b, score

Fixed code based on Nick's answer:

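* Variables entering the PCA
global xlist2a std_agreedisagree1_1_a std_revagreedisagree1_2_a std_revagreedisagree1_3_a std_agreedisagree1_4_a std_revagreedisagree1_10_a std_revagreedisagree1_5_a
* Fit the PCA on the control group only (zgroupa10==1)
pca $xlist2a if zgroupa10==1
screeplot, yline(1)
rotate, clear
pca $xlist2a if zgroupa10==1, components(3)
rotate, varimax blanks(.30)
* No if qualifier: score all observations, treatment group included,
* using the rotated control-group model
predict pca5_p1b pca5_p2b pca5_p3b, score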
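The fixed code depends on the asker's own variables, so it cannot be run elsewhere. A minimal self-contained sketch of the same workflow uses Stata's bundled auto data, assuming foreign==1 stands in for the control group and foreign==0 for the treatment group. That mapping, the chosen variables and the score names pc1, pc2, tpc1 and tpc2 are illustrative assumptions, not part of the question.

```stata
* Illustrative analog using Stata's auto data (assumption):
* foreign==1 plays the control group, foreign==0 the treatment group
sysuse auto, clear

* Estimate the PCA on the "control" observations only
pca headroom trunk length displacement weight turn if foreign, components(2)

* Varimax-rotate the retained components; blanks() only affects the display
rotate, varimax blanks(.30)

* No if qualifier: score every observation with the control-group model
predict pc1 pc2, score

* Alternatively, score only the "treatment" observations
predict tpc1 tpc2 if !foreign, score
```

Because predict works from the most recent estimation results (here the rotated control-group PCA), leaving off the if qualifier plays the role of passing the full dataset as newdata in R.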

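【Question discussion】:

Good questions here show some attempted code.
Thanks, I've edited the post to include the code.
Thanks for adding code, but the code above runs pca on all observations of those variables and then runs predict on all observations. That is not what you should do, although your comment below my answer implies that your real code uses the intended approach.
Thanks, I've added revised code based on your answer.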

Tags: stata pca predict

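【Solution 1】:

What code did you try? The simplest experiment shows that the same approach works in Stata: fit pca on one subset with an if qualifier, then run predict on another subset.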

. sysuse auto, clear
(1978 Automobile Data)

. pca headroom trunk length displacement if foreign

Principal components/correlation                 Number of obs    =         22
                                                 Number of comp.  =          4
                                                 Trace            =          4
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      1.93666      .656823             0.4842       0.4842
           Comp2 |      1.27983      .615381             0.3200       0.8041
           Comp3 |      .664453      .545396             0.1661       0.9702
           Comp4 |      .119057            .             0.0298       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors) 

    --------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4 | Unexplained 
    -------------+----------------------------------------+-------------
        headroom |   0.0288    0.7373    0.6749    0.0083 |           0 
           trunk |   0.2443    0.6496   -0.7199   -0.0090 |           0 
          length |   0.6849   -0.1313    0.1229   -0.7061 |           0 
    displacement |   0.6858   -0.1313    0.1054    0.7080 |           0 
    --------------------------------------------------------------------

. predict score1 score2 if !foreign
(score assumed)
(2 components skipped)

Scoring coefficients 
    sum of squares(column-loading) = 1

    ------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4 
    -------------+----------------------------------------
        headroom |   0.0288    0.7373    0.6749    0.0083 
           trunk |   0.2443    0.6496   -0.7199   -0.0090 
          length |   0.6849   -0.1313    0.1229   -0.7061 
    displacement |   0.6858   -0.1313    0.1054    0.7080 
    ------------------------------------------------------
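One way to check that out-of-sample scores come from the control-group model is to compare score means by group. This sketch assumes that predict standardizes the variables with the means and standard deviations from the estimation sample, which is how I understand scoring after a correlation-based pca. Under that assumption the scores average zero within the estimation sample (foreign cars here) but generally not in the other group. The score names are illustrative.

```stata
sysuse auto, clear

* Fit on the "control" subset only, as in the answer above
pca headroom trunk length displacement if foreign

* Score all observations with that model
predict score1 score2, score

* Estimation sample: means should be zero up to rounding
summarize score1 score2 if foreign

* Other group: means generally differ from zero
summarize score1 score2 if !foreign
```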


【Discussion】:

Your answer solved my problem: I had not included zgroupa10==1 before, and once I added it, it worked. Thanks for your patience.