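【Posted】: 2020-10-14 04:09:54
【Question】:
I am trying to compute the R2 and Q2 scores for a PCA model, but I am having trouble computing them from scratch from the matrices. For R2, using r2_score from sklearn gives the expected result (checked against the results in SIMCA), but I cannot use the same approach for the Q2 score, so I would like help understanding how to find the numerator in the second term of R2/Q2 (1 - num/SST). Please excuse my novice code; any help is greatly appreciated!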
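For reference, the definitions usually meant by these two scores can be written as below. This is a sketch under assumptions, not SIMCA's confirmed convention: $X$ is the data matrix, $\bar{X}$ repeats the column means in every row, $\hat{X}$ is the reconstruction from the model fitted on all rows, $\hat{X}_{\mathrm{cv}}$ collects the reconstructions of each row obtained while that row was held out, and $\lVert\cdot\rVert_F$ is the Frobenius norm.

```latex
\begin{aligned}
\mathrm{SST} &= \lVert X - \bar{X} \rVert_F^2, \\
R^2 &= 1 - \frac{\lVert X - \hat{X} \rVert_F^2}{\mathrm{SST}}, \\
Q^2 &= 1 - \frac{\mathrm{PRESS}}{\mathrm{SST}}, \qquad
\mathrm{PRESS} = \lVert X - \hat{X}_{\mathrm{cv}} \rVert_F^2 .
\end{aligned}
```

In both cases the numerator is a sum of squared residuals, so it has the same units as SST.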
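import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

### X is assumed to be a pandas DataFrame (rows = samples, columns = variables)
cov_matrix = X.cov().values
total_variation = cov_matrix.trace()
q2 = []
r2 = []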
for pc in range(1, min(X.shape[1], 11)):
    ### Create folds
    kf = KFold(n_splits=7, shuffle=True)
    ### Fit PCA model on data
    pca = PCA(pc)
    scores = pca.fit_transform(X)
    ### Reconstruct X dataset from scores
    recon = pca.inverse_transform(scores)
    ### Create residual matrix
    res = X - recon
    ### Using sklearn, the correct R2 score is returned
    #r2_i = r2_score(X, recon)
    ### Incorrect implementation of R2
    r2_i = 1 - np.linalg.norm(res)/total_variation
    ### Append current result to list of R2 scores for each principal component count
    r2.append(r2_i)
    mat_test = np.zeros(shape=X.shape)
    for train_index, test_index in kf.split(X):
        ### Split data into train and test sets
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        ### Fit PCA model on training data
        pca_cv = PCA(pc)
        pca_cv.fit(X_train)
        ### Calculate scores of test set and reconstruct the data
        scores_test = pca_cv.transform(X_test)
        recon_test = pca_cv.inverse_transform(scores_test)
        ### Save the reconstructed data
        mat_test[test_index, :] = recon_test
    ### Create residual matrix
    res = X - mat_test
    ### Calculate PRESS (incorrect)
    press = np.linalg.norm(res)
    ### Q2 via sklearn (doesn't give correct values)
    #q2_i = r2_score(X, mat_test)
    ### Q2 from PRESS (doesn't give correct values)
    q2.append(1 - press/total_variation)
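One possible reading of what goes wrong in the manual formulas, offered as a sketch rather than a confirmed answer: `np.linalg.norm(res)` returns the Frobenius norm, which is the square root of the residual sum of squares, while the trace of the covariance matrix is SST divided by n - 1, so the ratio mixes two different scales. The helper below uses plain sums of squares for both the numerator and SST. The function name `r2_q2_pca` and its parameters are hypothetical, and it keeps the row-wise K-fold scheme from the question; whether SIMCA uses that scheme is an assumption that is not verified here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold


def r2_q2_pca(X, n_components, n_splits=7, seed=0):
    """Return (R2, Q2) for a PCA model; hypothetical helper, not SIMCA's method."""
    X = np.asarray(X, dtype=float)

    # Total sum of squares about the column means (not the covariance trace).
    sst = np.sum((X - X.mean(axis=0)) ** 2)

    # R2: residual sum of squares of the model fitted on all rows.
    pca = PCA(n_components).fit(X)
    recon = pca.inverse_transform(pca.transform(X))
    r2 = 1.0 - np.sum((X - recon) ** 2) / sst

    # Q2: PRESS accumulated over the held-out rows of each fold.
    press = 0.0
    kf = KFold(n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        pca_cv = PCA(n_components).fit(X[train_idx])
        recon_test = pca_cv.inverse_transform(pca_cv.transform(X[test_idx]))
        press += np.sum((X[test_idx] - recon_test) ** 2)
    q2 = 1.0 - press / sst

    return r2, q2
```

Calling `r2_q2_pca(X, pc)` inside the loop would replace both manual formulas. Two caveats: `r2_score` averages per-column scores by default, so it is only guaranteed to agree with this pooled form when all columns share the same variance (for example after unit-variance scaling), whereas `multioutput='variance_weighted'` gives the pooled version. Also, with row-wise folds each held-out row is still used to compute its own scores, so for fixed folds PRESS cannot increase as components are added; matching SIMCA exactly may need a different hold-out scheme.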
【Comments】:
Tags: python matrix pca prediction cross-validation