【Question title】: Why do I need to indicate the number of components to be kept in Principal Component Analysis?
【Posted】: 2021-05-18 05:49:21
【Question】:

I found that to use pca, I have to state up front how many components to keep, for example in the following code:

# Initialize
model = pca(n_components=3, normalize=True)

Is there a way to specify only the variance to be explained and let the algorithm give me the most important components?

【Comments】:

    Tags: python scikit-learn pca


    【Solution 1】:

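    You don't necessarily need to specify the number of components in advance. You can extract all the components and then keep only the first ones, up to the point where their cumulative explained variance reaches a given fraction of the total. See the code below for an example.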

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.datasets import make_spd_matrix
    from sklearn.preprocessing import StandardScaler
    
    # generate the data
    np.random.seed(100)
    
    N = 1000  # number of samples
    K = 10    # number of features
    
    mean = np.zeros(K)
    cov = make_spd_matrix(K)
    X = np.random.multivariate_normal(mean, cov, N)
    print(X.shape)
    # (1000, 10)
    
    # rescale the data
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    
    # perform the PCA
    pca = PCA(n_components=None)
    pca.fit(X)
    
    # extract the smallest number of components which
    # explain at least p% (e.g. 80%) of the variance
    p = 0.80
    n_components = 1 + np.argmax(np.cumsum(pca.explained_variance_ratio_) >= p)
    print(n_components)
    # 6
    
    # extract the values of the selected components
    Z = pca.transform(X)[:, :n_components]
    print(Z.shape)
    # (1000, 6)
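
As a possible shortcut to the manual selection above: `sklearn.decomposition.PCA` also accepts a float between 0 and 1 for `n_components`, in which case it keeps the smallest number of components whose cumulative explained variance ratio exceeds that value. The sketch below is illustrative only. The toy data and the names `latent`, `mixing`, `pca_auto` and `n_manual` are made up for this example, and `svd_solver="full"` is passed explicitly because the float form relies on the full solver. Whether the `pca(...)` wrapper used in the question supports the same option is not checked here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 500 samples, 8 features driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 8))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 8))
X = StandardScaler().fit_transform(X)

p = 0.80  # fraction of variance to retain

# A float in (0, 1) tells PCA to choose the number of components itself.
pca_auto = PCA(n_components=p, svd_solver="full")
Z = pca_auto.fit_transform(X)

# Cross-check against the manual cumulative-variance selection.
pca_full = PCA(n_components=None, svd_solver="full").fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
n_manual = 1 + int(np.argmax(cumulative >= p))

print(pca_auto.n_components_, n_manual)
print(Z.shape)
```

After fitting, the chosen count is available as `pca_auto.n_components_`, so the explicit slicing step is not needed. The two approaches should agree except in the edge case where the cumulative ratio lands exactly on `p`.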
    
