Dataset source: https://www.kaggle.com/psparks/instacart-market-basket-analysis
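Approach: merge the four tables so that every user_id is linked to an aisle, build a user-by-aisle cross table, then apply PCA to it while keeping 90% of the variance.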
Example code:
import pandas as pd
from sklearn.decomposition import PCA


def main():
    """
    Dimensionality reduction example: principal component analysis (PCA)
    :return: None
    """
    # Read the data
    prior = pd.read_csv("order_products__prior.csv")
    products = pd.read_csv("products.csv")
    orders = pd.read_csv("orders.csv")
    aisles = pd.read_csv("aisles.csv")

    # Merge the four tables into one
    _mg = pd.merge(prior, products, on='product_id')
    _mg = pd.merge(_mg, orders, on='order_id')
    mt = pd.merge(_mg, aisles, on='aisle_id')
    # print(mt.head(10))

    # Cross table: one row per user, one column per aisle
    cross = pd.crosstab(mt['user_id'], mt['aisle'])
    # print(cross)

    # Keep enough principal components to explain 90% of the variance
    pca = PCA(n_components=0.9)
    data = pca.fit_transform(cross)

    print(data)
    print(data.shape)
    return None


if __name__ == '__main__':
    main()
Output:
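The output shows that the data was reduced to 27 dimensions: with n_components=0.9, 27 principal components were enough to retain 90% of the variance.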
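To see how a fractional n_components picks the number of dimensions, here is a minimal sketch on made-up data. The toy table, its aisle names, and the random seed are assumptions for illustration only; they are not the Instacart data and will not reproduce the 27 above. The sketch builds the same kind of user-by-aisle cross table and then checks the cumulative explained variance of the components that PCA kept.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical toy data standing in for the merged Instacart table:
# each row is one purchased item, tagged with a user and an aisle.
rng = np.random.default_rng(0)
aisle_names = ["fresh fruits", "yogurt", "chips", "tea", "bread", "eggs"]
toy = pd.DataFrame({
    "user_id": rng.integers(0, 50, size=2000),
    "aisle": rng.choice(aisle_names, size=2000),
})

# One row per user, one column per aisle; each cell counts purchased items.
cross = pd.crosstab(toy["user_id"], toy["aisle"])

# A float between 0 and 1 asks PCA for the fewest components whose
# cumulative explained variance exceeds that fraction.
pca = PCA(n_components=0.9)
reduced = pca.fit_transform(cross)

cumulative = np.cumsum(pca.explained_variance_ratio_)
print(cross.shape, "->", reduced.shape)
print(cumulative)
```

The last entry of cumulative is the fraction of variance actually retained, and pca.n_components_ holds the number of components chosen. On the real data, printing these two values alongside data.shape would show how much variance the 27 components cover.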