Python: Slicing a NumPy Array According to feature_importances_

【Question title】: Python Slicing NumPy Array according to feature_importances_
【Posted】: 2015-04-15 15:51:53
【Question】:

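I have a set of features stored as a NumPy array.

RandomForestRegressor in Scikit-Learn exposes feature_importances_, which holds an importance value for every feature.

I need to slice the NumPy array so that only the 50 most important features are kept and all other columns are removed.

How can I do this easily?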

【Comments】:

Could you provide some code?

Tags: python arrays numpy scikit-learn

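【Solution 1】:

If I understand correctly, what you are looking for is argsort. It returns the indices that would sort the array in increasing order. For example: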

import numpy as np
from sklearn.ensemble import RandomForestRegressor as RFR

# Create a random number generator so this example is repeatable
rs = np.random.RandomState(seed=1234)

# create 100 fake input variables with 10 features each
X = rs.rand(100, 10)
# create 100 fake response variables
Y = rs.rand(100)

rfr = RFR(random_state=rs)
rfr.fit(X, Y)

fi = rfr.feature_importances_
# argsort the feature importances and reverse to get order of decreasing importance
indices = np.argsort(fi)[::-1]

indices now contains the column indices of the input variables, ordered by decreasing feature importance.

In: print(indices)
[7 6 3 4 5 0 1 9 2 8]
In: print(fi[indices])
[ 0.22636046  0.19925157  0.17233547  0.09245424  0.08287206  0.0800437
  0.07174068  0.05554476  0.01044851  0.00894855]
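To make the behavior of argsort concrete, here is a minimal sketch on a small hand-written array. The values in `importances` are made up for illustration and do not come from a fitted model.

```python
import numpy as np

# Hypothetical importance values, chosen by hand for illustration
importances = np.array([0.10, 0.40, 0.05, 0.25, 0.20])

# argsort returns the indices that would sort the array in increasing order
ascending = np.argsort(importances)

# Reversing that result gives the indices in order of decreasing importance
descending = ascending[::-1]

print(ascending.tolist())                # [2, 0, 4, 3, 1]
print(descending.tolist())               # [1, 3, 4, 0, 2]
print(importances[descending].tolist())  # [0.4, 0.25, 0.2, 0.1, 0.05]
```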

With the appropriate slice, keep only the n most important features of the input variables:

X[:, indices[:n]] # n most important features
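Applied to the question (keeping the 50 most important columns), the slice above could be wrapped up roughly as follows. This is only a sketch: `keep_top_features` is a hypothetical helper name, and the data and importance values are random stand-ins so the example runs without fitting a model. With a fitted regressor, `importances` would be `rfr.feature_importances_`.

```python
import numpy as np

def keep_top_features(X, importances, n):
    """Return the n most important columns of X and their original indices.

    Hypothetical helper for illustration; not part of NumPy or scikit-learn.
    """
    importances = np.asarray(importances)
    if X.shape[1] != importances.shape[0]:
        raise ValueError("X must have one column per importance value")
    # Column indices ordered by decreasing importance, truncated to n
    top = np.argsort(importances)[::-1][:n]
    return X[:, top], top

# Stand-in data: 100 samples with 200 features and made-up importances
rs = np.random.RandomState(seed=1234)
X = rs.rand(100, 200)
importances = rs.rand(200)

X_top, top = keep_top_features(X, importances, n=50)
print(X_top.shape)  # (100, 50)
```

The columns of `X_top` are ordered by decreasing importance. If the original column order matters, indexing with `np.sort(top)` instead of `top` keeps the same columns in their original positions.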

【Comments】: