【Question title】: Getting the top 10 and bottom 10 features
【Posted】: 2023-03-29 15:38:01
【Question】:

I'm new to Python and am trying to learn how to extract the top 10 and bottom 10 features from an array I created with this code:

clftest = logres1.fit(X_train,y_train)

#getting the feature's coefficient
feature_importance = clftest.coef_[0]

#creating an array to identify the highest and lowest value
sorter = np.argsort(feature_importance)

#using the shape of sorter, arrange it from lowest to highest
position = np.arange(sorter.shape[0])


featfig = plt.figure(figsize=(100,100))
featax = featfig.add_subplot(1, 1, 1)
featax.barh(position, feature_importance[sorter], align="center")
featax.set_yticks(position)
featax.set_yticklabels(np.array(X.columns)[sorter], fontsize=8)

plt.show()

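As you can see, my chart contains a lot of features...

I'd also like to know whether there is a shorter way to write this, or whether this is already the most concise version.

【Comments】:

  • What exactly do you need?
  • I want the chart to show only the top 10 and bottom 10 columns. All I can do so far is display every column sorted from highest to lowest, which is unreadable because there are so many columns.
  • @Marcus Can't you just slice position and feature_importance[sorter]? For example, position_top = position[:10]

Tags: python-3.x numpy matplotlib scikit-learn jupyter-notebook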


【Solution 1】:

Try this:

clftest = logres1.fit(X_train,y_train)

#getting the feature's coefficient
feature_importance = clftest.coef_[0]

#creating an array to identify the highest and lowest value
sorter = np.argsort(feature_importance)

#the two lines added to your code
n = 10 # number of features to keep from each end
sorter = np.append(sorter[:n],sorter[-n:]) # indices of the n lowest, then the n highest coefficients

#one bar position per selected feature (2*n in total)
position = np.arange(sorter.shape[0])


featfig = plt.figure(figsize=(100,100))
featax = featfig.add_subplot(1, 1, 1)
featax.barh(position, feature_importance[sorter], align="center")
featax.set_yticks(position)
featax.set_yticklabels(np.array(X.columns)[sorter], fontsize=8)

plt.show()
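
A minimal, self-contained sketch of the same slicing idea, using made-up coefficients instead of a fitted model. The helper name top_bottom_indices and the sample values are assumptions for illustration, not part of the original answer. It also guards against having no more than 2*n features, where the two slices would overlap and repeat indices.

```python
import numpy as np

def top_bottom_indices(values, n=10):
    """Hypothetical helper: indices of the n smallest, then the n largest values."""
    order = np.argsort(values)      # indices in ascending order of value
    if len(order) <= 2 * n:         # slices would overlap, so keep each index once
        return order
    return np.concatenate([order[:n], order[-n:]])

# Made-up coefficients standing in for clftest.coef_[0]
coef = np.array([0.5, -1.2, 0.1, 2.0, -0.3, 0.0, 1.1, -2.5])

idx = top_bottom_indices(coef, n=2)
print(idx)         # indices of the 2 lowest, then the 2 highest coefficients
print(coef[idx])   # the corresponding coefficient values
```

The returned indices can be used in place of sorter in the plotting code above.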

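【Comments】:

  • This is exactly what I was hoping for... thank you @Rudolf for sharing your knowledge
【Solution 2】:

Suppose you have the following array of feature weights: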

coef = np.array([  1.88300851e+00,   9.85092999e-02,  -5.65726689e-02,
                  -6.15194157e-06,  -1.47064483e-01,  -3.80980229e-01,
                  -5.74536851e-01,  -2.95280519e-01,  -2.40004639e-01,
                  -3.51240376e-02,  -9.66881225e-03,   1.24471692e+00,
                   4.37321571e-02,  -9.20868564e-02,  -1.44701472e-02,
                  -9.55498577e-03,  -4.33660677e-02,  -3.42427309e-02,
                  -4.17388237e-02,   3.75241446e-03,   1.11771818e+00,
                  -3.16367948e-01,  -9.05980063e-02,  -2.56441451e-02,
                  -2.61484045e-01,  -1.22299461e+00,  -1.57351240e+00,
                  -6.03878651e-01,  -7.25284179e-01,  -1.29895629e-01])

You can get the indices that sort the feature weights in descending order:

sorter = np.argsort(-coef)
sorter
array([ 0, 11, 20,  1, 12, 19,  3, 15, 10, 14, 23, 17,  9, 18, 16,  2, 22,
       13, 29,  4,  8, 24,  7, 21,  5,  6, 27, 28, 25, 26])
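
To see why negating the array gives a descending order, here is a tiny sketch with made-up values (the array w is an assumption for illustration). np.argsort always sorts ascending, so sorting -w ranks the largest original value first. Reversing the ascending result with [::-1] gives the same order when there are no ties.

```python
import numpy as np

w = np.array([0.2, -1.0, 3.0, 0.7])   # made-up weights

desc_neg = np.argsort(-w)        # sort the negated values in ascending order
desc_rev = np.argsort(w)[::-1]   # sort ascending, then reverse

print(desc_neg)      # indices from largest to smallest weight
print(w[desc_neg])   # weights in descending order
```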

Then you can get the top 10 features like this:

top_ten_arg = sorter[:10]
coef[top_ten_arg]
array([  1.88300851e+00,   1.24471692e+00,   1.11771818e+00,
         9.85092999e-02,   4.37321571e-02,   3.75241446e-03,
        -6.15194157e-06,  -9.55498577e-03,  -9.66881225e-03,
        -1.44701472e-02])

Similarly, get the lowest 10 features like this:

lowest_ten_arg = sorter[-10:]
coef[lowest_ten_arg]
array([-0.24000464, -0.26148405, -0.29528052, -0.31636795, -0.38098023,
       -0.57453685, -0.60387865, -0.72528418, -1.22299461, -1.5735124 ])

Note that this only gives you the feature weights. To get the feature names, index X.columns with top_ten_arg and lowest_ten_arg, just as you did with sorter.
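
The name lookup could look like the following sketch. The feature names and coefficients here are made up for illustration; in the question they would come from np.array(X.columns) and clftest.coef_[0].

```python
import numpy as np

# Made-up stand-ins for np.array(X.columns) and clftest.coef_[0]
feature_names = np.array(["age", "income", "height", "weight", "score"])
coef = np.array([0.4, -1.5, 2.2, 0.0, -0.3])

sorter = np.argsort(-coef)   # indices in descending order of weight
n = 2
top_arg = sorter[:n]         # indices of the n largest weights
lowest_arg = sorter[-n:]     # indices of the n smallest weights

top_names = feature_names[top_arg]
lowest_names = feature_names[lowest_arg]
print(top_names, coef[top_arg])
print(lowest_names, coef[lowest_arg])
```

The same index arrays select both the names and the weights, so the two stay aligned.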

【Comments】:

  • I see, so that's how slicing works. Thank you for sharing your knowledge, I really appreciate it! :)