【Question title】: SMOTE algorithm initial condition
【Posted】: 2017-05-03 12:17:35
【Question】:

I am using the SMOTE algorithm from the Python imbalanced-learn package:

from imblearn.over_sampling import SMOTE
sm = SMOTE(kind='regular', n_neighbors = 4)
 :
X_train_resampled, y_train_resampled = sm.fit_sample(X_train, y_train)

I have explicitly set n_neighbors = 4. However, the code above raises the following error:


ValueError                                Traceback (most recent call last)
<ipython-input-2-9e9116d71706> in <module>()
     33 
     34     #try:
---> 35     X_train_resampled, y_train_resampled = sm.fit_sample(X_train, y_train)
     36     #except:
     37     #continue

/usr/local/lib/python3.4/dist-packages/imblearn/base.py in fit_sample(self, X, y)
    176         """
    177 
--> 178         return self.fit(X, y).sample(X, y)
    179 
    180     def _validate_ratio(self):

/usr/local/lib/python3.4/dist-packages/imblearn/base.py in sample(self, X, y)
    153             self._validate_ratio()
    154 
--> 155         return self._sample(X, y)
    156 
    157     def fit_sample(self, X, y):

/usr/local/lib/python3.4/dist-packages/imblearn/over_sampling/smote.py in _sample(self, X, y)
    287             nns = self.nearest_neighbour.kneighbors(
    288                 X_min,
--> 289                 return_distance=False)[:, 1:]
    290 
    291             self.logger.debug('Create synthetic samples ...')

/usr/local/lib/python3.4/dist-packages/sklearn/neighbors/base.py in kneighbors(self, X, n_neighbors, return_distance)
    341                 "Expected n_neighbors <= n_samples, "
    342                 " but n_samples = %d, n_neighbors = %d" %
--> 343                 (train_size, n_neighbors)
    344             )
    345         n_samples, _ = X.shape

ValueError: Expected n_neighbors <= n_samples,  but n_samples = 5, n_neighbors = 6

Any idea why my n_neighbors = 4 setting has no effect?

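【Question comments】:

  • The error message says n_neighbors=6.
  • I see. Because it said 6, I changed the value to 4 (originally I had not set anything). Why is it still 6 after I set it to 4 by hand? Thanks.
  • Maybe you did not rerun the cell?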

Tags: python machine-learning classification imblearn


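【Answer 1】:

The correct parameter is:

k_neighbors: int or object, optional (default=5)

If int, the number of nearest neighbours used to construct synthetic samples. If object, an estimator that inherits from sklearn.neighbors.base.KNeighborsMixin and that will be used to find the k_neighbors.

You passed n_neighbors, with an n, but the correct name is k_neighbors, with a k!

The message reports 6 because the default value of 5 was still in effect: SMOTE asks the neighbour search for k_neighbors + 1 points (the [:, 1:] in the traceback then drops each sample's match with itself), and 6 is more than the 5 samples available.

Read the documentation here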
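
As a rough sketch of the fix, the snippet below derives a safe k_neighbors from the class counts and passes it under the correct keyword. The helper max_k_neighbors and the toy y_train are hypothetical names made up for illustration, not part of imblearn. It assumes, based on the traceback above, that SMOTE queries k_neighbors + 1 neighbours within a class, so the smallest class size minus one is used as a conservative upper bound.

```python
from collections import Counter


def max_k_neighbors(y):
    """Hypothetical helper: conservative upper bound for SMOTE's k_neighbors.

    Assumes SMOTE queries k_neighbors + 1 neighbours inside a class,
    so a class with n samples supports at most n - 1.
    """
    smallest = min(Counter(y).values())
    if smallest < 2:
        raise ValueError("the smallest class needs at least 2 samples")
    return smallest - 1


# Toy labels for illustration: 20 majority samples, 5 minority samples,
# matching the n_samples = 5 reported in the error message.
y_train = [0] * 20 + [1] * 5

# Never exceed the library default of 5, and never exceed what the data allows.
k = min(5, max_k_neighbors(y_train))

try:
    from imblearn.over_sampling import SMOTE

    # The keyword is k_neighbors (with a k), not n_neighbors.
    sm = SMOTE(k_neighbors=k)
except ImportError:
    sm = None  # imbalanced-learn is not installed; the bound above still holds
```

With five minority samples this gives k = 4, the value the question was aiming for. Resampling then proceeds as in the question with sm.fit_sample(X_train, y_train); newer imbalanced-learn releases name that method fit_resample.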

【Comments】:
