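Unfortunately, I couldn't find this functionality in SelectKBest.
What we can easily do, though, is subclass SelectKBest in our own custom class and override the fit() method that gets called.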
Here is the current fit() method of SelectKBest (taken from the source on GitHub):
```python
# No provision for extra parameters here
def fit(self, X, y):
    X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)
    ....
    ....
    # Here only the X, y are passed to scoring function
    score_func_ret = self.score_func(X, y)
    ....
    ....
    self.scores_ = np.asarray(self.scores_)
    return self
```
Now we define our new class, SelectKBestCustom, with a modified fit(). I copied everything from the source above and changed only two lines (marked with comments):
```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.utils import check_X_y


class SelectKBestCustom(SelectKBest):
    # Changed here: fit() accepts an extra discrete_features argument
    def fit(self, X, y, discrete_features='auto'):
        X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=True)
        if not callable(self.score_func):
            raise TypeError("The score function should be a callable, %s (%s) "
                            "was passed."
                            % (self.score_func, type(self.score_func)))
        self._check_params(X, y)
        # Changed here also: forward discrete_features to the score function.
        # It is passed by keyword because recent scikit-learn versions
        # declare it keyword-only in mutual_info_classif.
        score_func_ret = self.score_func(X, y, discrete_features=discrete_features)
        if isinstance(score_func_ret, (list, tuple)):
            self.scores_, self.pvalues_ = score_func_ret
            self.pvalues_ = np.asarray(self.pvalues_)
        else:
            self.scores_ = score_func_ret
            self.pvalues_ = None
        self.scores_ = np.asarray(self.scores_)
        return self
```
It can then be called simply as:
```python
clf = SelectKBestCustom(mutual_info_classif, k=2)
clf.fit(X, y, discrete_features=[0, 1, 2])
```
Edit:
The solution above is also useful in a pipeline, because a different value can be passed for the discrete_features parameter each time fit() is called.
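A minimal sketch of the pipeline case, assuming a scikit-learn version whose `Pipeline.fit` forwards `<step name>__<parameter>` keyword arguments to that step's `fit()` (the behavior when metadata routing is not enabled). The class is a condensed copy of `SelectKBestCustom` (the callable check is dropped) so the snippet runs on its own; the step names `selector` and `clf`, the `LogisticRegression` classifier, and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.utils import check_X_y


class SelectKBestCustom(SelectKBest):
    """Condensed copy of the class above: fit() forwards discrete_features."""

    def fit(self, X, y, discrete_features="auto"):
        X, y = check_X_y(X, y, accept_sparse=["csr", "csc"], multi_output=True)
        self._check_params(X, y)
        ret = self.score_func(X, y, discrete_features=discrete_features)
        if isinstance(ret, (list, tuple)):
            self.scores_, self.pvalues_ = ret
            self.pvalues_ = np.asarray(self.pvalues_)
        else:
            self.scores_, self.pvalues_ = ret, None
        self.scores_ = np.asarray(self.scores_)
        return self


# Toy data (made up for illustration): columns 0-2 hold small integer codes,
# columns 3-4 are continuous.
rng = np.random.RandomState(0)
X = np.hstack([rng.randint(0, 3, size=(100, 3)), rng.randn(100, 2)])
y = rng.randint(0, 2, size=100)

pipe = Pipeline([
    ("selector", SelectKBestCustom(mutual_info_classif, k=2)),
    ("clf", LogisticRegression()),
])

# The "selector__" prefix routes discrete_features to the selector's fit()
pipe.fit(X, y, selector__discrete_features=[0, 1, 2])
print(pipe.named_steps["selector"].get_support())
```

Since the value travels through `fit()`, a later `pipe.fit(...)` call can pass a different list of discrete columns without touching the class.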
Another solution (less preferable):
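That said, if you only need SelectKBest with mutual_info_classif temporarily (just to analyze the results), we can also write a custom function that calls mutual_info_classif internally with a hard-coded discrete_features. Roughly like this: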
```python
from sklearn.feature_selection import mutual_info_classif


def mutual_info_classif_custom(X, y):
    # To change discrete_features,
    # you need to redefine the function each time,
    # because once the function is supplied to SelectKBest, it can't be changed
    discrete_features = [0, 1, 2]
    return mutual_info_classif(X, y, discrete_features=discrete_features)
```
Usage of the above function:
```python
selector = SelectKBest(mutual_info_classif_custom).fit(X, y)
```
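The comment in `mutual_info_classif_custom` notes that hard-coding the columns means writing a new function for every change. A sketch of a variant of the same idea, using `functools.partial` from the standard library instead of a hand-written wrapper (a swapped-in technique, not part of the answer above; the toy data is made up for illustration):

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Toy data (made up for illustration): columns 0-2 hold small integer codes,
# columns 3-4 are continuous.
rng = np.random.RandomState(0)
X = np.hstack([rng.randint(0, 3, size=(100, 3)), rng.randn(100, 2)])
y = rng.randint(0, 2, size=100)

# partial() freezes discrete_features as a keyword argument; SelectKBest
# then calls score_func(X, y) exactly as it would call a plain function.
score_func = partial(mutual_info_classif, discrete_features=[0, 1, 2])
selector = SelectKBest(score_func, k=2).fit(X, y)
print(selector.scores_)
```

The columns are still fixed once the selector is built, but changing them only means building a new `partial(...)`, not defining a new function.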