【Question title】: kNN algorithm's parameters using cross-validation
【Posted】: 2019-08-22 00:53:33
【Question】:

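I am using the kNN machine-learning algorithm. Instead of splitting the dataset into 66.6% for training and 33.4% for testing, I need to use cross-validation with the following parameters: K=3, 1/euclidean.

K=3 is straightforward; I just put this in the code: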

Classifier = KNeighborsClassifier(n_neighbors=3, p=2, metric='euclidean') 

That part is solved. What I can't figure out is 1/euclidean: what does it mean, and how do I apply it in the code?

import pandas as pd
import time
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn import metrics

def openfile():
   df = pd.read_csv('Testfile - kNN.csv')

   return df


def main():

   start_time = time.time()
   dataset = openfile()

   X = dataset.drop(columns=['Label'])
   y = dataset['Label'].values

   X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

   Classifier = KNeighborsClassifier(n_neighbors=3, p=2, metric='euclidean')
   Classifier.fit(X_train, y_train)

   y_pred_class = Classifier.predict(X_test)

   score = cross_val_score(Classifier, X, y, cv=10)

   y_pred_prob = Classifier.predict_proba(X_test)[:, 1]

   print("accuracy_score:", metrics.accuracy_score(y_test, y_pred_class),'\n')

   print("confusion matrix")
   print(metrics.confusion_matrix(y_test, y_pred_class),'\n')

   print("Background precision score:", metrics.precision_score(y_test, y_pred_class, labels=['background'], average='micro')*100,"%")
   print("Botnet precision score:", metrics.precision_score(y_test, y_pred_class, labels=['bot'], average='micro')*100,"%")
   print("Normal precision score:", metrics.precision_score(y_test, y_pred_class, labels=['normal'], average='micro')*100,"%",'\n')

   print(metrics.classification_report(y_test, y_pred_class, digits=2),'\n')
   print(score,'\n')
   print(score.mean(),'\n')


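   print("--- %s seconds ---" % (time.time() - start_time))


# Run the script; without this call main() is defined but never executed.
if __name__ == "__main__":
   main()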
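
Since the stated goal is to replace the 66.6% / 33.4% split with cross-validation, the `train_test_split` step can be dropped and every metric computed from cross-validated predictions instead. The following is a minimal sketch, not the asker's code: the synthetic dataset from `make_classification` is an assumption standing in for 'Testfile - kNN.csv', and the three classes only mimic the background/bot/normal labels.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the CSV file (assumption: 3 classes, numeric features).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4,
    n_redundant=0, n_classes=3, random_state=0,
)

clf = KNeighborsClassifier(n_neighbors=3, metric="euclidean")

# 10-fold cross-validation: each sample is held out exactly once,
# so no separate train/test split is required.
scores = cross_val_score(clf, X, y, cv=10)
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())

# Out-of-fold predictions for every sample, usable for a confusion
# matrix or classification report over the whole dataset.
y_pred = cross_val_predict(clf, X, y, cv=10)
cm = confusion_matrix(y, y_pred)
print(cm)
```

With this layout, `cross_val_score` and `cross_val_predict` refit the classifier on each fold internally, so an explicit `fit` call is not needed.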

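【Comments】:

  • What do you mean by "I need", and where exactly does this 1/euclidean come from? Used as a distance it makes no sense in any case, because it would select the k farthest neighbors instead of the k nearest ones, which defeats the whole idea the technique is built on...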
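
The commenter's point can be checked with a small standard-library sketch. The points and the helper names `euclidean` and `inverse_euclidean` are made up for illustration; ranking candidates by 1/distance and taking the k smallest values returns the points that are farthest from the query.

```python
import math

query = (0.0, 0.0)
points = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 4.0), (5.0, 0.0)]


def euclidean(a, b):
    """Plain Euclidean distance between two points."""
    return math.dist(a, b)


def inverse_euclidean(a, b):
    """1 / Euclidean distance; undefined for identical points (not handled here)."""
    return 1.0 / math.dist(a, b)


k = 3
# The k smallest Euclidean distances: the usual nearest neighbors.
nearest = sorted(points, key=lambda p: euclidean(query, p))[:k]
# The k smallest inverse distances: these are the farthest points.
by_inverse = sorted(points, key=lambda p: inverse_euclidean(query, p))[:k]

print(nearest)     # [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0)]
print(by_inverse)  # [(5.0, 0.0), (0.0, 4.0), (3.0, 0.0)]
```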

Tags: python machine-learning scikit-learn cross-validation knn


【Solution 1】:

You can write your own function and pass it as a callable to the metric parameter.

Define the function like this:

from scipy.spatial import distance
def inverse_euc(a,b):
    return 1/distance.euclidean(a, b)

Now pass it as the callable when constructing your KNN classifier:

Classifier = KNeighborsClassifier(algorithm='ball_tree',n_neighbors=3, p=2, metric=inverse_euc)
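
A different reading of "1/euclidean", which the question does not confirm, is that the neighbors' votes should be weighted by the inverse of their Euclidean distance while the neighbors themselves are still chosen by ordinary Euclidean distance. scikit-learn supports this through `weights='distance'` on `KNeighborsClassifier`. It also avoids a problem with the callable above: 1/distance does not satisfy the triangle inequality, so it is not a true metric, and a tree-based search such as `ball_tree` may not behave reliably with it. A minimal sketch, again with synthetic data standing in for the real CSV (an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4,
    n_redundant=0, n_classes=3, random_state=0,
)

# Neighbors are found with plain Euclidean distance; each neighbor's vote
# is then weighted by the inverse of its distance to the query point.
weighted_knn = KNeighborsClassifier(
    n_neighbors=3, metric="euclidean", weights="distance"
)

scores = cross_val_score(weighted_knn, X, y, cv=10)
print("mean accuracy:", scores.mean())
```

Whether this matches the original requirement depends on where "1/euclidean" came from, so it is worth confirming against the assignment or tool that specified it.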

【Comments】:

  • Well... from a coding point of view, it is indeed a valid answer (+1)