【Question title】: Calculating KNN in Python
【Posted】: 2020-07-02 00:02:46
【Question】:

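I want to write a function that returns a number representing the majority class.

I wrote the following function to calculate the distances.

The distance metric is given (Euclidean, Manhattan, etc.).

xTrainInstances - a DataFrame containing all training instances

xSeriesTestVector - a Series object taken from the test set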

def calc_distances(xSeriesTestVector, xTrainInstances, distanceMetric):
    # One distance per training row, in row order
    distances = np.zeros(xTrainInstances.shape[0])
    for i in range(xTrainInstances.shape[0]):
        distances[i] = distanceMetric(xSeriesTestVector, xTrainInstances.iloc[i])
    return distances
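
For reference, distanceMetric here is a callable that takes two Series and returns a number. The following is a minimal sketch of two such functions; the names euclidean_distance and manhattan_distance are illustrative and not part of the original code, and the two sample rows are pid 1 and pid 3 from the feature table in this question.

```python
import numpy as np
import pandas as pd


def euclidean_distance(a, b):
    """Square root of the summed squared differences (illustrative helper)."""
    return float(np.sqrt(((a - b) ** 2).sum()))


def manhattan_distance(a, b):
    """Sum of the absolute differences (illustrative helper)."""
    return float((a - b).abs().sum())


# Two rows from the question's feature table (pid 1 and pid 3)
row_1 = pd.Series({"Pclass": 3, "SibSp": 1, "Parch": 0, "Fare": 7.25, "Age": 22.0})
row_3 = pd.Series({"Pclass": 3, "SibSp": 0, "Parch": 0, "Fare": 7.925, "Age": 26.0})

print(manhattan_distance(row_1, row_3))
print(euclidean_distance(row_1, row_3))
```

Subtracting two Series aligns them on their index labels, so both vectors need the same column names for these helpers to work as intended.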

Suppose I have the following DataFrame, where the Survived column is my category.

                    Survived
 PassengerId          
    1                   0
    2                   1
    3                   1
    4                   1
    5                   0

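My question

How can I implement the following function? I am stuck: distances gives me an array of distances, and from predict_one_instance I want to return the correct category.

  • xSeriesTestVector - a Series object from the test set, the instance to classify
  • xTrainInstances - a DataFrame with all training instances to compare against
  • yTrainCategories - a DataFrame with the categories of all training instances
  • distanceMetric - the distance function itself, passed as a callable rather than as a string
  • k - the number of nearest neighbors (the majority is chosen from the k votes)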

                        Pclass  SibSp  Parch     Fare   Age
         pid
         1                 3      1      0   7.2500  22.0
         2                 1      1      0  71.2833  38.0
         3                 3      0      0   7.9250  26.0
         4                 1      1      0  53.1000  35.0
         5                 3      0      0   8.0500  35.0
    
    
def predict_one_instance(xSeriesTestVector, xTrainInstances, yTrainCategories,
                         distanceMetric, k):
    distances = calc_distances(xSeriesTestVector, xTrainInstances, distanceMetric)
    

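【Comments】:

  • What have you tried so far? I understand you are stuck, but which part of your predict_one_instance implementation is not working for you?
  • By the way, variable and function names should follow the lower_case_with_underscores style.

Tags: python machine-learning knn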


【Solution 1】:

Please take a look at this example, which uses 'manhattan'.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
dataset = pd.read_csv(url, names=names)


dataset.head()


X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values


from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)


from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)


from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='manhattan')
classifier.fit(X_train, y_train)


y_pred = classifier.predict(X_test)


from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

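Three broadly similar but slightly different results. Note that train_test_split is called without a fixed random_state, so each run uses a different split (the support counts differ between the reports below); the differences therefore reflect the split as well as the metric. With scikit-learn's default p=2, 'minkowski' is the same distance as 'euclidean'.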

# manhattan
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         9
Iris-versicolor       1.00      1.00      1.00        15
 Iris-virginica       1.00      1.00      1.00         6

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30


# euclidean
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        11
Iris-versicolor       0.90      1.00      0.95         9
 Iris-virginica       1.00      0.90      0.95        10

       accuracy                           0.97        30
      macro avg       0.97      0.97      0.96        30
   weighted avg       0.97      0.97      0.97        30


# minkowski
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        13
Iris-versicolor       1.00      0.85      0.92        13
 Iris-virginica       0.67      1.00      0.80         4

       accuracy                           0.93        30
      macro avg       0.89      0.95      0.91        30
   weighted avg       0.96      0.93      0.94        30

Just change the metric when running these 3 examples (you could easily loop over a list of these three items to automate the whole process):

metric='manhattan'
metric='euclidean'
metric='minkowski'
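
The loop mentioned above could look roughly like this. It is a sketch under a few assumptions that differ from the code earlier in this answer: it loads the copy of Iris bundled with scikit-learn instead of downloading the CSV, it fixes random_state (42 is an arbitrary choice) so that all three metrics are evaluated on the same split, and it reports plain accuracy rather than the full classification report.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Bundled copy of Iris, so no download is needed (assumption: same data as the URL)
X, y = load_iris(return_X_y=True)

# Fixed random_state (arbitrary value) so every metric sees the same split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

accuracies = {}
for metric in ["manhattan", "euclidean", "minkowski"]:
    classifier = KNeighborsClassifier(n_neighbors=5, metric=metric)
    classifier.fit(X_train, y_train)
    accuracies[metric] = accuracy_score(y_test, classifier.predict(X_test))

for metric, accuracy in accuracies.items():
    print(f"{metric}: {accuracy:.2f}")
```

With the split held fixed, any remaining difference between the rows comes from the metric alone.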

Resources:

https://www.bogotobogo.com/python/scikit-learn/scikit_machine_learning_k-NN_k-nearest-neighbors-algorithm.php
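
The scikit-learn example does not cover the hand-written predict_one_instance from the question. One possible way to finish it is sketched below, assuming that yTrainCategories is a single-column DataFrame whose rows are in the same order as xTrainInstances. The manhattan helper and the test_vector values are invented for illustration; the training data is the five-row table from the question.

```python
import numpy as np
import pandas as pd


def calc_distances(xSeriesTestVector, xTrainInstances, distanceMetric):
    """Distance from the test vector to every training row, in row order."""
    return np.array([
        distanceMetric(xSeriesTestVector, xTrainInstances.iloc[i])
        for i in range(xTrainInstances.shape[0])
    ])


def predict_one_instance(xSeriesTestVector, xTrainInstances, yTrainCategories,
                         distanceMetric, k):
    distances = calc_distances(xSeriesTestVector, xTrainInstances, distanceMetric)
    # Row positions of the k smallest distances (stable sort keeps row order on ties)
    nearest = np.argsort(distances, kind="stable")[:k]
    # Labels of those rows; assumes the label frame is row-aligned with the features
    votes = yTrainCategories.iloc[nearest, 0]
    # Most frequent label; mode() sorts tied labels, so a tie picks the smallest
    return votes.mode().iloc[0]


def manhattan(a, b):
    # Illustrative distance function, passed as a callable
    return (a - b).abs().sum()


x_train = pd.DataFrame(
    {"Pclass": [3, 1, 3, 1, 3], "SibSp": [1, 1, 0, 1, 0], "Parch": [0, 0, 0, 0, 0],
     "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05], "Age": [22.0, 38.0, 26.0, 35.0, 35.0]},
    index=pd.Index([1, 2, 3, 4, 5], name="pid"),
)
y_train = pd.DataFrame({"Survived": [0, 1, 1, 1, 0]}, index=x_train.index)

# Hypothetical passenger to classify (values invented for illustration)
test_vector = pd.Series({"Pclass": 3, "SibSp": 0, "Parch": 0, "Fare": 8.0, "Age": 30.0})

print(predict_one_instance(test_vector, x_train, y_train, manhattan, k=3))
```

Because Fare and Age span much larger ranges than the other columns, they dominate unscaled distances; standardizing the features first, as the scikit-learn example does, is worth considering.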

【Comments】:
