100-Days-Of-ML-Code
Chinese edition of "100 Days of ML Code"
GitHub: https://github.com/MLEveryday/100-Days-Of-ML-Code

Dataset | Social Network
Part of the dataset is shown in the figure below:
[Figure: "100 Days of ML Code" study notes, Day 11: k-NN (k-nearest neighbors)]

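This dataset contains information about users of a social network: user ID, gender, age, and estimated salary. The last column indicates whether the user made a purchase. A car company has just launched a new luxury SUV, and we want to predict which users will buy it. We will build a model that predicts the purchase from two variables, age and estimated salary, so our feature matrix consists of those two columns. The goal is to find a relationship between a user's age and estimated salary on the one hand and their decision to buy the SUV on the other.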

(1) Import the libraries

import pandas as pd

(2) Import the dataset

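dataset = pd.read_csv('D:/PycharmProjects/DataSet/Social_Network_Ads.csv')
# Columns 2 and 3 (age, estimated salary) form the feature matrix
X = dataset.iloc[:, [2, 3]].values
# Column 4 (the last column) is the purchase label
y = dataset.iloc[:, 4].values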
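
To make the positional indexing above concrete, here is a minimal sketch on a tiny made-up table. The column names (`User ID`, `Gender`, `Age`, `EstimatedSalary`, `Purchased`) and all row values are assumptions for illustration, not read from the real file. It shows that positions 2 and 3 pick the two features and position 4 picks the label.

```python
import pandas as pd

# Hypothetical rows that mimic the column layout described above (values are made up).
demo = pd.DataFrame({
    "User ID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 35, 47],
    "EstimatedSalary": [19000, 20000, 25000],
    "Purchased": [0, 0, 1],
})

# Positional selection, as in the tutorial code.
X_demo = demo.iloc[:, [2, 3]].values
y_demo = demo.iloc[:, 4].values

# Equivalent selection by column name, which still works if columns are reordered.
X_by_name = demo[["Age", "EstimatedSalary"]].values
y_by_name = demo["Purchased"].values

print(X_demo.shape, y_demo.shape)  # (3, 2) (3,)
```

Selecting by name is a little more robust than selecting by position, but it only works if the file really uses these headers.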

(3) Split the data into a training set and a test set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
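
As a quick check of what `test_size = 0.25` and `random_state = 0` do, the sketch below runs the same call on synthetic data. The 400-row size and the random values are assumptions chosen for illustration. A quarter of the rows go to the test set, and a fixed `random_state` reproduces the same split on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and labels (400 rows is an assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 2))
y_demo = rng.integers(0, 2, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(X_tr.shape, X_te.shape)  # (300, 2) (100, 2)

# Repeating the call with the same random_state yields the identical split.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(np.array_equal(X_te, X_te2))  # True
```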

(4) Feature scaling

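from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Learn the mean and standard deviation from the training set and scale it
X_train = sc.fit_transform(X_train)
# Scale the test set with the statistics learned from the training set
X_test = sc.transform(X_test)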
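
Scaling matters for k-NN because distances would otherwise be dominated by the feature with the larger range (salary, here). The sketch below uses made-up ages and salaries to show what `fit_transform` and `transform` compute. The value ranges and array names are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Made-up ages and salaries on very different scales.
train = np.column_stack([rng.uniform(18, 60, 300), rng.uniform(15000, 150000, 300)])
test = np.column_stack([rng.uniform(18, 60, 100), rng.uniform(15000, 150000, 100)])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from the training rows only
test_scaled = scaler.transform(test)        # reuses the training mean and std

# The same transformation written out by hand.
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(train_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(train_scaled.std(axis=0), 1.0))   # True
print(np.allclose(test_scaled, manual))             # True
```

Fitting the scaler on the training set alone keeps information about the test set out of the model.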

(5) Fit k-NN to the training set

from sklearn.neighbors import KNeighborsClassifier
# 5 neighbors; the Minkowski metric with p = 2 is the Euclidean distance
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

For an explanation of the k-NN algorithm, see: https://blog.csdn.net/qq_41929011/article/details/88914931
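
For intuition about what the classifier does with `n_neighbors = 5` and `p = 2`, here is a minimal from-scratch sketch of k-NN with Euclidean distance and majority voting. The function `knn_predict` and the toy points are hypothetical and written for illustration. This is not how scikit-learn implements the algorithm internally, and tie-breaking may differ.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Predict a label for each query row by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for q in np.asarray(X_query, dtype=float):
        # Euclidean distance (Minkowski with p = 2) from q to every training point.
        dists = np.sqrt(((X_train - q) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]
        # Majority vote among the labels of the k nearest points.
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Two made-up, well-separated clusters.
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
                  [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5]])
y_toy = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

print(knn_predict(X_toy, y_toy, [[0.2, 0.3], [5.8, 5.9]], k=5))  # [0 1]
```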

(6) Predict on the test set

y_pred = classifier.predict(X_test)
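
The same `predict` call works for unseen users, as long as their features pass through the scaler that was fitted on the training data. The sketch below shows this on synthetic data. The purchase rule (users older than 40 buy), the value ranges, and the two new users are all made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
ages = rng.uniform(18, 60, 200)
salaries = rng.uniform(15000, 150000, 200)
X_raw = np.column_stack([ages, salaries])
# Made-up labeling rule, only so the sketch has something to learn.
y_raw = (ages > 40).astype(int)

scaler = StandardScaler()
model = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
model.fit(scaler.fit_transform(X_raw), y_raw)

# New users must be scaled with the already-fitted scaler before prediction.
new_users = np.array([[25, 40000], [55, 120000]])
new_scaled = scaler.transform(new_users)
pred = model.predict(new_scaled)
proba = model.predict_proba(new_scaled)  # share of the 5 neighbors in each class

print(pred)
print(proba)
```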

(7) Generate the confusion matrix and a text report of the main classification metrics

from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))

The confusion matrix:
[Figure: confusion matrix output]
The text report showing precision, recall, and F1 score:
[Figure: classification report output]
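
To see how the report's numbers relate to the confusion matrix, the sketch below computes precision, recall, and F1 by hand for the positive class. The ten labels are made up and are not results from the SUV dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Made-up ground truth and predictions for ten samples.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # of the predicted buyers, how many really bought
recall = tp / (tp + fn)     # of the real buyers, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(cm.tolist())  # [[4, 1], [1, 4]]
print(np.isclose(precision, precision_score(y_true, y_hat)))  # True
print(np.isclose(recall, recall_score(y_true, y_hat)))        # True
print(np.isclose(f1, f1_score(y_true, y_hat)))                # True
```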

The complete code and the learning infographic follow:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('D:/PycharmProjects/DataSet/Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting K-NN to the Training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))

[Figure: Day 11 k-NN learning infographic]
