Topic:

(19) Sklearn


Approach:

    1. Create the dataset with:

datasets.make_classification(n_samples,n_features,n_informative,n_redundant,n_repeated,n_classes)

    2. Split the dataset with 10-fold cross-validation using:

model_selection.KFold(n_splits, shuffle)

    3. Training: train each classifier on the split folds with a custom evaluation routine that computes accuracy, F1-score, and AUC ROC.
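The three steps above can be sketched as a minimal example (using the current `sklearn.model_selection` API; `random_state` values are added here for reproducibility and one classifier stands in for all three):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Step 1: synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=2, n_redundant=2,
                           n_repeated=0, n_classes=2, random_state=0)

# Step 2: 10-fold cross-validation splitter
kf = KFold(n_splits=10, shuffle=True, random_state=0)

# Step 3: train and evaluate on each fold
accs = []
for train_idx, test_idx in kf.split(X):
    clf = GaussianNB().fit(X[train_idx], y[train_idx])
    accs.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print(accs)  # one accuracy per fold, 10 in total
```

The full experiment below follows the same pattern, just with three models and three metrics per fold.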



Experiment code:

from sklearn.model_selection import KFold
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

import numpy as np

# (fold, model, metric): 10 folds x 3 models x 3 metrics
performance = np.zeros((10, 3, 3))


def Gaussian_naive_Bayes(X_train, y_train, X_test, y_test):
    clf = GaussianNB()
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)

    return metric(y_test, pred)


def SVM(X_train, y_train, X_test, y_test):
    clf = SVC(C=1e-01, kernel='rbf', gamma=0.1)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)

    return metric(y_test, pred)


def Random_Forest(X_train, y_train, X_test, y_test):
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)

    return metric(y_test, pred)


def metric(y_test, pred):
    # accuracy, F1-score and AUC ROC for a single fold
    acc = metrics.accuracy_score(y_test, pred)
    f1 = metrics.f1_score(y_test, pred)
    auc = metrics.roc_auc_score(y_test, pred)

    return acc, f1, auc


X, y = datasets.make_classification(n_samples=1000, n_features=10,
                                    n_informative=2, n_redundant=2,
                                    n_repeated=0, n_classes=2)

kf = KFold(n_splits=10, shuffle=True)
for i, (train_index, test_index) in enumerate(kf.split(X)):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
    performance[i, 0, :] = Gaussian_naive_Bayes(X_train, y_train, X_test, y_test)
    performance[i, 1, :] = SVM(X_train, y_train, X_test, y_test)
    performance[i, 2, :] = Random_Forest(X_train, y_train, X_test, y_test)

name = ['GaussianNB', 'SVC', 'RandomForestClassifier']
mean = np.mean(performance, axis=0)
for i in range(3):
    print(name[i])
    print('  Accuracy: ', performance[:, i, 0], ' Averaged: ', mean[i, 0])
    print('  F1-score: ', performance[:, i, 1], ' Averaged: ', mean[i, 1])
    print('  AUC ROC:  ', performance[:, i, 2], ' Averaged: ', mean[i, 2], '\n')


Results:


    As the metrics show, the models rank: Random Forest > SVC > GaussianNB.
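The per-model averages can also be obtained with sklearn's built-in `cross_validate` helper instead of a hand-written fold loop. A sketch under the same dataset settings (`random_state` added for reproducibility; `'accuracy'`, `'f1'`, and `'roc_auc'` are standard sklearn scorer names), shown for the Random Forest only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=2, n_redundant=2,
                           n_repeated=0, n_classes=2, random_state=0)

# 10-fold CV with three metrics scored on each fold
scores = cross_validate(RandomForestClassifier(n_estimators=100, random_state=0),
                        X, y, cv=10,
                        scoring=['accuracy', 'f1', 'roc_auc'])

print('Accuracy:', np.mean(scores['test_accuracy']))
print('F1-score:', np.mean(scores['test_f1']))
print('AUC ROC: ', np.mean(scores['test_roc_auc']))
```

Repeating this for the other two estimators reproduces the averaged comparison above in a few lines.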

