【Question title】: Classification of buildings as per the damage data using SVM
【Posted】: 2020-07-12 00:15:06
【Question description】:

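I have a university assignment to complete. It is about classifying several buildings (described by 6 parameters) by damage class (1-5). I coded it following an SVM tutorial, but I am not sure how accurate the output is. Could you tell me how to improve my results, and which other algorithms I could try?

'''
# Support Vector Machine (SVM)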

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Ehsan Duzce.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 7].values

# Taking care of missing data
from sklearn.impute import SimpleImputer
# creating object for SimpleImputer class as "imputer"
imputer = SimpleImputer(missing_values = np.nan, strategy = "mean")  # 'verbose' dropped: removed in recent scikit-learn releases
imputer = imputer.fit(X[:, 1:7]) # slice start (1) is included, slice end (7) is excluded
X[:, 1:7] = imputer.transform(X[:, 1:7])

# Dropping the first column of X so it is not used as a feature
X = X[:, 1:]

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test) 

# Fitting SVM to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'poly', degree = 3)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
# One colour per damage class (assumes 5 damage classes, 1-5)
class_cmap = ListedColormap(('green', 'yellowgreen', 'gold', 'orange', 'red'))

X1, X2 = np.meshgrid(
    np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
    np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
Xpred = np.array([X1.ravel(), X2.ravel()] + [np.repeat(0, X1.ravel().size) for _ in range(4)]).T
# Xpred now has a grid for x1 and x2 and the average value (0 after scaling) for x3 through x6
pred = classifier.predict(Xpred).reshape(X1.shape)   # matrix of predicted class labels
plt.contourf(X1, X2, pred, alpha = 0.5, cmap = class_cmap)

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = class_cmap(i), label = j)
plt.title('SVM (Training set)')
plt.xlabel('Damage Scale')
plt.ylabel('Building Database')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
# One colour per damage class (assumes 5 damage classes, 1-5)
class_cmap = ListedColormap(('green', 'yellowgreen', 'gold', 'orange', 'red'))
X1, X2 = np.meshgrid(
    np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
    np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
Xpred = np.array([X1.ravel(), X2.ravel()] + [np.repeat(0, X1.ravel().size) for _ in range(4)]).T
# Xpred now has a grid for x1 and x2 and the average value (0 after scaling) for x3 through x6
pred = classifier.predict(Xpred).reshape(X1.shape)   # matrix of predicted class labels
plt.contourf(X1, X2, pred, alpha = 0.5, cmap = class_cmap)

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = class_cmap(i), label = j)

plt.title('SVM (Test set)')
plt.xlabel('Damage Scale')
plt.ylabel('Building Database')
plt.legend()
plt.show()

'''

【Question discussion】:

    Tags: python-3.x matplotlib svm spyder


    【Solution 1】:


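    First, you should get familiar with your training data. As far as I can tell, you are feeding the data to the model with almost no preprocessing, which you should not do. I see that you are imputing the missing values with the mean; try dropping those data points instead and compare the results, and remove outliers that might "confuse" your model. Also, your plots are not very readable: you tell us the data is classified 1-5, but the plot spans [-2, 2]. But since your question is specifically about the algorithm, try hyperparameter tuning.

    You can do it like this: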

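    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Candidate values for the regularisation strength, kernel coefficient and kernel type
    param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001], 'kernel': ['rbf', 'poly', 'sigmoid']}

    # refit=True retrains the best parameter combination on the whole training set
    grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2)
    grid.fit(X_train, y_train)

    print(grid.best_estimator_)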
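
Since the question is also about judging how accurate the output is, here is a minimal sketch of how the tuned model could be scored on held-out data. It is not the asker's data: `make_classification` generates a synthetic stand-in with 6 features and 5 classes, and the names `X_demo`, `y_demo`, `pipe` and `search` are made up for this illustration. Putting the scaler in a `Pipeline` is a design choice of this sketch, so that scaling is refitted inside every cross-validation fold; the grid is also smaller than the one above to keep the run short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the building data (assumption): 6 features, 5 classes.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0)
y_demo = y_demo + 1  # shift labels from 0-4 to 1-5, like the damage scale

# Stratify so every damage class appears in both the training and the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0)

# Scaling lives inside the pipeline, so each CV fold fits its own scaler.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.1, 0.01],
    "svc__kernel": ["rbf", "poly"],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

# The test set was never seen during the search, so this is an honest estimate.
y_hat = search.predict(X_te)
test_accuracy = accuracy_score(y_te, y_hat)

print(search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(test_accuracy, 3))
print(classification_report(y_te, y_hat, zero_division=0))
```

The per-class precision and recall in the report tend to be more informative than a single accuracy figure when some damage classes are rare.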
    
    

    I suggest reading this article to understand SVMs and how to tune their parameters:

    https://towardsdatascience.com/svm-hyper-parameter-tuning-using-gridsearchcv-49c0bc55ce29
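
The earlier suggestion to compare dropping incomplete rows against mean imputation could be checked along these lines. This is only a sketch under assumptions: the data is synthetic, about 5% of the cells are blanked out at random, and names such as `X_missing`, `score_drop` and `score_impute` are invented for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 6 features, 5 classes.
X_full, y_full = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0)

# Blank out roughly 5% of the cells to imitate missing measurements.
rng = np.random.default_rng(0)
X_missing = X_full.copy()
X_missing[rng.random(X_missing.shape) < 0.05] = np.nan

# Option A: keep only the rows without any missing value.
complete_rows = ~np.isnan(X_missing).any(axis=1)
model_drop = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
score_drop = cross_val_score(
    model_drop, X_missing[complete_rows], y_full[complete_rows], cv=5).mean()

# Option B: keep every row and fill the gaps with the column mean.
# The imputer sits inside the pipeline, so it is fitted on each training fold only.
model_impute = make_pipeline(
    SimpleImputer(strategy="mean"), StandardScaler(), SVC(kernel="rbf"))
score_impute = cross_val_score(model_impute, X_missing, y_full, cv=5).mean()

print("rows kept after dropping:", int(complete_rows.sum()), "of", len(y_full))
print("accuracy, rows dropped:  ", round(score_drop, 3))
print("accuracy, mean-imputed:  ", round(score_impute, 3))
```

The two scores are computed on different sets of rows, so they are a rough guide rather than a strict comparison; when dropping removes a large share of the data, imputation may well be the safer choice.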

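    【Comments】:

    • Hi @José, thanks for getting back to us, and sorry for seeing your feedback so late. My dataset contains data from a real earthquake, and I cannot drop the entries with missing values, so I replaced them with the mean. The link you shared is blocked for me, so I cannot really understand how the code you gave would affect my algorithm. This is a classification problem, so I also tried a decision tree, but the results are still unsatisfactory. Could you give some more suggestions? Thanks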