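【Posted】: 2018-06-23 19:43:43
【Question】:
I've been studying the theory behind linear SVMs, and Python's scikit-learn has an easy-to-use implementation... As a hypothetical example, suppose a cup of coffee on its own is disgusting, and so, naturally, is a cup of cream. Sweetened coffee with cream, though, seems to be popular, as all the drive-through coffee shops attest. So this should naturally produce a simple plot with a line between (0,1) and (1,0) that separates off the good value (1,1)... but the results for this simple example are inaccurate: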
from __future__ import division
# data points [coffee, cream]:
data = [[ 0,0 ], [ 0,1 ], [ 1,0 ], [ 1,1 ] ]
#Just last one is a positive experience
category = [ -1, -1, -1, 1 ]
import numpy
from sklearn.svm import SVC
clf = SVC(kernel='linear')
clf.fit(data, category)
#Get m coefficients:
coef = clf.coef_[0]
b = clf.intercept_[0]
print('This is the M*X+b=0 equation...')
print('M=%s' % (coef))
print('b=%s' % (b))
print('So the equation of the separating line in this 2d svm is:')
print('%f*x + %f*y + %f = 0' % (coef[0],coef[1],b))
print('The support vector limit lines are:')
print('%f*x + %f*y + %f = -1' % (coef[0],coef[1],b))
print('%f*x + %f*y + %f = 1' % (coef[0],coef[1],b))
vertmatrix = [[x] for x in coef]
good = 0
bad = 0
for i, d in enumerate(data):
    #i-th element, d in data:
    calculatedValue = numpy.dot(d, vertmatrix)[0] + b
    print( 'Mx+b for x=%s calculates to %s' % (d, calculatedValue) )
    if calculatedValue > 0 and category[i] > 0:
        good += 1
    elif calculatedValue < 0 and category[i] < 0:
        good += 1
    else:
        bad +=1 #they should have matched category.
print('accuracy=%f' % (good/(good+bad)) )
#The same as the builtin "score" accuracy:
print('accuracy=%f' % clf.score(data, category) )
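For reference, the geometry described in the question can be checked by hand. The sketch below assumes a candidate separator 2x + 2y - 3 = 0 (that is, w = (2, 2) and b = -3, values worked out for illustration rather than taken from the output of the code above) and verifies that it satisfies the hard-margin constraints y_i (w·x_i + b) >= 1 for all four points. This line is parallel to the one through (0,1) and (1,0), shifted halfway towards (1,1).

```python
# Hand-check of a candidate hard-margin separator for the coffee/cream data.
# w and b are assumed values chosen for illustration, not output from clf.
data = [[0, 0], [0, 1], [1, 0], [1, 1]]
category = [-1, -1, -1, 1]

w = (2.0, 2.0)
b = -3.0

def decision(point):
    """Return w . x + b for a 2-d point."""
    return w[0] * point[0] + w[1] * point[1] + b

# Functional margin y * (w . x + b); a hard-margin SVM requires >= 1 everywhere.
margins = [y * decision(x) for x, y in zip(data, category)]
for x, y, m in zip(data, category, margins):
    print('x=%s label=%d margin=%s' % (x, y, m))

all_separated = all(m >= 1 for m in margins)
print('all points on the correct side with margin >= 1: %s' % all_separated)
```

If every margin comes out at 1 or more, the data is linearly separable, which suggests that an accuracy below 1.0 from the fitted model comes from the soft-margin settings rather than from the data itself.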
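【Discussion】:
-
The current data is imbalanced: 75% of it belongs to one class. So you need to tune the hyperparameters to account for that. Perhaps just use class_weight, like this:
clf = SVC(kernel='linear', class_weight={-1:1, 1:2})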
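A minimal sketch of trying this suggestion side by side with the default settings. The third variant, a larger penalty `C`, is an assumption added here as another hyperparameter one might tune; it is not part of the comment above, and `C=1000` is an arbitrary illustrative value.

```python
from sklearn.svm import SVC

# Same toy data as in the question: [coffee, cream] and the taste label.
data = [[0, 0], [0, 1], [1, 0], [1, 1]]
category = [-1, -1, -1, 1]

# Three variants: library defaults, the class_weight suggested above,
# and a larger C (assumed value, chosen only for illustration).
models = {
    'default': SVC(kernel='linear'),
    'class_weight': SVC(kernel='linear', class_weight={-1: 1, 1: 2}),
    'large C': SVC(kernel='linear', C=1000),
}

scores = {}
for name, clf in models.items():
    clf.fit(data, category)
    # score() returns the mean accuracy on the given points.
    scores[name] = clf.score(data, category)
    print('%-12s coef=%s intercept=%s accuracy=%f'
          % (name, clf.coef_[0], clf.intercept_[0], scores[name]))
```

Comparing the printed coefficients and accuracies across the three fits shows how much each setting moves the separating line on this four-point data set.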
Tags: python numpy scikit-learn svm