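【Posted】: 2014-06-12 09:49:48
【Question】:
I have the following code, which takes several sets of images (about 50 images in each training set), builds a linear model, and tries to classify the data. I also have a test set, but the model can't even classify the training data with any accuracy. Is there something wrong with the way I'm loading the images? I'm happy to provide more code or my output if that helps.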
import glob
from PIL import Image
from svmutil import svm_train, svm_predict  # libsvm's Python interface

def create_image_list(file_path):
    # Build one flat list of R, G, B values per image matching file_path.
    image_list = []
    for filename in glob.glob(file_path):
        img = Image.open(filename)
        img_resized = img.resize((32, 32), Image.ANTIALIAS)
        pix = img.load()
        pixlist = []
        for x in range(0, 32):
            for y in range(0, 32):
                pixlist.append(pix[x, y][0])
                pixlist.append(pix[x, y][1])
                pixlist.append(pix[x, y][2])
        image_list.append(pixlist)
    return image_list
dalmation_training = create_image_list('/images/dalmatian/training/*')
dollabill_training = create_image_list('/images/dollar_bill/training/*')
pizza_training = create_image_list('/images/pizza/training/*')
soccer_ball_training = create_image_list('/images/soccer_ball/training/*')
sunflower_training = create_image_list('/images/sunflower/training/*')
c = '1e2'
testing_set = dalmation_training + dollabill_training + pizza_training + soccer_ball_training + sunflower_training
dalmation_y = [1]*len(dalmation_training ) + [-1]*len(dollabill_training) + [-1]*len(pizza_training) + [-1]*len(soccer_ball_training) + [-1]*len(sunflower_training)
dalmation_model_linear = svm_train(dalmation_y, testing_set, '-t 0 -c %s -b 1 -q' % c)
dollabill_y = [-1]*len(dalmation_training ) + [1]*len(dollabill_training) + [-1]*len(pizza_training) + [-1]*len(soccer_ball_training) + [-1]*len(sunflower_training)
dollabill_model_linear = svm_train(dollabill_y, testing_set, "-t 0 -c %s -b 1 -q" % c)
pizza_y = [-1]*len(dalmation_training ) + [-1]*len(dollabill_training) + [1]*len(pizza_training) + [-1]*len(soccer_ball_training) + [-1]*len(sunflower_training)
pizza_model_linear = svm_train(pizza_y, testing_set, "-t 0 -c %s -b 1 -q" % c)
soccer_ball_y = [-1]*len(dalmation_training ) + [-1]*len(dollabill_training) + [-1]*len(pizza_training) + [1]*len(soccer_ball_training) + [-1]*len(sunflower_training)
soccer_ball_model_linear = svm_train(soccer_ball_y, testing_set, "-t 0 -c %s -b 1 -q" % c)
sunflower_y = [-1]*len(dalmation_training) + [-1]*len(dollabill_training) + [-1]*len(pizza_training) + [-1]*len(soccer_ball_training) + [1]*len(sunflower_training)
sunflower_model_linear = svm_train(sunflower_y, testing_set, "-t 0 -c %s -b 1 -q" % c)
print 'dalmation linear'
result1, something, p1 = svm_predict([1]*len(testing_set), testing_set, dalmation_model_linear, "-b 1")
print 'dollabill linear'
result2, something, p2 = svm_predict([1]*len(testing_set), testing_set, dollabill_model_linear, "-b 1")
print 'pizza linear'
result3, something, p3 = svm_predict([1]*len(testing_set), testing_set, pizza_model_linear, "-b 1")
print 'soccer linear'
result4, something, p4 = svm_predict([1]*len(testing_set), testing_set, soccer_ball_model_linear, "-b 1")
print 'sunflower linear'
result5, something, p5 = svm_predict([1]*len(testing_set), testing_set, sunflower_model_linear, "-b 1")
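When I run this and take some accuracy measurements, I get about 20% accuracy every time, with the last data set, sunflower, close to 100% and the others close to 5%. I believe I'm putting the data in the right format for libsvm, but I can't find any clue as to what is wrong. I've tried values of c from 1e-8 to 1e8, and each gives a slightly different accuracy, varying by no more than 5%.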
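One thing that may be worth checking in the accuracy figures: `svm_predict` computes the accuracy it reports against the labels passed as its first argument, and the calls above pass `[1]*len(testing_set)`, so the printed number compares predictions with an all-ones list instead of with `dalmation_y`, `dollabill_y`, and so on. A minimal sketch of a combined one-vs-rest accuracy is shown here, assuming the positive-class probability of each of the five models has already been pulled out into one list per class. The function name and its inputs are hypothetical, and which column of `p1` ... `p5` holds the positive class depends on the model's label order, which `get_labels()` on the model reports.

```python
def one_vs_rest_accuracy(positive_probs, true_labels):
    """Accuracy of picking, per sample, the class whose model is most confident.

    positive_probs[k][i] is the probability that sample i belongs to class k,
    taken from the k-th binary model (hypothetical input layout).
    true_labels[i] is the index of the true class of sample i.
    """
    correct = 0
    for i, truth in enumerate(true_labels):
        scores = [probs[i] for probs in positive_probs]
        predicted = scores.index(max(scores))  # argmax over the classes
        if predicted == truth:
            correct += 1
    return correct / float(len(true_labels))


# Toy numbers for two classes and three samples (not real libsvm output).
probs = [[0.9, 0.2, 0.1], [0.1, 0.7, 0.6]]
print(one_vs_rest_accuracy(probs, [0, 1, 0]))
```

For the per-model numbers, passing the real label list (for example `dalmation_y`) as the first argument of `svm_predict` would make the printed accuracy meaningful.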
Any input would be greatly appreciated, and I'm happy to provide more information!
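On the image-loading question itself: in `create_image_list`, `pix` comes from `img.load()`, so the loops read the top-left 32×32 pixels of the original image and `img_resized` is never used. This may not be the only issue, but a minimal sketch of reading from the resized copy could look like the following. The helper name `image_to_features` is hypothetical, the `convert("RGB")` call is an added assumption to cope with grayscale or palette images, and `Image.LANCZOS` stands in for `Image.ANTIALIAS`, which newer Pillow releases no longer provide.

```python
from PIL import Image


def image_to_features(img, size=(32, 32)):
    """Flatten an image into [r, g, b, r, g, b, ...] after resizing it."""
    # convert("RGB") guards against grayscale or palette images, where
    # pix[x, y] would not be an (r, g, b) tuple.
    resized = img.convert("RGB").resize(size, Image.LANCZOS)
    pix = resized.load()  # pixel access on the resized copy, not on img
    features = []
    for x in range(size[0]):
        for y in range(size[1]):
            features.extend(pix[x, y])  # appends the r, g, b values
    return features


# Example with an in-memory image instead of a file on disk.
sample = Image.new("RGB", (100, 80), (255, 0, 0))
print(len(image_to_features(sample)))  # one value per channel per pixel
```

Inside the original loop, the equivalent change would be to call `image_to_features(Image.open(filename))` and append the result to `image_list`.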
【Comments】:
- One odd thing in my output is a warning that the maximum number of iterations has been reached whenever my c is greater than 1e-1.
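A warning about hitting the iteration limit often shows up when features are left on their raw scale, and the libsvm guide recommends scaling each feature to a range such as [0, 1] before training. Whether that is the cause here is not confirmed; the sketch below only shows what scaling the 0-255 pixel values might look like. The function name `scale_pixels` is hypothetical, and the same scaling would need to be applied to the test data.

```python
def scale_pixels(feature_vectors, max_value=255.0):
    """Map raw 0-255 channel values to the range [0, 1]."""
    return [[value / max_value for value in vector]
            for vector in feature_vectors]


# Two tiny made-up feature vectors, only to show the call.
raw = [[0, 255, 51], [102, 204, 255]]
scaled = scale_pixels(raw)
print(scaled[0])
```

The scaled lists can be passed to `svm_train` and `svm_predict` in place of the raw ones, since both accept plain lists of numbers.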
Tags: python machine-learning svm libsvm