(github:https://github.com/suferyang/KNN/tree/master/KNN)
识别手写的数字0-9,其中图片像素大小为32*32,源码中将像素值用文本格式存储了。
例如:
源代码中目录trainingDigits中包含2000个例子,testDigits中包含大约900个例子,两组数据没有覆盖。
首先我们要讲32*32的二进制图像矩阵转换成1*1024的向量。首先编写一段函数img2vector,打开给定的文件,循环读出文件的前32行,并将每行的头32个字符值存储在数组中。
# 测试0-1图像识别实现代码 def img2vector(filename): returnVec = np.zeros((1,1024)) fr = open(filename) for i in range(32): lineStr = fr.readline() for j in range(32): returnVec[0,32*i+j]=int(lineStr[j]) return returnVec #testVector = img2vector(\'testDigits/0_13.txt\') #print (testVector[0,0:31])
然后编写handwritingClassTest()函数用来测试分类器的代码
def handwritingClassTest(): hwLabels = [] trainingFileList = os.listdir(\'trainingDigits\') m = len(trainingFileList) trainingMat = np.zeros((m,1024)) for i in range(m): filename = trainingFileList[i] fileStr = filename.split(\'.\')[0] classNum = int(fileStr.split(\'_\')[0]) hwLabels.append(classNum) trainingMat[i,:] = img2vector(\'trainingDigits\%s\'%(filename)) testFileList = os.listdir(\'testDigits\') errorCount = 0.0 mTest = len(testFileList) for i in range(mTest): filenames = testFileList[i] fileStr = filenames.split(\'.\')[0] TestNum = int(fileStr.split(\'_\')[0]) testVector = img2vector(\'testDigits\%s\'%(filenames)) classiferResult = classify0(testVector,trainingMat,hwLabels,3) print ("the true num is %d,the classifer num is %d"%(TestNum,classiferResult)) if(classiferResult !=TestNum): errorCount +=1.0 print ("\n the total number of errors is :%d"%(errorCount)) print ("\nthe total error rate is:%f"%(errorCount/float(mTest)))
执行handwritingClassTest()得到运行的结果:
错误率大概为1.05%(其中分类算法,文件读取函数已在上一节中实现了)