1. Install pillow and pytesseract
pip install pillow
pip install pytesseract
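2. Recognize the captcha
import re

import pytesseract
from PIL import Image


def get_verifycode(self):
    '''Recognize the captcha and return its digits.'''
    # 1. Locate the captcha element and work out its position and size
    verifycode_element = self.verifycode_image_element  # captcha image element
    location = verifycode_element.location  # x, y of the top-left corner
    size = verifycode_element.size  # width and height
    # Crop box in the order (left, top, right, bottom)
    zuobiao = (
        int(location['x']),
        int(location['y']),
        int(location['x'] + size['width']),
        int(location['y'] + size['height']))
    # 2. Take a screenshot, crop the captcha region out of it, and save it again
    image_name = self.save_screenshot()  # full screenshot
    img = Image.open(image_name).crop(zuobiao)  # open the screenshot and crop
    img = img.convert('RGB')
    img.save(image_name)
    # 3. Read the cropped image back and run OCR on it
    code = pytesseract.image_to_string(Image.open(image_name))
    # Remove spaces and other special characters with a regular expression.
    # The captchas in this system are digits only, so only digits are kept;
    # use r'[a-zA-Z0-9]' instead if letters can appear.
    pattern = re.compile(r'[0-9]')
    b = ''
    for i in code.strip():
        if pattern.search(i) is not None:
            b += i
    return b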
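The crop-box arithmetic and the digit filter can be pulled out into small helpers that are easy to check without a browser. This is a minimal sketch, not the author's code: crop_box, keep_digits, and the scale parameter are hypothetical names. scale is an assumption for high-DPI screens, where the screenshot's pixel size may not match the element coordinates that Selenium reports.

```python
import re


def crop_box(location, size, scale=1.0):
    """Return (left, top, right, bottom) in the order Image.crop() expects.

    location and size are dicts shaped like Selenium's element.location
    and element.size. scale is a hypothetical device-pixel-ratio factor.
    """
    left = int(location['x'] * scale)
    top = int(location['y'] * scale)
    right = int((location['x'] + size['width']) * scale)
    bottom = int((location['y'] + size['height']) * scale)
    return (left, top, right, bottom)


def keep_digits(text):
    """Drop every character that is not an ASCII digit."""
    return re.sub(r'[^0-9]', '', text)


print(crop_box({'x': 100, 'y': 40}, {'width': 80, 'height': 30}))  # (100, 40, 180, 70)
print(keep_digits(' 4 8-1\n7 '))  # 4817
```

If the cropped image shows the wrong part of the page, comparing the screenshot's pixel width with the browser window width is one way to estimate a suitable scale.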
3. If pytesseract raises the error "tesseract is not installed or it's not in your path", fix it as follows:
1) Download tesseract-ocr. Download page: https://github.com/tesseract-ocr/tesseract/wiki
2) Install tesseract-ocr: double-click the .exe installer and note the installation path
3) Edit the pytesseract.py file under the Python installation path and change tesseract_cmd to r'F:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
File path: <Python installation path>\Lib\site-packages\pytesseract\pytesseract.py
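A change made inside site-packages is lost whenever pytesseract is reinstalled or upgraded. pytesseract also allows the path to be set from your own script through pytesseract.pytesseract.tesseract_cmd, so the library file can stay untouched. A minimal sketch, assuming the same installation path as above; find_tesseract and configure_pytesseract are hypothetical helper names:

```python
import os
import shutil

# Installation path from the step above; adjust it to your own machine.
DEFAULT_CMD = r'F:\Program Files (x86)\Tesseract-OCR\tesseract.exe'


def find_tesseract(fallback=DEFAULT_CMD):
    """Return a path to the tesseract executable, or None if none is found."""
    on_path = shutil.which('tesseract')  # already on PATH?
    if on_path:
        return on_path
    if os.path.isfile(fallback):
        return fallback
    return None


def configure_pytesseract(fallback=DEFAULT_CMD):
    """Point pytesseract at the executable without editing site-packages."""
    import pytesseract  # imported here so find_tesseract works without it

    cmd = find_tesseract(fallback)
    if cmd is None:
        raise FileNotFoundError('tesseract executable not found')
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling configure_pytesseract() once, before the first image_to_string call, should be enough. Adding the Tesseract-OCR folder to the PATH environment variable is another option that needs no code at all.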