How to improve a pytesseract function for CAPTCHA decoding? (Answers)

【Question title】: How to improve a pytesseract function for CAPTCHA decoding?
【Posted】: 2021-05-14 12:25:13
【Question】:

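I want to extract digits from images in Python, and I chose pytesseract for this. When I try to extract text from the images, the results are not satisfactory. I also went through the following code and implemented all the techniques listed in the other answers, but it still does not perform well.

Sample images:

My code is: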

import cv2 as cv
import pytesseract
from PIL import Image
import matplotlib.pyplot as plt


pytesseract.pytesseract.tesseract_cmd = r"E:\tesseract\tesseract.exe"

def recognize_text(image):
    # Edge-preserving denoising (mean-shift filter, sp=10, sr=150)
    dst = cv.pyrMeanShiftFiltering(image, sp=10, sr=150)
    plt.imshow(dst)
    # Convert to grayscale
    gray = cv.cvtColor(dst, cv.COLOR_BGR2GRAY)
    # Binarize with Otsu's threshold (inverted)
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
    # Morphological erosion followed by dilation
    erode = cv.erode(binary, None, iterations=2)
    dilate = cv.dilate(erode, None, iterations=1)

    # Invert in place so the background is white and the digits are black, which is easier to recognize
    cv.bitwise_not(dilate, dilate)
    # Run OCR
    test_message = Image.fromarray(dilate)
    custom_config = r'digits'
    text = pytesseract.image_to_string(test_message, config=custom_config)
    print(f'Recognition result: {text}')



src = cv.imread(r'roughh/testt/f.jpg')
recognize_text(src)

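The problem with my code is that it only works for the "396156" and "436359" images and not for any of the others. Please suggest some improvements to my code.

【Question comments】:

Tags: python opencv ocr tesseract python-tesseract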

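【Solution 1】:

I don't know whether your problem has been solved yet, but this kind of image must be preprocessed, using the linked solution. You will need to tune the parameters. I used a similar dataset and that solution worked well. Let me know your results.

Edited answer

I am improving my answer rather than only showing a link for reference.

The key to this kind of problem is image preprocessing. The main idea is to clean the input image so that only the characters remain.

Given an input image

we want the output image to be

The following code contains the image preprocessing I used, based on the linked solution: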

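import cv2 as cv
import imutils
import numpy as np
import pytesseract as tsr  # alias assumed from the tsr.image_to_string call at the end

# Load the image and check its height and width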
img = cv.imread('PNgCd.jpg')
(h, w) = img.shape[:2]
print("Height: {} Width:{}".format(h,w))
cv.imshow('Image', img)
cv.waitKey(0)
cv.destroyAllWindows()

# Convert to RGB and resize the image
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)  # convert from BGR to RGB channel order
img = imutils.resize(img, width=450)  # resize to a width of 450 px, keeping the aspect ratio
cv.imshow('Image', img)
cv.waitKey(0)
cv.destroyAllWindows()

# Convert to grayscale
gray = cv.cvtColor(img, cv.COLOR_RGB2GRAY)
cv.imshow('Gray', gray)
cv.waitKey(0)
cv.destroyAllWindows()

# Threshold with Otsu's method, inverted so the characters become white
thresh = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)[1]
cv.imshow('Thresh Otsu', thresh)
cv.waitKey(0)
cv.destroyAllWindows()

# Distance transform, normalized and rescaled to 0-255
dist = cv.distanceTransform(thresh, cv.DIST_L2, 5)
dist = cv.normalize(dist, dist, 0, 1.0, cv.NORM_MINMAX)
dist = (dist*255).astype('uint8')
cv.imshow('dist', dist)
cv.waitKey(0)
cv.destroyAllWindows()

# Threshold the distance map (binary + Otsu)
dist = cv.threshold(dist, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)[1]
cv.imshow('thresh binary', dist)
cv.waitKey(0)
cv.destroyAllWindows()

# Morphological opening to remove small noise
kernel = cv.getStructuringElement(cv.MORPH_CROSS, (2,2))
opening = cv.morphologyEx(dist, cv.MORPH_OPEN, kernel)
cv.imshow('Morphological - Opening', opening)
cv.waitKey(0)
cv.destroyAllWindows()

# Dilate or erode (which one helps depends on your image)
kernel = cv.getStructuringElement(cv.MORPH_CROSS, (2,2))
dilation = cv.dilate(opening, kernel, iterations = 1)
cv.imshow('Dilation', dilation)
cv.waitKey(0)
cv.destroyAllWindows()

# Find contours and keep only those large enough to be characters
cnts = cv.findContours(dilation.copy(), cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
nums = []
for c in cnts:
    # cw/ch avoid overwriting the image size h, w read earlier
    (x, y, cw, ch) = cv.boundingRect(c)

    if cw >= 5 and ch > 15:
        nums.append(c)
print("Contours kept:", len(nums))

# Convex hull of all kept contour points, then a mask built from it
nums = np.vstack(nums)
hull = cv.convexHull(nums)
mask = np.zeros(dilation.shape[:2], dtype='uint8')
cv.drawContours(mask, [hull], -1, 255, -1)
mask = cv.dilate(mask, None, iterations = 2)
cv.imshow('mask', mask)
cv.waitKey(0)
cv.destroyAllWindows()

# Bitwise AND with the mask to keep only the characters from the cleaned image
final = cv.bitwise_and(dilation, dilation, mask=mask)
cv.imshow('final', final)
cv.imwrite('final.jpg', final)
cv.waitKey(0)
cv.destroyAllWindows()

# OCR'ing the pre-processed image
config = "--psm 7 -c tessedit_char_whitelist=0123456789"
text = tsr.image_to_string(final, config=config)
print(text)
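
The contour filter in the code above hard-codes its size thresholds (`w >= 5 and h > 15`). A minimal sketch of the same test as a small helper, so the thresholds are easier to tune per dataset. The name `filter_digit_boxes`, its parameters, and the left-to-right sort are my own additions for illustration, not part of the original answer:

```python
def filter_digit_boxes(boxes, min_width=5, min_height=15):
    """Keep (x, y, w, h) boxes that pass the size test, sorted left to right.

    Uses the same test as the answer's loop: w >= min_width and h > min_height.
    The sort by x is an extra convenience and an assumption of this sketch.
    """
    kept = [b for b in boxes if b[2] >= min_width and b[3] > min_height]
    return sorted(kept, key=lambda b: b[0])


# Hypothetical boxes, in the (x, y, w, h) layout that cv.boundingRect returns
example = [(30, 2, 10, 20), (0, 0, 3, 3), (5, 1, 8, 18), (50, 0, 6, 15)]
print(filter_digit_boxes(example))
```

With the pipeline above, the input would be something like `[cv.boundingRect(c) for c in cnts]`. If the result is empty, the thresholds are probably too strict for the resized image.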

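The code is an example of how to handle this kind of image. Keep in mind that Tesseract is not perfect and needs a clean image to work well. This code may also fail on other, similar images, in which case you will have to tune the parameters or try other preprocessing techniques. You also need to know the --psm modes: here I used --psm 7, which treats the image as a single text line. For this kind of image you can also try --psm 8, which treats the image as a single word. This code is only a starting point; improve it as needed.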
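
To compare the two page segmentation modes mentioned above, something like the following could work. It is only a sketch: the helper names `build_config`, `keep_digits` and `ocr_with_psm_modes` are hypothetical, and stripping non-digit characters from the output is my own assumption about what is wanted for a digits-only CAPTCHA.

```python
import re


def build_config(psm, whitelist="0123456789"):
    """Build a Tesseract config string for one page segmentation mode."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def keep_digits(text):
    """Drop everything except ASCII digits from raw OCR output."""
    return re.sub(r"[^0-9]", "", text)


def ocr_with_psm_modes(image, psm_modes=(7, 8)):
    """Run Tesseract once per PSM mode and return {psm: digits_found}."""
    # Imported here so the two helpers above can be used without Tesseract installed
    import pytesseract

    results = {}
    for psm in psm_modes:
        raw = pytesseract.image_to_string(image, config=build_config(psm))
        results[psm] = keep_digits(raw)
    return results
```

Calling `ocr_with_psm_modes(final)` on the preprocessed image returns one candidate string per mode, which makes it easy to see whether --psm 7 or --psm 8 behaves better on a given set of images.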

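【Comments】:

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review
No problem, I understand. I will improve my answer. Thanks for the suggestion.
Done. Thank you.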