【Question title】: How to improve a pytesseract function for CAPTCHA decoding?
【Posted】: 2021-05-14 12:25:13
【Question】:

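I want to extract digits from images in Python, and I chose pytesseract for the job. When I try to extract text from the images, the results are not satisfactory. I also went through the code below and applied all the techniques listed in other answers, yet it still does not perform well.

Sample image:

My code is: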

import cv2 as cv
import pytesseract
from PIL import Image
import matplotlib.pyplot as plt


pytesseract.pytesseract.tesseract_cmd = r"E:\tesseract\tesseract.exe"

def recognize_text(image):
    # edge-preserving mean-shift filter for denoising (sp=10, sr=150)
    dst = cv.pyrMeanShiftFiltering(image, sp=10, sr=150)
    plt.imshow(dst)
    # convert to grayscale
    gray = cv.cvtColor(dst, cv.COLOR_BGR2GRAY)
    # binarize with Otsu's threshold, inverted
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
    # morphological operations: erosion, then dilation
    erode = cv.erode(binary, None, iterations=2)
    dilate = cv.dilate(erode, None, iterations=1)

    # invert in place so the background is white and the digits are black, which is easier to recognize
    cv.bitwise_not(dilate, dilate)
    # run OCR
    test_message = Image.fromarray(dilate)
    custom_config = r'digits'
    text = pytesseract.image_to_string(test_message, config=custom_config)
    print(f' recognition result :{text}')



src = cv.imread(r'roughh/testt/f.jpg')
recognize_text(src)

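The problem with my code is that it only works for the images showing "396156" and "436359", and not for any other image. Please suggest some improvements to my code.

【Comments】:

    Tags: python opencv ocr tesseract python-tesseract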


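    【Solution 1】:

    I don't know whether you have solved your problem yet, but this kind of image needs to be preprocessed as in the linked solution. You will need to adjust the parameters. I used that solution on a similar dataset and it worked well. Let me know your results.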
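When adjusting parameters, it helps to score each setting on a handful of captchas whose digits are already known, rather than judging by one or two images. The sketch below is only an illustration: `accuracy`, `predict`, and `samples` are hypothetical names, and `predict` stands for whatever preprocessing-plus-OCR function is being tuned.

```python
def accuracy(predict, samples):
    """Fraction of (image, label) pairs for which predict(image) equals label.

    `predict` is any callable returning the recognized text for one image;
    surrounding whitespace in its output is ignored.
    """
    if not samples:
        return 0.0
    hits = sum(1 for image, label in samples if predict(image).strip() == label)
    return hits / len(samples)


# Stand-in data so the sketch runs on its own: the "images" are just keys,
# and the fake predictor misreads the third one.
samples = [("a", "396156"), ("b", "436359"), ("c", "120987")]
fake_results = {"a": "396156\n", "b": "436359", "c": "12098"}
print(accuracy(fake_results.get, samples))
```

Running the same scoring after each parameter change shows whether the change helped across the whole set or only on a single image.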

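    Edited answer

    I am expanding my answer rather than only giving a link for reference.

    The key to this kind of problem is image preprocessing. The main idea is to clean up the input image so that only the characters remain.

    • Given an input image

    • We want the output image to be

    The following code contains the image preprocessing I used, based on the linked solution: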

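    # imports assumed from the names used below (tsr is taken to be pytesseract)
    import cv2 as cv
    import imutils
    import numpy as np
    import pytesseract as tsr

    # load the image and check its height and width
    img = cv.imread('PNgCd.jpg')
    (h, w) = img.shape[:2]
    print("Height: {} Width:{}".format(h,w))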
    cv.imshow('Image', img)
    cv.waitKey(0)
    cv.destroyAllWindows()
    
    # convert to RGB and resize the image
    img = cv.cvtColor(img, cv.COLOR_BGR2RGB) # convert to RGB channel order
    img = imutils.resize(img, width=450) # resize to a width of 450 px
    cv.imshow('Image', img)
    cv.waitKey(0)
    cv.destroyAllWindows()
    
    #gray scale
    gray = cv.cvtColor(img, cv.COLOR_RGB2GRAY)
    cv.imshow('Gray', gray)
    cv.waitKey(0)
    cv.destroyAllWindows()
    
    # image thresholding with Otsu's method, inverted
    thresh = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)[1]
    cv.imshow('Thresh Otsu', thresh)
    cv.waitKey(0)
    cv.destroyAllWindows()
    
    # distance transform
    dist = cv.distanceTransform(thresh, cv.DIST_L2, 5)
    dist = cv.normalize(dist, dist, 0, 1.0, cv.NORM_MINMAX)
    dist = (dist*255).astype('uint8')
    cv.imshow('dist', dist)
    cv.waitKey(0)
    cv.destroyAllWindows()
    
    # threshold the distance map (binary + Otsu)
    dist = cv.threshold(dist, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)[1]
    cv.imshow('thresh binary', dist)
    cv.waitKey(0)
    cv.destroyAllWindows()
    
    #morphological operation
    kernel = cv.getStructuringElement(cv.MORPH_CROSS, (2,2))
    opening = cv.morphologyEx(dist, cv.MORPH_OPEN, kernel)
    cv.imshow('Morphological - Opening', opening)
    cv.waitKey(0)
    cv.destroyAllWindows()
    
    # dilation or erosion (it depends on your image)
    kernel = cv.getStructuringElement(cv.MORPH_CROSS, (2,2))
    dilation = cv.dilate(opening, kernel, iterations = 1)
    cv.imshow('Dilation', dilation)
    cv.waitKey(0)
    cv.destroyAllWindows()
    
    # find contours and filter them by bounding-box size
    cnts = cv.findContours(dilation.copy(), cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    nums = []
    for c in cnts:
        (x, y, w, h) = cv.boundingRect(c)
    
        if w >= 5 and h > 15:
            nums.append(c)
    print(len(nums))  # number of contours kept
    
    # convex hull and image masking
    nums = np.vstack(nums)
    hull = cv.convexHull(nums)
    mask = np.zeros(dilation.shape[:2], dtype='uint8')
    cv.drawContours(mask, [hull], -1, 255, -1)
    mask = cv.dilate(mask, None, iterations = 2)
    cv.imshow('mask', mask)
    cv.waitKey(0)
    cv.destroyAllWindows()
    
    # bitwise AND with the mask to keep only the characters from the processed image
    final = cv.bitwise_and(dilation, dilation, mask=mask)
    cv.imshow('final', final)
    cv.imwrite('final.jpg', final)
    cv.waitKey(0)
    cv.destroyAllWindows()
    
    # OCR'ing the pre-processed image
    config = "--psm 7 -c tessedit_char_whitelist=0123456789"
    text = tsr.image_to_string(final, config=config)
    print(text)
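The size test in the contour step above (`w >= 5 and h > 15`) was tuned for an image resized to 450 px wide, so it is one of the first parameters to revisit for other captchas. As a rough sketch, the same rule can be isolated in a small function that works on plain `(x, y, w, h)` tuples, the shape `cv.boundingRect` returns. The names `filter_boxes`, `min_w`, and `min_h` are hypothetical and not part of the answer's code.

```python
def filter_boxes(boxes, min_w=5, min_h=16):
    """Keep boxes at least min_w wide and min_h tall, ordered left to right.

    Each box is an (x, y, w, h) tuple. The defaults mirror the answer's
    `w >= 5 and h > 15` test (for integers, h > 15 is the same as h >= 16).
    """
    kept = [b for b in boxes if b[2] >= min_w and b[3] >= min_h]
    return sorted(kept, key=lambda b: b[0])


# Hypothetical boxes: a thin speck, two digit-sized shapes, and a short blob.
boxes = [(10, 0, 4, 20), (30, 2, 12, 25), (5, 1, 8, 15), (2, 3, 9, 30)]
print(filter_boxes(boxes))
```

If nothing passes the filter, the later `np.vstack` call has nothing to stack and raises an error, so it is worth checking for an empty result before building the convex hull.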
    

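    This code is an example of how to handle this kind of image. Keep in mind that Tesseract is not perfect and needs a clean image to work well. The code may still fail on other, similar images, in which case you will have to adjust the parameters or try other preprocessing techniques. You also need to be aware of the --psm modes: here I used --psm 7, which treats the image as a single text line. For this kind of image you can also try --psm 8, which treats the image as a single word. This code is only a starting point; improve it as you need.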
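One way to try both page segmentation modes is to loop over them and keep the first result that looks like a complete captcha. The following is a minimal sketch under stated assumptions: `build_config` and `read_digits` are hypothetical helpers, `expected_len` assumes the captcha length is known in advance, and the OCR function is passed in as a parameter so the example runs without Tesseract installed.

```python
import re


def build_config(psm, whitelist="0123456789"):
    """Build a Tesseract config string in the same form the answer uses."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_digits(image, ocr, psm_modes=(7, 8), expected_len=None):
    """Try each page segmentation mode and return the first plausible result.

    `ocr` is any callable taking (image, config=...) and returning a string.
    `expected_len` is an optional, assumed captcha length used to reject
    partial reads; if no mode matches it, the longest read is returned.
    """
    fallback = ""
    for psm in psm_modes:
        raw = ocr(image, config=build_config(psm))
        digits = re.sub(r"\D", "", raw)  # drop whitespace and stray symbols
        if digits and (expected_len is None or len(digits) == expected_len):
            return digits
        if len(digits) > len(fallback):
            fallback = digits
    return fallback


# Stand-in OCR function that pretends --psm 7 gives a partial read.
def fake_ocr(image, config=""):
    return "39 61\n" if "--psm 7" in config else "396156\n"


print(read_digits(None, fake_ocr, expected_len=6))
```

With the real library, `pytesseract.image_to_string` can be passed as `ocr` and the preprocessed array as `image`.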

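    【Comments】:

    • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review
    • No problem, I understand. I will improve my answer. Thanks for the suggestion.
    • Done. Thank you.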