【发布时间】:2020-07-22 20:30:48
【问题描述】:
当文本以不同颜色出现时,Pytesseract 无法提取文本。我尝试使用 opencv 来反转图像,但它不适用于深色文本颜色。
图片:
import cv2
import pytesseract
from PIL import Image
def text(image):
image = cv2.resize(image, (0, 0), fx=7, fy=7)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imwrite("gray.png", gray)
blur = cv2.GaussianBlur(gray, (3, 3), 0)
cv2.imwrite("gray_blur.png", blur)
thresh = cv2.threshold(blur, 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
cv2.imwrite("thresh.png", thresh)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
cv2.imwrite("opening.png", opening)
invert = 255 - opening
cv2.imwrite("invert.png", invert)
data = pytesseract.image_to_string(invert, lang="eng", config="--psm 7")
return data
有没有办法从给定的图像中提取两个文本:DEADLINE(red) 和 WHITE HOUSE(white)
【问题讨论】:
标签: python opencv python-imaging-library ocr python-tesseract