【Question title】: Python number recognition (on colored screen)
【Posted】: 2021-05-07 17:04:56
【Question】:

I searched for image recognition with Python. There seems to be no tutorial on extracting numbers from a colored background, so I followed THIS TUTORIAL

import cv2
import matplotlib.pyplot as plt 

def detect_edge(image):
    ''' Detect edges with Canny and plot them next to the input image '''

    image_with_edges = cv2.Canny(image , 100, 200)

    images = [image , image_with_edges]

    location = [121, 122]

    for loc, img in zip(location, images):
        plt.subplot(loc)
        plt.imshow(img, cmap='gray')

    plt.savefig('edge.png')
    plt.show()

image = cv2.imread('myscreenshot.png', 0)
detect_edge(image)

Here is my image:

Here is the result:

Is there any way to print out these numbers?

【Comments】:

Tags: python opencv matplotlib image-recognition pyautogui


【Answer 1】:

Here is some code that gets clean Canny edges for this image.

import cv2
import numpy as np

# load image
img = cv2.imread("numbers.png")

# convert to HSV colorspace and split the channels
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

# use clahe to improve contrast
# (the contrast is pretty good already, so not much change, but good habit to have here)
clahe = cv2.createCLAHE(clipLimit = 10)
contrast = clahe.apply(v)

# use canny
canny = cv2.Canny(contrast, 20, 110)

# show
cv2.imshow('i', img)
cv2.imshow('v', v)
cv2.imshow('c', contrast)
cv2.imshow("canny", canny)
cv2.waitKey(0)

# save
cv2.imwrite("edges.png", canny)

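Without using an OCR such as pytesseract, I can't see an obvious way to consistently convert this image into "text" numbers. I'll leave that to someone else who might know how to solve it without any pattern recognition, because without it I wouldn't even know where to start. If you're willing to drop that restriction, pytesseract should have no problem with it, possibly even without this preprocessing.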

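OK, I filled in the digits in the image. For some reason the hierarchy from OpenCV's findContours wasn't cooperating, so I had to do it manually, which makes this code very janky. Honestly, if I were to try this again from scratch, I would try to find the colors that contribute to each pixel, threshold on them, and then combine the masks.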

import cv2
import numpy as np

# check if small box lies strictly inside big box (boxes are x, y, w, h)
def contained(big, small):
    # big corners
    x, y, w, h = big
    big_tl = [x, y]
    big_br = [x + w, y + h]

    # small corners
    x, y, w, h = small
    small_tl = [x, y]
    small_br = [x + w, y + h]

    # check
    if small_tl[0] > big_tl[0] and small_br[0] < big_br[0]:
        if small_tl[1] > big_tl[1] and small_br[1] < big_br[1]:
            return True
    return False

# load image
img = cv2.imread("numbers.png")

# convert to HSV colorspace and split the channels
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

# use clahe to improve contrast
# (the contrast is pretty good already, so not much change, but good habit to have here)
clahe = cv2.createCLAHE(clipLimit = 10)
contrast = clahe.apply(v)

# rescale
scale = 2.0
h, w = img.shape[:2]
h = int(h * scale)
w = int(w * scale)
# interpolation must be passed by keyword; the third positional argument is dst
contrast = cv2.resize(contrast, (w, h), interpolation=cv2.INTER_LINEAR)
img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR)

# use canny
canny = cv2.Canny(contrast, 10, 60)

# show
cv2.imshow('i', img)
cv2.imshow('v', v)
cv2.imshow('c', contrast)
cv2.imshow("canny", canny)
cv2.waitKey(0)

# try to fill in contours
# contours
# ([-2:] works on OpenCV 3, which returns three values, and OpenCV 4, which returns two)
contours, hierarchy = cv2.findContours(canny, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]

# filter contours by size
# filter out noisy bits and the big grid boxes
filtered = []
for contour in contours:
    perimeter = cv2.arcLength(contour, True)
    if 50 < perimeter < 750:
        filtered.append(contour)

# draw contours again
# create a mask of the contoured image
mask = np.zeros_like(contrast)
mask = cv2.drawContours(mask, filtered, -1, 255, -1)

# close to get rid of annoying little gaps
kernel = np.ones((3, 3), np.uint8)
mask = cv2.dilate(mask, kernel, iterations = 1)
mask = cv2.erode(mask, kernel, iterations = 1)

# contours
contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]

# alright, hierarchy is being stupid, plan B
# SUUUUUPEEERRR JAAAANK
# start with every contour marked as "outer", then move any contour whose
# bounding box sits inside another outer contour's box into children
outer_cntrs = list(range(len(contours)))
children = []
for a in range(len(contours)):
    if a in outer_cntrs:
        # get current box
        big_box = cv2.boundingRect(contours[a])
        # check against all other boxes
        for b in range(len(contours)):
            if b in outer_cntrs:
                small_box = cv2.boundingRect(contours[b])
                # remove any children
                if contained(big_box, small_box):
                    outer_cntrs.remove(b)
                    children.append(contours[b])

# select the contours that are still marked as outer
top_cntrs = []
for a in range(len(contours)):
    if a in outer_cntrs:
        top_cntrs.append(contours[a])

# create a mask of the contoured image
mask = np.zeros_like(contrast)
mask = cv2.drawContours(mask, top_cntrs, -1, 255, -1)
mask = cv2.drawContours(mask, children, -1, 255, -1)

# close
kernel = np.ones((3, 3), np.uint8)
mask = cv2.dilate(mask, kernel, iterations = 1)
mask = cv2.erode(mask, kernel, iterations = 1)

# do contours again because opencv is being super difficult
# honestly, at this point, a fill method would've been better
# contours
# ([-2:] works on OpenCV 3, which returns three values, and OpenCV 4, which returns two)
contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]

# fill in the outer shapes, then clear the nested (child) contours
for con in contours:
    cv2.fillPoly(mask, pts = [con], color=(255))
for con in children:
    cv2.fillPoly(mask, pts = [con], color=(0))

# resize back down
h, w = mask.shape
h = int(h / scale)
w = int(w / scale)
mask = cv2.resize(mask, (w, h))

# show
cv2.imshow("mask", mask)
cv2.waitKey(0)

# save
cv2.imwrite("filled.png", mask)
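
The idea mentioned before this code (threshold per color, then combine the masks) is not spelled out in the answer. Below is a minimal sketch of one possible reading, assuming flat screenshot colors and a single dominant background color: count how often each exact color occurs, skip the most common one (assumed to be the background) and any rare colors, and OR the remaining per-color masks together. The function name `mask_from_frequent_colors` and the `min_fraction` parameter are hypothetical, and anti-aliased text would need a tolerance instead of exact matching.

```python
import numpy as np

def mask_from_frequent_colors(img, min_fraction=0.01):
    """Return a 0/255 mask of pixels whose color is frequent but not the background.

    Assumptions (not from the original answer): colors are flat, and the
    most common color in the image is the background.
    """
    h, w = img.shape[:2]
    pixels = img.reshape(-1, img.shape[2])
    # unique colors, the color index of every pixel, and how often each color occurs
    colors, inverse, counts = np.unique(
        pixels, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)
    background = np.argmax(counts)
    mask = np.zeros(h * w, dtype=np.uint8)
    for idx, count in enumerate(counts):
        if idx == background or count < min_fraction * h * w:
            continue  # skip the background and rare (noisy) colors
        # one mask per color, combined with a bitwise OR
        mask |= np.where(inverse == idx, 255, 0).astype(np.uint8)
    return mask.reshape(h, w)

# tiny synthetic example: grey background with one red and one green blob
demo = np.full((20, 20, 3), 128, dtype=np.uint8)
demo[2:8, 2:8] = (0, 0, 255)
demo[12:18, 12:18] = (0, 255, 0)
demo_mask = mask_from_frequent_colors(demo)
```

On a real screenshot with several background colors (for example a grid of colored cells), the "most common color is the background" assumption would have to be replaced by something per cell.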

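【Comments】:

  • It seems Python can't extract all of them with pytesseract. Is there a way to, for example, fill the digits with color, or make them clearer so they can be recognized? By the way, nice work.
  • There is. I just got back from work and need to eat, so it will be a while before I can try to solve this. Short answer: OpenCV contours. You can make a mask from the contours (be sure to check for contours nested inside and unfill those bits). I would try tesseract on the CLAHE-enhanced image first, though; it's much simpler if that works.
  • Your last code doesn't show the digit 5: images.guru/i/duGGc
  • The numbers also change, and so do their colors and the background colors. There must be a simpler way to recognize numbers that appear on screen, as in this tutorial: youtube.com/watch?v=y1ZrOs9s2QA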
【Answer 2】:

You can find the digits in three steps:



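  1. Adaptive threshold result:

    • Here we can see that 9 and 0 differ from the other digits. We need to remove the border around the 9.

  2. Erosion result:

  3. Pytesseract result: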

    • 8 | 1
      
      5 9
      4 @
      3 | 3
      6 | 1
      
    • pytesseract offers several page segmentation modes to choose from

    • If you want to remove | from the output, you can use re.sub

    • text = re.sub('[^A-Za-z0-9]+', '\n', text)
      
    • The result will be:

      • 8
        1
        5
        9
        4
        3
        3
        6
        1
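
As a small worked example of that cleanup step, the sketch below applies the same pattern to a string shaped like the OCR output listed above. The exact whitespace in `raw` is an assumption reconstructed from the listing, and the `re.findall` variant is an alternative not used in the answer.

```python
import re

# raw OCR text, reconstructed from the pytesseract result listed above
# (exact spacing and blank lines are an assumption)
raw = "8 | 1\n\n5 9\n4 @\n3 | 3\n6 | 1\n"

# collapse every run of non-alphanumeric characters into a single newline
cleaned = re.sub('[^A-Za-z0-9]+', '\n', raw).strip()

# alternative: keep only the digit characters, one list item per digit
digits = re.findall(r'\d', raw)

print(cleaned)
print(digits)
```

Both variants drop the `|` separators and the `@` that was returned in place of the unrecognized 0.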
        

Code:

import cv2
import pytesseract
import re
import numpy as np

image = cv2.imread("7UUGYHw.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 13, 2)
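# note: np.array((7, 7)) is a 2-element array, not a 7x7 kernel;
# np.ones((7, 7), np.uint8) would be a true 7x7 kernel and would erode much more strongly
erode = cv2.erode(thresh, np.array((7, 7)), iterations=1)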
text = pytesseract.image_to_string(erode, config="--psm 6")
text = re.sub('[^A-Za-z0-9]+', '\n', text)
print(text)
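
Since the answer notes that pytesseract has several page segmentation modes, here is a minimal sketch of building the `config` string for different modes, optionally restricting the output to digits. `--psm` and the `tessedit_char_whitelist` variable are Tesseract options; the helper name `tesseract_config` is hypothetical, and whether the whitelist is honored depends on the installed Tesseract version and engine, so treat it as something to try and not a guaranteed fix.

```python
def tesseract_config(psm=6, whitelist="0123456789"):
    """Build a Tesseract config string: page segmentation mode plus an optional whitelist."""
    parts = ["--psm {}".format(psm)]
    if whitelist:
        # ask Tesseract to only output these characters
        parts.append("-c tessedit_char_whitelist={}".format(whitelist))
    return " ".join(parts)

# psm 6 treats the image as a single uniform block of text (as in the answer)
block_config = tesseract_config(psm=6)
# psm 10 treats the image as a single character, e.g. for one cropped digit
single_char_config = tesseract_config(psm=10)

print(block_config)
print(single_char_config)
```

The resulting string would be passed as `pytesseract.image_to_string(erode, config=block_config)`, in place of the literal `"--psm 6"` used above.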

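【Comments】:

  • Nice work! But some digits still aren't shown. How can that be fixed?
  • Some digits? Only the 0 is not shown. You need to apply text skew correction to make it readable for pytesseract. However, other images may still not be readable.
  • Apart from zero and one it works fine; it seems pytesseract doesn't want to recognize them.
  • One? Please re-read my answer; all of them are recognized. The only digit that is not recognized is 0, because pytesseract is not rotation-invariant. You need to train pytesseract with samples similar to this 0 for it to be recognized in the image.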