How to detect the text above lines using OpenCV in Python

【Question title】: How to detect the text above lines using OpenCV in Python
【Posted】: 2020-07-26 16:50:30
【Question description】:

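I am interested in detecting the lines (which I managed to do using the Hough transform) and the text above them.

My test image is below:

The code I wrote is below. (I have edited it so that I can loop over the coordinates of each line.)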

import cv2
import numpy as np

img=cv2.imread('test3.jpg')
#img=cv2.resize(img,(500,500))
imgGray=cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
imgEdges=cv2.Canny(imgGray,100,250)
imgLines= cv2.HoughLinesP(imgEdges,1,np.pi/180,230, minLineLength = 700, maxLineGap = 100)
imgLinesList= list(imgLines)

a,b,c=imgLines.shape
line_coords_list = []
for i in range(a):
    line_coords_list.append([(int(imgLines[i][0][0]), int(imgLines[i][0][1])), (int(imgLines[i][0][2]), int(imgLines[i][0][3]))])

print(line_coords_list)#[[(85, 523), (964, 523)], [(85, 115), (964, 115)], [(85, 360), (964, 360)], [(85, 441), (964, 441)], [(85, 278), (964, 278)], [(85, 197), (964, 197)]]

roi= img[int(line_coords_list[0][0][1]): int(line_coords_list[0][1][1]), int(line_coords_list[0][0][0]) : int(line_coords_list[0][1][0])]
print(roi) # why does this print an empty list?
cv2.imshow('Roi NEW',roi)

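Now I just don't know how to detect the regions of interest above these lines. Is it possible to, say, crop at each line and get images roi_1, roi_2, ..., roi_n, where each ROI is the text above the first line, the text above the second line, and so on?

I would like the output to look like this.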

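【Question comments】:

Apply morphology to the thresholded image and get the contours. Use the contours to extract each line of text. If long lines from the dotted lines remain on the page, filter the contours by width or height. For an example, see stackoverflow.com/questions/61198983/…
@fmw42 - Thank you, but it detects all of the text. How do I detect only the text above the dotted lines?
Discard the first line of text.
Yes, I only need the text above the lines. Also, how would I filter the contours by width or height? I know how to find contours and filter them by length.

Tags: python python-3.x opencv image-processing opencv3.0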

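【Solution 1】:

You have already detected the lines. Now you have to use the y coordinates to split the image into the regions between the lines, then search those regions for black pixels (the writing) on the white background (the paper).

Building histograms along the x and y axes may give you the regions of interest you are looking for.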
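The splitting by y coordinate can be sketched roughly as follows, using the line coordinates printed in the question. The blank `img` is only a stand-in for the real page (an assumption, so the snippet runs on its own). It also shows why the `roi` in the question came out empty: a horizontal line has the same y at both ends, so slicing from y to y selects zero rows.

```python
import numpy as np

# Line endpoints as printed in the question: [(x1, y), (x2, y)] per line.
line_coords_list = [[(85, 523), (964, 523)], [(85, 115), (964, 115)],
                    [(85, 360), (964, 360)], [(85, 441), (964, 441)],
                    [(85, 278), (964, 278)], [(85, 197), (964, 197)]]

# Stand-in for the real page (assumption): a blank white grayscale image.
img = np.full((600, 1000), 255, dtype=np.uint8)

# Slicing from a line's y to the same y selects zero rows, so the ROI is empty.
(x1, y1), (x2, y2) = line_coords_list[0]
empty_roi = img[y1:y2, x1:x2]
print(empty_roi.shape)  # (0, 879)

# Sort the lines top to bottom and crop the band between consecutive lines.
# All lines here share the same x1 and x2, so those are reused for every crop.
ys = sorted(p1[1] for p1, _ in line_coords_list)
rois = [img[top:bottom, x1:x2] for top, bottom in zip(ys[:-1], ys[1:])]
print([roi.shape for roi in rois])
```

This yields five bands for six lines, because the band above the topmost line has no line bounding it from above; its upper edge has to be chosen some other way, for example from the typical spacing between lines.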

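Just to answer your question in the comments: if, for example, you have an image img and a region of interest at y coordinates (100, 200) spanning the whole width of the image, you can crop that region and search it for dark pixels, something like this:

# img must be single-channel (grayscale) for the pixel comparison below to work
cropped = img[100:200, 5:-5]  # crop a few pixels off in x-direction just in case

Now search:

top, left = 10000, 10000
bottom, right = 0, 0
for i in range(cropped.shape[0]):
    for j in range(cropped.shape[1]):
        if cropped[i][j] < 200:    # dark pixel (writing)?
            top = min(i, top)
            bottom = max(i, bottom)
            left = min(j, left)
            right = max(j, right)

Or something like that...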
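The same search can be written with the row and column histograms mentioned earlier, which avoids the Python loops. This is a minimal sketch, not a definitive implementation: the function name `text_extent`, the `dark_threshold` parameter and the synthetic test band are assumptions made for illustration.

```python
import numpy as np

def text_extent(band, dark_threshold=200):
    """Return (top, bottom, left, right) of the dark pixels in a grayscale band, or None."""
    dark = band < dark_threshold
    row_hist = dark.sum(axis=1)  # number of dark pixels in each row
    col_hist = dark.sum(axis=0)  # number of dark pixels in each column
    rows = np.flatnonzero(row_hist)
    cols = np.flatnonzero(col_hist)
    if rows.size == 0:
        return None  # no writing in this band
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

# Synthetic band (assumption): white paper with one dark block standing in for writing.
band = np.full((80, 300), 255, dtype=np.uint8)
band[20:50, 40:200] = 0
print(text_extent(band))  # (20, 49, 40, 199)
```

The returned indices are inclusive, matching the top, bottom, left and right values the loop computes.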

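【Comments】:

@lenik - Could I get a bit more guidance? I am new to OpenCV and I have never used histograms.
print(imgLines) = [[[ 38 255 437 255]] [[ 38 253 437 253]] [[ 38 330 437 330]] [[ 38 328 437 328]] [[ 38 404 437 404]] [[ 38 402 437 402]] [[ 38 181 437 181]] [[ 38 477 437 477]] [[ 38 479 437 479]] [[ 38 179 437 179]] [[ 38 104 437 104]]] These describe the lines, and I can extract a specific region, but shouldn't there be only 6?
@AlanJones There are actually only 6, but some are detected twice, one just above the other, such as 253 and 255, 402 and 404, and so on.
@AlanJones imgLines clearly contains 6 lines, with y coordinates 255, 330, 404, 477, 104, 179. Once you sort them, you have the average height of the space allotted to writing and 6 potential regions to crop and analyze.
@AlanJones I added a few lines of code to the answer.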
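The de-duplication and spacing idea from these comments could look roughly like this, using the y values from the imgLines printout above. The helper name `merge_close` and the 5-pixel tolerance are assumptions; the tolerance needs to be larger than the line thickness and smaller than the line spacing.

```python
from statistics import median

def merge_close(values, tol=5):
    """Collapse values lying within tol of a group's first value, keeping that first value."""
    merged = []
    for v in sorted(values):
        if not merged or v - merged[-1] > tol:
            merged.append(v)
    return merged

# y coordinates taken from the imgLines printout in the comment above.
ys = [255, 253, 330, 328, 404, 402, 181, 477, 479, 179, 104]

line_ys = merge_close(ys)
print(line_ys)  # [104, 179, 253, 328, 402, 477]

# Typical vertical space allotted to writing, from the gaps between lines.
spacing = median(b - a for a, b in zip(line_ys, line_ys[1:]))
print(spacing)  # 75

# One (top, bottom) row range per line: the strip of that height just above it.
bands = [(max(0, y - spacing), y) for y in line_ys]
print(bands)
```

Keeping the smallest y of each pair means the crop ends at the upper edge of the printed line, so the line itself stays out of the region being searched.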

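【Solution 2】:

Here is one way to do this in Python/OpenCV.

Read the input
Convert to grayscale
Threshold (Otsu) so that the text is white on a black background
Apply morphological dilation with a horizontal kernel to smear the text in each line together
Apply morphological open with a vertical kernel to remove the thin lines from the dotted lines
Get the contours
Find the contour whose bounding box has the lowest Y value (the topmost box)
Draw all bounding boxes except the topmost one on the input
Save the results

Input: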

import cv2
import numpy as np

# load image
img = cv2.imread("text_above_lines.jpg")

# convert to gray
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# threshold the grayscale image
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# use morphology dilate with a wide kernel to smear the text horizontally
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (151, 3))
morph = cv2.morphologyEx(thresh, cv2.MORPH_DILATE, kernel)

# use morphology open to remove thin lines from dotted lines
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 17))
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)

# find contours
cntrs = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cntrs = cntrs[0] if len(cntrs) == 2 else cntrs[1]

# find the topmost box
ythresh = 1000000
for c in cntrs:
    box = cv2.boundingRect(c)
    x,y,w,h = box
    if y < ythresh:
        topbox = box
        ythresh = y

# Draw contours excluding the topmost box
result = img.copy()
for c in cntrs:
    box = cv2.boundingRect(c)
    if box != topbox:
        x,y,w,h = box
        cv2.rectangle(result, (x, y), (x+w, y+h), (0, 0, 255), 2)

# write result to disk
cv2.imwrite("text_above_lines_threshold.png", thresh)
cv2.imwrite("text_above_lines_morph.png", morph)
cv2.imwrite("text_above_lines_lines.jpg", result)

#cv2.imshow("GRAY", gray)
cv2.imshow("THRESH", thresh)
cv2.imshow("MORPH", morph)
cv2.imshow("RESULT", result)
cv2.waitKey(0)
cv2.destroyAllWindows()

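Thresholded image:

Morphology image:

Result:

【Comments】:

Thank you very much, fmw42. Quick question: I can't work out which part of the code, or which parameter, distinguishes the text above a line from the text with no line. How does it know that "What planet do we live on?" has no dotted line?
I simply discarded the line of text closest to the top. I assumed each page has a single question at the top, so the first line of text is always the question.
Ah, that explains why it fails when I stack two questions on top of each other. Still, I learned a lot from your help. Now I just need to figure out how to create bounding boxes for the text above the lines without discarding the line of text closest to the top.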
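One possible direction for that last point, sketched under assumptions rather than taken from either answer: combine the line y coordinates from the Hough transform in the question with the text bounding boxes from this answer, and keep a box only if a line lies just below its bottom edge. The function name `boxes_above_lines`, the `max_gap` and `overlap` tolerances, and the sample boxes are all hypothetical and would need tuning on real pages.

```python
def boxes_above_lines(boxes, line_ys, max_gap=25, overlap=5):
    """Keep boxes (x, y, w, h) whose bottom edge sits just above one of the lines.

    A box is kept if some line is at most max_gap pixels below its bottom edge,
    or at most overlap pixels above it (writing that dips onto the line).
    """
    kept = []
    for x, y, w, h in boxes:
        bottom = y + h
        if any(-overlap <= line_y - bottom <= max_gap for line_y in line_ys):
            kept.append((x, y, w, h))
    return kept

# Hypothetical page: two printed questions, each followed by an answer on a dotted line.
boxes = [(90, 20, 400, 30),    # question 1, no line beneath it
         (90, 80, 300, 30),    # answer written above the line at y=115
         (90, 130, 350, 30),   # question 2, no line directly beneath it
         (90, 165, 200, 28)]   # answer written above the line at y=197
line_ys = [115, 197]

print(boxes_above_lines(boxes, line_ys))  # [(90, 80, 300, 30), (90, 165, 200, 28)]
```

Because each box is tested against the lines instead of its position on the page, the result does not depend on there being exactly one question at the top.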