How to identify the largest bounding rectangles in an image and separate them into separate images using OpenCV and Python: answers

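【Question title】: How to identify largest bounding rectangles from an image and separate them into separate images using Opencv and python
【Posted】: 2020-02-12 05:17:56
【Question description】:

I am new to OpenCV and Python, and I am trying to identify the three largest rectangles marked in the sample image and extract them into three separate images. I can detect the contours in the image, but all of them are drawn (as shown in the second image), and I cannot isolate the three largest ones. The code I have written so far: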

import cv2

image = cv2.imread('imgpath')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
canny = cv2.Canny(gray, 130, 255, 1)

cnts = cv2.findContours(canny, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

#largest_contours = sorted(cnts, key=cv2.contourArea)[-3:]
#print(len(largest_contours))
for c in cnts:
    cv2.drawContours(image,[c], 0, (0,255,0), 3)

#cv2.imshow("result", image)
#cv2.drawContours(image, largest_contours, -1, (0,255,0), 3)
cv2.imshow('contours', image)
cv2.waitKey(0)

【Question comments】:

Could you add your original input image?

Tags: python image opencv image-processing contour

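【Solution 1】:

Here is one approach:

Convert the image to grayscale
Apply adaptive thresholding to obtain a binary image
Find the contours and sort them to keep the three largest
Perform contour approximation to make sure the contour has four corners
Apply a perspective transform to get a top-down view
Rotate the image to get the correct orientation

Extracted rectangles after the perspective transform and rotation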

import cv2
import numpy as np

def rotate_image(image, angle):
    # Grab the dimensions of the image and then determine the center
    (h, w) = image.shape[:2]
    (cX, cY) = (w / 2, h / 2)

    # grab the rotation matrix (applying the negative of the
    # angle to rotate clockwise), then grab the sine and cosine
    # (i.e., the rotation components of the matrix)
    M = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])

    # Compute the new bounding dimensions of the image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))

    # Adjust the rotation matrix to take into account translation
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY

    # Perform the actual rotation and return the image
    return cv2.warpAffine(image, M, (nW, nH))

def perspective_transform(image, corners):
    def order_corner_points(corners):
        # Separate corners into individual points
        # Index 0 - top-right
        #       1 - top-left
        #       2 - bottom-left
        #       3 - bottom-right
        corners = [(corner[0][0], corner[0][1]) for corner in corners]
        top_r, top_l, bottom_l, bottom_r = corners[0], corners[1], corners[2], corners[3]
        return (top_l, top_r, bottom_r, bottom_l)

    # Order points in clockwise order
    ordered_corners = order_corner_points(corners)
    top_l, top_r, bottom_r, bottom_l = ordered_corners

    # Determine width of new image: the larger of the Euclidean distances
    # (bottom right to bottom left) and (top right to top left)
    width_A = np.sqrt(((bottom_r[0] - bottom_l[0]) ** 2) + ((bottom_r[1] - bottom_l[1]) ** 2))
    width_B = np.sqrt(((top_r[0] - top_l[0]) ** 2) + ((top_r[1] - top_l[1]) ** 2))
    width = max(int(width_A), int(width_B))

    # Determine height of new image: the larger of the Euclidean distances
    # (top right to bottom right) and (top left to bottom left)
    height_A = np.sqrt(((top_r[0] - bottom_r[0]) ** 2) + ((top_r[1] - bottom_r[1]) ** 2))
    height_B = np.sqrt(((top_l[0] - bottom_l[0]) ** 2) + ((top_l[1] - bottom_l[1]) ** 2))
    height = max(int(height_A), int(height_B))

    # Construct destination points for the top-down view in
    # top_l, top_r, bottom_r, bottom_l order (must match ordered_corners)
    dimensions = np.array([[0, 0], [width - 1, 0], [width - 1, height - 1], 
                    [0, height - 1]], dtype = "float32")

    # Convert to Numpy format
    ordered_corners = np.array(ordered_corners, dtype="float32")

    # Find perspective transform matrix
    matrix = cv2.getPerspectiveTransform(ordered_corners, dimensions)

    # Return the transformed image
    return cv2.warpPerspective(image, matrix, (width, height))

image = cv2.imread('1.jpg')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,11,3)

cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key = cv2.contourArea, reverse = True)[:3]

ROI_number = 0
for c in cnts:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.015 * peri, True)

    if len(approx) == 4:
        cv2.drawContours(image,[c], 0, (36,255,12), 3)
        transformed = perspective_transform(original, approx)
        rotated = rotate_image(transformed, -90)
        cv2.imwrite('ROI_{}.png'.format(ROI_number), rotated)
        cv2.imshow('ROI_{}'.format(ROI_number), rotated)
        ROI_number += 1

cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.waitKey()
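
The `order_corner_points` helper above relies on `cv2.approxPolyDP` returning the corners in the order top-right, top-left, bottom-left, bottom-right, which depends on where the contour happens to start. A minimal sketch of an alternative that orders the corners by position instead is shown here. The function name `order_corners_by_position` is hypothetical and not part of the original answer, and the sum/difference heuristic is an assumption that works for roughly upright quadrilaterals but may mislabel corners for shapes rotated close to 45 degrees.

```python
import numpy as np

def order_corners_by_position(corners):
    """Return corners as (top-left, top-right, bottom-right, bottom-left).

    `corners` is the (4, 1, 2) array returned by cv2.approxPolyDP,
    or any array that reshapes to (4, 2).
    """
    pts = np.asarray(corners, dtype="float32").reshape(4, 2)
    sums = pts.sum(axis=1)             # x + y for each corner
    diffs = pts[:, 1] - pts[:, 0]      # y - x for each corner

    top_l = pts[np.argmin(sums)]       # smallest x + y
    bottom_r = pts[np.argmax(sums)]    # largest x + y
    top_r = pts[np.argmin(diffs)]      # smallest y - x
    bottom_l = pts[np.argmax(diffs)]   # largest y - x
    return np.array([top_l, top_r, bottom_r, bottom_l], dtype="float32")

# Example: a square whose corners arrive in an arbitrary order
scrambled = np.array([[[100, 10]], [[10, 10]], [[10, 100]], [[100, 100]]])
ordered = order_corners_by_position(scrambled)
print(ordered)
```

The returned array is already in the order that `perspective_transform` passes to `cv2.getPerspectiveTransform`, so it could stand in for the tuple produced by `order_corner_points` if the fixed-order assumption does not hold for a given image.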

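【Comments】:

This is perfect, nathancy. It is exactly what I needed. Thank you!
@kcgr8chief Glad I could help! If it worked for you, please consider accepting the answer!
Off topic, but do you know of a way to also extract the text of each label from this image? If I use tesseract or the Google Vision API, they only extract plain text and give no meaningful information about which label the text belongs to. Any guidance on this would be greatly appreciated.
If you have the bounding box coordinates of the ROI, you can use Numpy slicing to extract the label. Take a look at this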
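
The Numpy slicing mentioned in the last comment can be sketched as follows. The helper name `crop_box`, the synthetic image and the box coordinates are all hypothetical values chosen for illustration; in practice the coordinates would come from something like `cv2.boundingRect`.

```python
import numpy as np

def crop_box(image, x, y, w, h):
    """Crop the box with top-left corner (x, y), width w and height h."""
    # NumPy images are indexed as [row, column], i.e. [y, x]
    return image[y:y + h, x:x + w].copy()

# Hypothetical stand-in for one extracted ROI: a 200x300 BGR image
roi = np.zeros((200, 300, 3), dtype=np.uint8)
roi[50:90, 20:140] = 255    # pretend this white block is a label

label = crop_box(roi, x=20, y=50, w=120, h=40)
print(label.shape)
```

Each crop could then be passed to an OCR engine on its own, so that the recognized text stays associated with the label it came from.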

【Solution 2】:

Sort the contours by area, then pick the top three.

    cnts = sorted(cnts, key=lambda c: cv2.contourArea(c), reverse=True)
    for c in cnts[:3]:
        cv2.drawContours(image,[c], 0, (0,255,0), 3)
        (x,y,w,h) = cv2.boundingRect(c)

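(x,y,w,h) are the top-left corner (x,y), the width and the height of the contour's bounding rectangle. These values can be used to crop the rectangle.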

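【Comments】:

Thanks. I tried your code, but it only gives me the top rectangle and some parts of the second and third rectangles. It does not even get the borders of the second and third. My commented-out code, #largest_contours = sorted(cnts, key=cv2.contourArea)[-3:], gives me the same result. Not sure what is going wrong.
See how it looks after sorting and picking the top three: imgur.com/a/y8FmNC8
Instead of drawing the contours, use (x, y, w, h) = cv2.boundingRect(c) followed by cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 3)
I tried using boundingRect instead of drawing the contours. Still not getting the result. Please see: imgur.com/a/ouGTJnB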