【Question title】: How to sort an array of rectangles by position?
【Posted】: 2015-10-17 06:03:58
【Question】:

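I just realized that OCR is much faster if I run it only on the regions that contain text. So I detect the text regions in the image and then run OCR on each region. Here is the result of the "detect text regions" step using OpenCV (I used it to draw the rectangles on the image):

The only problem is that I cannot arrange the text results in the order in which they appear in the original image. In this case, it should be: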

circle oval triangle square trapezium
diamond rhombus parallelogram rectangle pentagon
hexagon heptagon octagon nonagon decagon

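Some other cases:

Basically any other image with text in it.

So I am trying to sort the array of rectangles (origin, width, and height) and then rearrange the text associated with them.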

More information

I don't know whether it is necessary, but here is the code I use:

How I detect the text regions

+(NSMutableArray*) detectLetters:(UIImage*) image
{
    cv::Mat img;
    UIImageToMat(image, img);
    if (img.channels()!=1) {
        NSLog(@"NOT A GRAYSCALE IMAGE! CONVERTING TO GRAYSCALE.");
        cv::cvtColor(img, img, CV_BGR2GRAY);
    }

    //The array of text regions (rectangle)
    NSMutableArray* array = [[NSMutableArray alloc] init];

    cv::Mat img_gray=img, img_sobel, img_threshold, element;

    //Edge detection (horizontal gradient)
    cv::Sobel(img_gray, img_sobel, CV_8U, 1, 0, 3, 1, 0, cv::BORDER_DEFAULT);

    //Binarize using Otsu's threshold
    cv::threshold(img_sobel, img_threshold, 0, 255, CV_THRESH_OTSU+CV_THRESH_BINARY);

    //Close gaps with a wide, flat kernel so neighbouring letters merge into one blob
    element = getStructuringElement(cv::MORPH_RECT, cv::Size(17, 3) );
    cv::morphologyEx(img_threshold, img_threshold, CV_MOP_CLOSE, element);

    std::vector< std::vector< cv::Point> > contours;

    //Find contours (mode 0: outer contours only, method 1: keep all contour points)
    cv::findContours(img_threshold, contours, 0, 1);

    std::vector<std::vector<cv::Point> > contours_poly( contours.size() );

    for( size_t i = 0; i < contours.size(); i++ )
        if (contours[i].size()>50)
        {
            cv::approxPolyDP( cv::Mat(contours[i]), contours_poly[i], 3, true );
            cv::Rect appRect( boundingRect( cv::Mat(contours_poly[i]) ));
            //Keep only regions that are wider than they are tall
            if (appRect.width>appRect.height){
                [array addObject:[NSValue valueWithCGRect:CGRectMake(appRect.x,appRect.y,appRect.width,appRect.height)]];
            }
        }

    return array;
}

Here is the OCR process (using Tesseract):

NSMutableArray *arr=[STOpenCV detectLetters:img];

CFTimeInterval totalStartTime = CACurrentMediaTime();
NSMutableString *res=[[NSMutableString alloc] init];

for(int i=0;i<arr.count;i++){
    NSLog(@"\n-------------\nPROCESSING REGION %d/%lu",i+1,(unsigned long)arr.count);

    //Set the OCR region using the result from last step
    tesseract.rect=[[arr objectAtIndex:i] CGRectValue];


    CFTimeInterval startTime = CACurrentMediaTime();

    NSLog(@"Start to recognize: %f",startTime);

    [tesseract recognize];

    NSString *result=[tesseract recognizedText];

    NSLog(@"Result: %@", result);
    [res appendString:result];

    CFTimeInterval elapsedTime = CACurrentMediaTime() - startTime;

    NSLog(@"FINISHED: %f", elapsedTime);
}

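【Comments】:

Is this your reference image, or do you have more complex images? Either way, please post your original images so we can try them out and hopefully give you an accurate answer.
Thanks @Miki. I have added more images. Basically it can be any image with text in it.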

Tags: ios opencv ocr tesseract

【Answer 1】:

What you want is to sort the array of rectangles by y position (y - height/2) and, if they are at the same vertical position, by x (x - width/2).

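NSArray *sortedRects = [unsortedRects sortedArrayUsingComparator:^NSComparisonResult(id a, id b) {
    // The rectangles are boxed in NSValue objects, so unbox them first.
    CGRect first = [(NSValue *)a CGRectValue];
    CGRect second = [(NSValue *)b CGRectValue];
    // Compare vertical positions first, then horizontal positions.
    CGFloat yDifference = (first.origin.y - first.size.height / 2.0) - (second.origin.y - second.size.height / 2.0);
    if (yDifference < 0) return NSOrderedAscending;
    if (yDifference > 0) return NSOrderedDescending;
    CGFloat xDifference = (first.origin.x - first.size.width / 2.0) - (second.origin.x - second.size.width / 2.0);
    if (xDifference < 0) return NSOrderedAscending;
    if (xDifference > 0) return NSOrderedDescending;
    return NSOrderedSame;
}];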
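
Because the detection step already works with C++ rectangles before boxing them into NSValue, the same "y first, then x" ordering can also be sketched in plain C++. This is only a minimal illustration, not the code from the question: the `Region` struct, its `text` field, and `sortRegions` are hypothetical names, and the sort key here is simply the top-left corner. Keeping the text next to its rectangle means the recognized strings are reordered along with the rectangles.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical container: a rectangle with a top-left origin (as returned by
// cv::boundingRect) together with the text recognized inside it.
struct Region {
    int x, y, width, height;
    std::string text;
};

// Sort top to bottom, and left to right when the y values are exactly equal.
// Sorting the Region objects keeps each text attached to its rectangle.
void sortRegions(std::vector<Region>& regions) {
    std::sort(regions.begin(), regions.end(),
              [](const Region& a, const Region& b) {
                  return std::tie(a.y, a.x) < std::tie(b.y, b.x);
              });
}
```

This only behaves well when rectangles on the same text line share exactly the same y value, which real detections rarely do.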

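【Comments】:

Yes, but only in a perfect world, because even when rectangles look "on the same line" to our eyes, they do not always have the same position values.
In that case, you can add some epsilon value to the yDifference comparison. Instead of checking
That sounds about right. I had actually considered it before, but I got stuck trying to find a perfect epsilon value for all text blocks. I thought about using the average block height as the epsilon, but that seems inaccurate when there are some large blocks among small ones. Really close, though!
You could also reverse the process and use X first and then Y; in that case you would probably need a larger epsilon (half of the maximum width?).
Otherwise you could use a modified version of insertion sort: Init: find the rectangle with the lowest Y in the unsorted table (UT) and put it into the sorted table (ST). Loop: find the rectangle with the lowest Y in UT and call it A. Iterate over the elements B in ST. If A.y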
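
The epsilon idea from the comments can be sketched in C++ as a two-pass procedure instead of a tolerant comparator, since a comparator that treats "close enough" values as equal is not a valid strict weak ordering for `std::sort`. The sketch below first sorts by vertical center, starts a new row whenever a rectangle's center lies more than `epsilon` below the first rectangle of the current row, and then sorts each row by x. `Box`, `sortIntoReadingOrder`, and the choice to compare against the first element of the row are assumptions made for illustration; picking a good `epsilon` is still the open problem raised above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical rectangle type with a top-left origin.
struct Box {
    double x, y, width, height;
    double centerY() const { return y + height / 2.0; }
};

// Returns the boxes in reading order: rows top to bottom, and left to right
// within a row. Boxes whose vertical centers lie within `epsilon` of the first
// box of a row are treated as part of that row.
std::vector<Box> sortIntoReadingOrder(std::vector<Box> boxes, double epsilon) {
    // Pass 1: order everything by vertical center.
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.centerY() < b.centerY(); });

    // Pass 2: cut the sorted sequence into rows and sort each row by x.
    std::size_t rowStart = 0;
    for (std::size_t i = 1; i <= boxes.size(); ++i) {
        bool endOfRow = (i == boxes.size()) ||
                        (boxes[i].centerY() - boxes[rowStart].centerY() > epsilon);
        if (endOfRow) {
            std::sort(boxes.begin() + rowStart, boxes.begin() + i,
                      [](const Box& a, const Box& b) { return a.x < b.x; });
            rowStart = i;
        }
    }
    return boxes;
}
```

A value such as half of a typical rectangle height might be a reasonable starting point for `epsilon`, but that is a guess and would need tuning per image, especially when large and small text blocks are mixed.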