Swift iOS - Vision framework text recognition and rectangles (answers)

【Question title】: Swift iOS - Vision framework text recognition and rectangles
【Posted】: 2022-08-18 16:21:31
【Question】:

I'm trying to draw rectangles over the text regions found with the Vision framework, but they are always slightly off. This is how I do it:

    import UIKit

    public func drawOccurrencesOnImage(_ occurrences: [CGRect], _ image: UIImage) -> UIImage? {
        // A scale of 0.0 means "use the scale factor of the device's main screen".
        UIGraphicsBeginImageContextWithOptions(image.size, false, 0.0)
        // End the context on every exit path, not only on success.
        defer { UIGraphicsEndImageContext() }

        image.draw(at: CGPoint.zero)
        let currentContext = UIGraphicsGetCurrentContext()

        currentContext?.addRects(occurrences)
        currentContext?.setStrokeColor(UIColor.red.cgColor)
        currentContext?.setLineWidth(2.0)
        currentContext?.strokePath()

        // The image is captured before the deferred UIGraphicsEndImageContext() runs.
        return UIGraphicsGetImageFromCurrentImageContext()
    }

The returned image always looks close, but the boxes are never quite in the right place.

This is how I create the boxes, exactly as Apple does:

    let boundingRects: [CGRect] = observations.compactMap { observation -> CGRect? in
        // Skip observations without a candidate instead of emitting a .zero rect.
        guard let candidate = observation.topCandidates(1).first else { return nil }

        // Bounding box covering the whole recognized string.
        let stringRange = candidate.string.startIndex..<candidate.string.endIndex
        let boxObservation = try? candidate.boundingBox(for: stringRange)

        let boundingBox = boxObservation?.boundingBox ?? .zero

        // Project the normalized box onto the chosen image's dimensions.
        return VNImageRectForNormalizedRect(boundingBox,
                                            Int(UIViewController.chosenImage?.width ?? 0),
                                            Int(UIViewController.chosenImage?.height ?? 0))
    }

(Source: https://developer.apple.com/documentation/vision/recognizing_text_in_images)
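For reference, VNImageRectForNormalizedRect projects a normalized rect into image coordinates by scaling alone; it leaves the origin where Vision puts it, at the bottom-left corner of the image. The helper below, `scaleNormalizedRect`, is hypothetical (not a Vision API) and only sketches that scaling in plain Foundation so the numbers can be checked:

```swift
import Foundation

/// Hypothetical helper (not a Vision API): scales a normalized rect
/// (all components in 0...1) to an image of the given width and height.
/// Like VNImageRectForNormalizedRect, it only scales; the origin stays
/// at the lower-left corner, where Vision measures from.
func scaleNormalizedRect(_ normalized: CGRect, width: Int, height: Int) -> CGRect {
    let w = CGFloat(width)
    let h = CGFloat(height)
    return CGRect(x: normalized.origin.x * w,
                  y: normalized.origin.y * h,
                  width: normalized.size.width * w,
                  height: normalized.size.height * h)
}

// A box whose bottom edge sits halfway up an 800 × 400 image.
let scaled = scaleNormalizedRect(CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25),
                                 width: 800, height: 400)
// scaled.origin.y is 200, still measured up from the bottom edge,
// while UIKit would read it as 200 points down from the top.
```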

Thank you.

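Your y coordinates are flipped. Look at Detecting Objects in Still Images and its boundingBox routine, and note that it flips the y coordinate. Without seeing how you build the occurrences array of [CGRect], we can't comment further.
@Rob I followed the Apple documentation (developer.apple.com/documentation/vision/…). I've edited the question to include it.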

Tags: ios swift uiview vision text-recognition

【Solution 1】:

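VNImageRectForNormalizedRect returns a CGRect whose y coordinate is flipped relative to UIKit: it only scales Vision's normalized rect to the image dimensions and keeps Vision's lower-left origin, whereas UIKit uses an upper-left origin (macOS and iOS use different coordinate systems).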
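Concretely, for an image of width $W$ and height $H$, a Vision box with normalized origin $(x_n, y_n)$, measured from the bottom-left corner, and normalized size $(w_n, h_n)$ corresponds to the following UIKit rect, whose origin is the top-left corner:

$$
x = x_n W,\qquad y = (1 - y_n - h_n)\,H,\qquad w = w_n W,\qquad h = h_n H.
$$

The $y$ term subtracts both $y_n$ and $h_n$ because the box's top edge sits at $y_n + h_n$ in Vision's bottom-up space, and UIKit measures the origin down from the top edge.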

Instead, I would suggest a version of the boundingBox routine adapted from Detecting Objects in Still Images:

    fileprivate func boundingBox(forRegionOfInterest: CGRect, withinImageBounds bounds: CGRect) -> CGRect {
        let imageWidth = bounds.width
        let imageHeight = bounds.height

        // Begin with input rect.
        var rect = forRegionOfInterest

        // Reposition origin.
        rect.origin.x *= imageWidth
        rect.origin.x += bounds.origin.x
        // Flip y: Vision's origin is bottom-left, UIKit's is top-left.
        rect.origin.y = (1 - rect.origin.y - rect.height) * imageHeight + bounds.origin.y

        // Rescale normalized coordinates.
        rect.size.width *= imageWidth
        rect.size.height *= imageHeight

        return rect
    }
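If you would rather keep the VNImageRectForNormalizedRect call from the question, flipping its result afterwards should give the same rect as the routine above, assuming the width and height you pass describe the same space you draw in. A Foundation-only sketch, with `flipToTopLeftOrigin` as a hypothetical helper (not an Apple API):

```swift
import Foundation

/// Hypothetical helper (not an Apple API): converts a rect whose origin is
/// the bottom-left corner of a container of height `containerHeight` into
/// the equivalent rect with a top-left origin, as UIKit expects.
func flipToTopLeftOrigin(_ rect: CGRect, containerHeight: CGFloat) -> CGRect {
    // The rect's top edge is rect.origin.y + rect.size.height above the bottom,
    // which is containerHeight minus that amount below the top.
    CGRect(x: rect.origin.x,
           y: containerHeight - rect.origin.y - rect.size.height,
           width: rect.size.width,
           height: rect.size.height)
}

// Already-scaled rect in an 800 × 400 image with a bottom-left origin, i.e. the
// scaled form of the normalized box (0.25, 0.5, 0.5, 0.25).
let bottomUp = CGRect(x: 200, y: 200, width: 400, height: 100)
let topDown = flipToTopLeftOrigin(bottomUp, containerHeight: 400)
// topDown.origin.y is 100, matching (1 - 0.5 - 0.25) * 400 from the routine above.
```

Because the flip is its own inverse, applying it twice returns the original rect, which is a quick sanity check when debugging boxes that look mirrored vertically.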

In my case, this produced the correct boxes.

For example:

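    // Runs inside a view controller method; image, size, bounds and imageView are
    // assumed to be defined there, with bounds == CGRect(origin: .zero, size: size).
    let request = VNDetectTextRectanglesRequest { [self] request, error in
        guard let results = request.results, error == nil else { return }

        // Convert each normalized, bottom-left-origin box to a top-left-origin rect.
        let rects = results
            .compactMap { $0 as? VNTextObservation }
            .map { boundingBox(forRegionOfInterest: $0.boundingBox, withinImageBounds: CGRect(origin: .zero, size: size)) }

        // Render at scale 1 so one point in bounds is one pixel in the output.
        let format = UIGraphicsImageRendererFormat()
        format.scale = 1
        let finalImage = UIGraphicsImageRenderer(bounds: bounds, format: format).image { _ in
            image.draw(in: bounds)
            UIColor.red.setStroke()
            for rect in rects {
                let path = UIBezierPath(rect: rect)
                path.lineWidth = 5
                path.stroke()
            }
        }
        // UI updates belong on the main queue.
        DispatchQueue.main.async { [self] in
            imageView.image = finalImage
        }
    }
    // The request still has to be performed, e.g. with
    // try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])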

【Discussion】: