How to get each word and bounding box of recognized text in tess-two android: answers

[Question title]: How to get each word and bounding box of recognized text in tess-two android
[Posted]: 2014-03-23 20:16:04
[Question description]:

I am using ResultIterator to get each word from an image, but the app crashes when I call iterator.begin(), and I don't know why.

Here is my current code:

//Global
ArrayList<String> words = new ArrayList<String>();

@Override
    public void onPreviewFrame(final byte[] data, Camera camera) {
        final SurfaceView surfaceView = (SurfaceView) getActivity().findViewById(R.id.cameraView);
        //get camera params for ocr
        Camera.Parameters cameraParams = _camera.getParameters();
        int width = surfaceView.getWidth();
        int height = surfaceView.getHeight();
        PixelFormat pixFormat = new PixelFormat();
        PixelFormat.getPixelFormatInfo(cameraParams.getPreviewFormat(), pixFormat);
        int bpp = pixFormat.bytesPerPixel;
        int bpl = bpp * width;

        //ocr
        ocr.setImage(data, width, height, bpp, bpl);
        ocr.setRectangle(0, 50, width, height - 50);

        // Iterate through the results.
        final ResultIterator iterator = ocr.getResultIterator();
        iterator.begin(); //crashes my app
        do {
            words.add(iterator.getUTF8Text(PageIteratorLevel.RIL_WORD));
        } while (iterator.next(PageIteratorLevel.RIL_WORD));
    }

[Question comments]:

Please post your stack trace.

Tags: android tesseract

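[Solution 1]:

According to Tesseract's APIExample, you need to call the Recognize method before you can obtain the iterator. You may need to implement this method for tess-two yourself.

Another option is to go through the hOCR output. See "Export HOCR output for tesseract OCR in android".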
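
For the hOCR route, here is a rough sketch of pulling words and boxes out of the hOCR string with plain Java. It assumes you already have the hOCR text as a String (in tess-two this would presumably come from something like getHOCRText(0) on TessBaseAPI; check your version) and that words appear as `ocrx_word` spans with a `bbox x0 y0 x1 y1` title, which is how Tesseract normally writes them. The class and method names (HocrWords, Word, parse) are made up for this example, and a regex is only good enough for well-formed Tesseract output, not arbitrary HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts words and bounding boxes from an hOCR string. */
final class HocrWords {

    /** One recognized word with its box in image pixel coordinates. */
    static final class Word {
        final String text;
        final int left, top, right, bottom;

        Word(String text, int left, int top, int right, int bottom) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    // Matches <span class='ocrx_word' ... title='bbox x0 y0 x1 y1; ...'>content</span>
    private static final Pattern WORD = Pattern.compile(
            "<span[^>]*class=['\"]ocrx_word['\"][^>]*"
            + "title=['\"]bbox (\\d+) (\\d+) (\\d+) (\\d+)[^'\"]*['\"][^>]*>"
            + "(.*?)</span>",
            Pattern.DOTALL);

    /** Returns the words in document order; empty list if none are found. */
    static List<Word> parse(String hocr) {
        List<Word> words = new ArrayList<>();
        Matcher m = WORD.matcher(hocr);
        while (m.find()) {
            words.add(new Word(
                    clean(m.group(5)),
                    Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)),
                    Integer.parseInt(m.group(4))));
        }
        return words;
    }

    // Drops nested tags such as <strong>/<em> and decodes the basic entities.
    private static String clean(String html) {
        return html.replaceAll("<[^>]+>", "")
                .replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&#39;", "'")
                .replace("&amp;", "&")
                .trim();
    }
}
```

Calling HocrWords.parse(hocrString) gives you one Word per recognized word, with left/top/right/bottom taken straight from the bbox values.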

[Comments]:

Fortunately, I found a simpler solution. I had to call ocr.getUTF8Text() before calling ocr.getResultIterator(), and then it worked.
That works too, because GetUTF8Text calls Recognize behind the scenes.
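
To make the call order concrete, here is a small sketch of the corrected loop. So that it compiles without the Android library, the Ocr and Iter interfaces below are hypothetical stand-ins that only mirror the method names used in the question; they are not tess-two's classes. getBoundingBox returning four ints is my assumption about what the real iterator offers for the box, so check your tess-two version before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of "recognize first, then iterate"; the nested types are stand-ins, not tess-two. */
final class WordBoxSketch {

    // Stand-in for PageIteratorLevel.RIL_WORD.
    static final int RIL_WORD = 3;

    /** Stand-in for the OCR engine object (ocr in the question). */
    interface Ocr {
        String getUTF8Text();       // runs recognition as a side effect
        Iter getResultIterator();   // only meaningful after recognition
    }

    /** Stand-in for the result iterator. */
    interface Iter {
        void begin();
        String getUTF8Text(int level);
        int[] getBoundingBox(int level); // assumed: {left, top, right, bottom}
        boolean next(int level);
    }

    /** A word together with its box. */
    static final class WordBox {
        final String text;
        final int[] box;

        WordBox(String text, int[] box) {
            this.text = text;
            this.box = box;
        }
    }

    static List<WordBox> collect(Ocr ocr) {
        List<WordBox> out = new ArrayList<>();

        // Step 1: force recognition before asking for the iterator.
        String fullText = ocr.getUTF8Text();
        if (fullText == null || fullText.trim().isEmpty()) {
            return out; // nothing recognized, nothing to iterate
        }

        // Step 2: only now fetch the iterator, and guard against null.
        Iter it = ocr.getResultIterator();
        if (it == null) {
            return out;
        }

        // Step 3: walk the results word by word.
        it.begin();
        do {
            out.add(new WordBox(it.getUTF8Text(RIL_WORD), it.getBoundingBox(RIL_WORD)));
        } while (it.next(RIL_WORD));
        return out;
    }
}
```

In the real app the body of collect would go where the do/while loop in the question is, using ocr and ResultIterator directly. The null and empty-text checks are extra caution on my part, not something the original fix mentions.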