Tesseract, OpenCV, Python: how to get the bounding box for a sentence or a line of text? (Answers)

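【Question title】: Tesseract, openCV, python: how to get bounding box for a sentence or same line of text?
【Posted】: 2021-12-05 09:27:10
【Question description】:

I want to do some text recognition on an image. I can recognize the text and the corresponding bounding boxes, but only word by word, and I would like to do the same for each line of text. In the code below, I noticed when printing the bounding box coordinates that the value of b['top'] is similar for words on the same line. I don't know whether I can use that, but I would like one bounding box, together with the associated sentence, for each line of text.

Here is the code I wrote: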

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import cv2 
import pytesseract
from pytesseract import Output

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

img = cv2.imread('./images/page_2.jpg') # load img

img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)  #transform colored img to grayscale

plt.imshow(img)

boxes = pytesseract.image_to_data(img, output_type=Output.DICT) #transform image to dict

boxes = pd.DataFrame(boxes) #dict to dataframe
boxes['text'] = boxes['text'].replace('', np.nan) #replace empty values by NaN (assignment avoids the chained in-place call)
boxes = boxes.dropna(subset=['text']) #delete rows with NaN

print(boxes)

for index, b in boxes.iterrows():
    (x,y,w,h) = b['left'],b['top'],b['width'],b['height']
    print((x,y,w,h), b['text'])
    cv2.rectangle(img,(x,y),(w+x,h+y), (0,0,255),1)
    
cv2.imshow('result',img)
cv2.waitKey(0)

Output of print(boxes):

     level  page_num  block_num  par_num  line_num  word_num  left  top  \
4        5         1          1        1         1         1    32   24   
5        5         1          1        1         1         2   100   24   
6        5         1          1        1         1         3   191   28   
7        5         1          1        1         1         4   227   28   
8        5         1          1        1         1         5   257   24   
..     ...       ...        ...      ...       ...       ...   ...  ...   
154      5         1          1       11         1         7   261  457   
155      5         1          1       11         1         8   320  461   
156      5         1          1       11         1         9   351  457   
157      5         1          1       11         1        10   376  457   
158      5         1          1       11         1        11   468  457   

     width  height       conf       text  
4       60      17  93.283920     Maitre  
5       82      19  93.204414   corbeau,  
6       29      13  96.932060        sur  
7       22      12  96.932060         un  
8       50      17  93.306122      arbre  
..     ...     ...        ...        ...  
154     51      21  79.999794      qu'on  
155     23      13  90.411606         ne  
156     18      21  21.623993        I'y  
157     85      21  90.583260  prendrait  
158     44      21  96.933327      plus.

Output of (x,y,w,h) and b['text'] (bounding boxes with their text):

(32, 24, 60, 17) Maitre
(100, 24, 82, 19) corbeau,
(191, 28, 29, 13) sur
(227, 28, 22, 12) un
(257, 24, 50, 17) arbre
(315, 24, 70, 21) perché,
(79, 49, 58, 17) Tenait
(144, 53, 23, 13) en
(174, 53, 34, 13) son
(216, 50, 33, 16) bec
(257, 53, 22, 13) un
(287, 49, 84, 22) fromage.
(32, 75, 60, 17) Maitre
(100, 75, 61, 17) renard
(169, 79, 31, 17) par
(206, 75, 64, 17) I'odeur
(277, 75, 68, 17) alléché
(353, 88, 3, 6) ,
(81, 101, 27, 16) Lui
(115, 101, 28, 16) tint
(151, 100, 11, 17) 4
(169, 104, 34, 17) peu
(211, 100, 42, 21) prés
(260, 104, 21, 13) ce
(289, 101, 76, 20) langage
(374, 105, 3, 12) :
(81, 126, 31, 16) «Et
(119, 126, 72, 21) bonjour
(199, 126, 88, 17) Monsieur
(294, 126, 22, 16) du
(324, 125, 87, 18) Corbeau.
(31, 151, 40, 17) Que
(78, 155, 46, 13) vous
(131, 151, 40, 17) 6tes
(177, 151, 32, 21) joli!
(217, 155, 35, 17) que
(260, 155, 44, 13) vous
(312, 155, 29, 13) me
(348, 151, 80, 17) semblez
(436, 151, 52, 17) beau!
(81, 176, 47, 18) Sans
(136, 177, 63, 19) mentir,
(207, 177, 15, 17) si
(229, 178, 48, 16) votre
(284, 181, 72, 17) ramage
(81, 202, 25, 17) Se
(114, 204, 79, 19) rapporte
(200, 202, 11, 17) a
(218, 204, 48, 15) votre
(273, 203, 87, 20) plumage,
(31, 228, 48, 17) Vous
(86, 227, 40, 18) étes
(134, 228, 15, 16) le
(157, 227, 63, 21) phénix
(227, 228, 34, 17) des
(269, 227, 51, 18) hétes
(327, 228, 23, 16) de
(358, 232, 33, 13) ces
(398, 228, 49, 17) bois»
(31, 253, 53, 17) Aces
(92, 255, 45, 15) mots
(145, 253, 15, 17) le
(167, 253, 78, 17) corbeau
(253, 257, 22, 13) ne
(283, 257, 22, 13) se
(312, 255, 40, 15) sent
(360, 257, 33, 17) pas
(400, 253, 23, 17) de
(429, 253, 40, 21) joie;
(81, 279, 19, 16) Et
(107, 283, 43, 16) pour
(157, 280, 74, 16) montrer
(238, 283, 22, 13) sa
(267, 279, 45, 16) belle
(319, 279, 43, 19) voix,
(33, 304, 8, 16) ll
(49, 308, 53, 13) ouvre
(110, 308, 22, 13) un
(140, 304, 47, 21) large
(195, 304, 33, 17) bec
(236, 304, 54, 17) laisse
(297, 305, 67, 16) tomber
(371, 308, 22, 13) sa
(400, 304, 53, 21) proie.
(32, 330, 23, 17) Le
(63, 330, 60, 16) renard
(131, 330, 38, 17) s'en
(177, 330, 48, 17) saisit
(232, 331, 17, 15) et
(256, 330, 28, 16) dit:
(291, 330, 49, 16) "Mon
(348, 330, 35, 16) bon
(391, 330, 92, 19) Monsieur,
(103, 355, 92, 21) Apprenez
(202, 359, 36, 17) que
(245, 356, 35, 16) tout
(287, 355, 67, 17) flatteur
(31, 381, 25, 16) Vit
(63, 385, 34, 12) aux
(104, 381, 71, 20) dépens
(181, 381, 24, 16) de
(212, 381, 43, 16) celui
(262, 381, 28, 20) qui
(298, 380, 79, 17) l'écoute:
(32, 406, 50, 17) Cette
(90, 406, 50, 21) lecon
(148, 407, 40, 16) vaut
(195, 406, 40, 17) bien
(243, 410, 22, 13) un
(273, 406, 79, 21) fromage
(359, 410, 45, 13) sans
(411, 406, 67, 17) doute."
(81, 432, 22, 16) Le
(110, 432, 77, 16) corbeau
(195, 432, 76, 16) honteux
(279, 433, 17, 15) et
(303, 432, 63, 16) confus
(31, 457, 42, 17) Jura
(81, 457, 44, 17) mais
(133, 461, 22, 13) un
(163, 461, 34, 17) peu
(205, 457, 36, 17) tard
(250, 470, 3, 6) ,
(261, 457, 51, 21) qu'on
(320, 461, 23, 13) ne
(351, 457, 18, 21) I'y
(376, 457, 85, 21) prendrait
(468, 457, 44, 21) plus.

Resulting image:

result

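【Question comments】:

It is better to post the text output as text rather than as an image.
Sorry, I don't understand your comment. Where is the code?
I don't mean the code, I mean the question post. Images of text (in this case the dict and the boxes with their text) prevent people from copying the data to work out a solution. Even better than posting the text: post the output of boxes.to_dict().
Done, thanks!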

Tags: python pandas opencv ocr python-tesseract

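【Solution 1】:

I noticed when printing the bounding box coordinates that the value of b['top'] is similar for words on the same line. I don't know whether I can use that, but I would like one bounding box, together with the associated sentence, for each line of text.

You can absolutely use that. The following builds lines by aggregating boxes that overlap vertically: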

def lineup(boxes):
    linebox = None
    for _, box in boxes.iterrows():
        if linebox is None:                                     # first line begins
            linebox = box.copy()
        elif box['top'] <= linebox['top'] + linebox['height']:  # box in same line
            # compute the merged bottom edge before 'top' is updated
            bottom = max(linebox['top'] + linebox['height'], box['top'] + box['height'])
            linebox['top'] = min(linebox['top'], box['top'])
            linebox['width'] = box['left'] + box['width'] - linebox['left']
            linebox['height'] = bottom - linebox['top']
            linebox['text'] += ' ' + box['text']
        else:                                                   # box in new line
            yield linebox
            linebox = box.copy()                                # new line begins
    if linebox is not None:
        yield linebox                                           # return last line

lineboxes = pd.DataFrame.from_records(lineup(boxes))
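
As an alternative that is not part of the original answer: the DataFrame printed in the question already carries the page_num, block_num, par_num and line_num columns that Tesseract assigns, so the words can also be grouped on those keys instead of comparing top values. The sketch below assumes the column layout shown in the question. The function name line_boxes and the sample frame are illustrative only; the sample rows are the ten rows visible in the printed output.

```python
import pandas as pd

def line_boxes(words: pd.DataFrame) -> pd.DataFrame:
    """Merge word-level rows (image_to_data layout) into one box per text line."""
    # Work with right/bottom edges so a line box is a plain min/max of its words.
    words = words.assign(
        right=words["left"] + words["width"],
        bottom=words["top"] + words["height"],
    )
    keys = ["page_num", "block_num", "par_num", "line_num"]
    lines = (
        words.sort_values(keys + ["word_num"])   # keep words in reading order
        .groupby(keys, as_index=False)
        .agg(
            left=("left", "min"),
            top=("top", "min"),
            right=("right", "max"),
            bottom=("bottom", "max"),
            text=("text", " ".join),
        )
    )
    lines["width"] = lines["right"] - lines["left"]
    lines["height"] = lines["bottom"] - lines["top"]
    return lines.drop(columns=["right", "bottom"])

# Sample: the ten word rows visible in the question's printed DataFrame.
columns = ["page_num", "block_num", "par_num", "line_num", "word_num",
           "left", "top", "width", "height", "text"]
sample = pd.DataFrame([
    (1, 1, 1, 1, 1, 32, 24, 60, 17, "Maitre"),
    (1, 1, 1, 1, 2, 100, 24, 82, 19, "corbeau,"),
    (1, 1, 1, 1, 3, 191, 28, 29, 13, "sur"),
    (1, 1, 1, 1, 4, 227, 28, 22, 12, "un"),
    (1, 1, 1, 1, 5, 257, 24, 50, 17, "arbre"),
    (1, 1, 1, 11, 7, 261, 457, 51, 21, "qu'on"),
    (1, 1, 1, 11, 8, 320, 461, 23, 13, "ne"),
    (1, 1, 1, 11, 9, 351, 457, 18, 21, "I'y"),
    (1, 1, 1, 11, 10, 376, 457, 85, 21, "prendrait"),
    (1, 1, 1, 11, 11, 468, 457, 44, 21, "plus."),
], columns=columns)

lines = line_boxes(sample)
print(lines[["line_num", "left", "top", "width", "height", "text"]])
```

Grouping on Tesseract's own line numbers avoids choosing a pixel threshold, but it relies on Tesseract's layout analysis, so the result may differ from the overlap-based approach wherever Tesseract splits or merges lines differently. Each resulting row can be drawn with cv2.rectangle in the same way as the word boxes in the question.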

【Comments】: