Character Segmentation and Recognition for Unevenly Spaced Digits: Answers

【Question Title】: Character Segmentation and Recognition for Unevenly Spaced Digits
【Posted】: 2018-04-26 16:03:25
【Question】:

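I have an image of a number like the one below.

I segmented the number above into its digits using adaptive thresholding and contour detection, keeping only bounding rectangles whose height and width are at least 15 pixels, and got the following segmented digits.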

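Instead of the output above, I want to segment the number in the image so that I get each digit on its own. After resizing to (28, 28), each digit could then be fed to a CNN trained on MNIST, which should predict the individual digits better.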
So, is there any other neat way of segmenting the number in this image into individual digits?
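For the resize-to-(28, 28) step, a plain resize stretches tall, thin digits such as "1" and ignores how MNIST itself was prepared: according to the MNIST description, digits were size-normalized to fit a 20x20 box while preserving aspect ratio, then centered in the 28x28 field by center of mass. The sketch below imitates that, assuming a white-on-black digit crop. `to_mnist_like`, `resize_nearest` and `shift_no_wrap` are hypothetical helpers, and nearest-neighbour resizing stands in for `cv2.resize` only to keep the example NumPy-only.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize with index arrays (stand-in for cv2.resize)."""
    h, w = img.shape
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows[:, None], cols[None, :]]

def shift_no_wrap(img, dy, dx):
    """Translate an image by (dy, dx), filling uncovered pixels with 0.
    Assumes |dy| < height and |dx| < width."""
    h, w = img.shape
    out = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def to_mnist_like(digit, box=20, size=28):
    """Fit a white-on-black crop into a box x box area (aspect ratio kept),
    paste it into a size x size canvas, then centre it by centre of mass."""
    h, w = digit.shape
    scale = box / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    small = resize_nearest(digit, new_h, new_w)

    canvas = np.zeros((size, size), dtype=digit.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = small

    ys, xs = np.nonzero(canvas)
    if len(ys):
        weights = canvas[ys, xs].astype(float)
        cy = np.average(ys, weights=weights)
        cx = np.average(xs, weights=weights)
        dy = int(round((size - 1) / 2 - cy))
        dx = int(round((size - 1) / 2 - cx))
        canvas = shift_no_wrap(canvas, dy, dx)
    return canvas
```

The result can then be scaled (for example to [0, 1]) in whatever way the MNIST model was trained.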

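One approach mentioned here suggests sliding a fixed-size (green) window over the image and detecting digits with a trained neural network. How would I train such a network to classify the digits? This approach avoids using OpenCV to separate each digit, but wouldn't sliding a window over the whole image be rather expensive? And how should positive and negative examples be handled during training? Should I create a separate dataset? The positives could be MNIST digits, but what about the negatives?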
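One common way to get negatives for a sliding-window classifier (an assumption here, not something the question or answer prescribes) is to add an eleventh "no centred digit" class. Its examples can be generated from the positives themselves: MNIST digits shifted sideways so that only part of a digit is in the window, which is what a misaligned window sees, plus empty patches. The helper names, the shift range and the 20% share of blank patches below are all illustrative choices.

```python
import numpy as np

def make_negatives(digits, rng, min_shift=10, max_shift=14, blank_fraction=0.2):
    """Build 'not a centred digit' examples from an (N, 28, 28) digit array.

    The first blank_fraction of the negatives stay empty; the rest are
    digits shifted left or right by min_shift..max_shift columns.
    """
    n = len(digits)
    negatives = np.zeros_like(digits)
    n_blank = int(n * blank_fraction)
    for i in range(n_blank, n):
        shift = int(rng.integers(min_shift, max_shift + 1)) * int(rng.choice([-1, 1]))
        img = digits[i]
        if shift > 0:
            negatives[i, :, shift:] = img[:, :-shift]   # digit pushed right
        else:
            negatives[i, :, :shift] = img[:, -shift:]   # digit pushed left
    return negatives

def build_training_set(digits, labels, seed=0):
    """Digits keep labels 0-9; generated negatives get the extra label 10."""
    rng = np.random.default_rng(seed)
    neg = make_negatives(digits, rng)
    X = np.concatenate([digits, neg])
    y = np.concatenate([labels, np.full(len(neg), 10)])
    return X, y
```

At prediction time, windows classified as class 10 are simply ignored.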

Segmentation:

import cv2
import imutils
import numpy as np
from skimage.segmentation import clear_border

img = cv2.imread('Image')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# smooth, then binarise so the digits are white on a black background
blur = cv2.GaussianBlur(gray, (3, 3), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 7, 10)
# remove foreground blobs that touch the image border
thresh = clear_border(thresh)

# find contours in the thresholded image, then initialize the
# list of group locations
# (RETR_TREE also returns inner contours such as the holes in 0, 6, 8, 9;
# cv2.RETR_EXTERNAL would return only the outer ones)
groupCnts = cv2.findContours(thresh.copy(), cv2.RETR_TREE,
                             cv2.CHAIN_APPROX_SIMPLE)
# grab_contours handles the different return signatures of OpenCV 2, 3 and 4
groupCnts = imutils.grab_contours(groupCnts)
groupLocs = []

# 3-channel copy of the grayscale image so coloured boxes can be drawn on it
clone = np.dstack([gray.copy()] * 3)
# loop over the group contours
for (i, c) in enumerate(groupCnts):
    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)
    # only accept the contour region as a grouping of characters if
    # the ROI is sufficiently large
    if w >= 15 and h >= 15:
        print(i, (x, y, w, h))
        cv2.rectangle(clone, (x, y), (x + w, y + h), (255, 0, 0), 1)
        groupLocs.append((x, y, w, h))
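Because `cv2.RETR_TREE` also returns inner contours (for example the holes in 0, 6, 8 and 9), which can pass the 15-pixel size filter for large digits, and because the contours are not returned in left-to-right order, `groupLocs` may need cleaning before the crops are classified. A minimal pure-Python sketch; `order_boxes` is a hypothetical helper that drops boxes lying entirely inside another box and sorts the rest by their x coordinate.

```python
def order_boxes(boxes):
    """Drop (x, y, w, h) boxes fully contained in another box,
    then sort the remaining boxes left to right."""
    def inside(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return (a != b and bx <= ax and by <= ay
                and ax + aw <= bx + bw and ay + ah <= by + bh)

    kept = [a for a in boxes if not any(inside(a, b) for b in boxes)]
    return sorted(kept, key=lambda box: box[0])
```

For example, `order_boxes(groupLocs)` after the loop gives the digit boxes in reading order for a single-line number.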

Sliding window:

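import time

import cv2
import joblib  # older scikit-learn versions: from sklearn.externals import joblib
import numpy as np
from skimage.feature import hog
from skimage.segmentation import clear_border


def sliding_window(image, stepSize, windowSize):
    # assumed helper (not shown in the question): yields (x, y, window)
    # for every window position, left to right and top to bottom
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])


clf = joblib.load("digits_cls.pkl")    # MNIST-trained classifier (HOG features)
img = cv2.imread('Image', 0)
winW, winH = (22, 40)
cv2.imshow("Window0", img)
cv2.waitKey(1)

blur = cv2.GaussianBlur(img, (5, 5), 0)
# THRESH_BINARY_INV makes the digits the white foreground, which clear_border
# and dilate expect (and which matches MNIST's white-on-black polarity)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)
thresh = clear_border(thresh)

for (x, y, window) in sliding_window(img, stepSize=10, windowSize=(winW, winH)):
    # skip partial windows at the right and bottom edges
    if window.shape[0] != winH or window.shape[1] != winW:
        continue
    roi = thresh[y:y + winH, x:x + winW]
    roi = cv2.resize(roi, (28, 28), interpolation=cv2.INTER_AREA)
    # the second argument of cv2.dilate is the kernel array itself
    roi = cv2.dilate(roi, np.ones((3, 3), np.uint8))
    cv2.imshow("Window1", roi)
    cv2.waitKey(1)
    # the keyword is spelled `visualize` in current scikit-image
    roi_hog_fd = hog(roi, orientations=9, pixels_per_cell=(14, 14),
                     cells_per_block=(1, 1), visualize=False)
    nbr = clf.predict(np.array([roi_hog_fd], 'float64'))
    print(nbr)

    # draw the current window on a fresh copy of the image
    clone = img.copy()
    cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
    cv2.imshow("Window2", clone)
    cv2.waitKey(1)
    time.sleep(0.95)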

Strange output (it predicts a digit even for blank windows): 522637753787357777722
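A classifier trained only on digits 0-9 always answers with one of those ten classes, so blank windows still get a label. A cheap heuristic, sketched below under the assumption of white-on-black 0/255 windows, is to skip windows with too little foreground, and also those whose ink lies only at the edges (half a digit). `has_ink` is a hypothetical helper, and `min_fraction` and `band` are illustrative defaults, not tuned values.

```python
import numpy as np

def has_ink(roi, min_fraction=0.05, band=0.5):
    """True if a binary window has enough foreground overall and enough
    in its central column band (the middle `band` fraction of columns)."""
    h, w = roi.shape
    lo = int(w * (1 - band) / 2)
    centre = roi[:, lo:w - lo]
    overall = np.count_nonzero(roi) / roi.size
    central = np.count_nonzero(centre) / centre.size
    return overall >= min_fraction and central >= min_fraction
```

In the loop, `if not has_ink(roi): continue` right after `roi` is cut from `thresh` would skip such windows. With a step of 10 and a window width of 22, neighbouring windows overlap, so the same digit is still likely to be reported more than once; that needs some form of non-maximum suppression on top.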

Separating connected digits:

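import os

import cv2

img = cv2.imread('Image', 0)   # grayscale crop containing several touching digits
h, w = img.shape[:2]
count = 0
iw = 15                        # slice width in pixels (roughly one digit)
sw = 0                         # start column of the current slice
os.makedirs('amount', exist_ok=True)
while sw < w:
    # the last slice is simply whatever is left, since slicing stops at w
    new_img = img[:, sw:sw + iw]
    new = os.path.join('amount', 'amount_' + str(count) + '.png')
    cv2.imwrite(new, new_img)
    sw += iw
    count += 1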

Output:

I have found a way to separate these connected digits and feed them to the MNIST-trained classifier, but the output is still inaccurate.

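The steps I used:
(i) Extract the first image.
(ii) Segment the first image into separate images, i.e. obtain the second image.
(iii) Check whether each image's width exceeds some threshold; if it does, split it further into individual digits (as shown above).
(iv) Feed all the individual digits obtained after step (iii) to the MNIST classifier to get digit predictions based on the reshaped images.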
Lengthy, right?
Is there a more efficient way to convert the first image to digits directly? (Yes, I tried pytesseract too!)

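【Question Comments】:

Could you add the code you use to split the digits?
Wouldn't it be simpler to train a CNN to recognize and output 00 and 000 (and other combinations of connected digits)?
@barny, instead of creating a multi-digit dataset, would training on the SVHN dataset and testing on the multi-digit numbers above give good results?
I don't know. But as a problem, this seems partly solved by existing text-image recognizers, whereas trying to a) recognize and then b) split these multi-digit images in a general, repeatable, robust way seems much harder.
@barny SupposeXYZ Giving hints is nice, but if you have the expertise, wouldn't it be better to provide a solution? Just sayin'.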

Tags: python image opencv image-segmentation

【Solution 1】:

If you have the time and resources, training a new neural network would be a good solution.

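To separate the digits individually, you could try inverting the image intensities so that the handwriting is white and the background is black. Then project the values onto the horizontal axis (sum all the pixel values in each column) and look for peaks. Each peak position should indicate the position of a new character.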
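The projection described above can be sketched in NumPy: invert an 8-bit dark-ink-on-light grayscale image, sum each column to get a 1-D profile along x, and keep the local maxima above a threshold. The function names and the `min_height` threshold are illustrative assumptions; on a plateau, `local_peaks` reports its leftmost column.

```python
import numpy as np

def column_profile(gray):
    """Invert an 8-bit dark-on-light image and sum each column."""
    inverted = 255 - gray.astype(np.int64)
    return inverted.sum(axis=0)

def local_peaks(profile, min_height):
    """Indices i where profile[i] is a local maximum of at least min_height."""
    p = np.asarray(profile, dtype=float)
    return [i for i in range(1, len(p) - 1)
            if p[i] >= min_height and p[i] > p[i - 1] and p[i] >= p[i + 1]]
```

On real handwriting the raw profile is noisy, so peaks found this way are usually refined by smoothing first.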

Applying an additional smoothing function to the projection profile should refine the character positions.
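Smoothing can be sketched with a normalized Gaussian kernel and `np.convolve`; with a width on the order of the average character width, each character should give one peak near its center, and the lowest point between two neighbouring peaks is a candidate cut column. A wide kernel should also tend to merge the two stroke peaks of digits like 0 or 8. `gaussian_smooth` and `cut_columns` are hypothetical helpers, and `sigma` is a parameter to tune.

```python
import numpy as np

def gaussian_smooth(profile, sigma):
    """Smooth a 1-D profile with a normalized Gaussian kernel (sigma > 0);
    the output has the same length as the input."""
    profile = np.asarray(profile, dtype=float)
    # keep the kernel no longer than the profile so mode="same" keeps the length
    radius = max(0, min(int(3 * sigma), (len(profile) - 1) // 2))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(profile, kernel, mode="same")

def cut_columns(smoothed, peaks):
    """Column of the lowest smoothed value between each pair of neighbouring peaks."""
    return [left + int(np.argmin(smoothed[left:right + 1]))
            for left, right in zip(peaks, peaks[1:])]
```

Cropping the image between consecutive cut columns then gives one strip per character.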

【Comments】:

What about digits that produce two peaks depending on where you slice them, such as 0, 4, 6, 8 and 9?
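You could try smoothing the signal with a Gaussian filter about the size of the average character width; the peaks should then appear at the center of each character.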
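You could even add a preprocessing step that dilates all the characters (morphological dilation) so that the blobs are joined together, which should help the projection results later on.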
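In OpenCV this step would typically be `cv2.dilate(thresh, np.ones((3, 3), np.uint8))` on the inverted, binarized image. The NumPy sketch below is intended to behave like that call for 0/255 images with an odd square kernel, so the idea can be tried without OpenCV; `dilate` is a hypothetical helper. Too many iterations can also merge neighbouring characters, which would hide the valleys in the projection.

```python
import numpy as np

def dilate(binary, k=3, iterations=1):
    """Binary dilation with a k x k square kernel (k odd), in pure NumPy:
    each output pixel is the maximum of its k x k neighbourhood."""
    r = k // 2
    out = binary.copy()
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, r)          # zero padding around the border
        grown = np.zeros_like(out)
        for dy in range(k):
            for dx in range(k):
                grown = np.maximum(grown, padded[dy:dy + h, dx:dx + w])
        out = grown
    return out
```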