Detect if a text image is upside down: answers

【Question title】: Detect if a text image is upside down
【Posted】: 2019-09-03 08:48:52
【Question description】:

I have several hundred images (scanned documents), most of which are skewed. I want to deskew them using Python.
Here is the code I used:

import numpy as np
import cv2

from skimage.transform import radon


filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I)  # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))

# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)

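This code works well for most of the documents, except at certain angles: 180 and 0, and likewise 90 and 270, are often detected as the same angle (that is, it does not distinguish 180 from 0, or 90 from 270). So I end up with a lot of upside-down documents.

Here is an example:

The resulting image I get is the same as the input image.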
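
One way to see why 0 and 180 degrees look identical to this kind of method: rotating a page by 180 degrees only reverses the order of its row sums, so any statistic of the projection that ignores order (such as the RMS used above) stays the same. In addition, `skimage.transform.radon` only covers the angles 0 to 179 when `theta` is not given. The sketch below uses a small synthetic array rather than a real scan, and `projection_rms` is a name made up for illustration.

```python
import numpy as np

def projection_rms(image):
    """RMS of the demeaned row sums (a horizontal projection profile)."""
    rows = image.sum(axis=1).astype(float)
    rows -= rows.mean()
    return float(np.sqrt(np.mean(rows ** 2)))

# Synthetic "page": three text lines placed off-centre, so that the
# 180-degree rotation is a visibly different image.
page = np.zeros((12, 8))
page[[1, 3, 5], :] = 1.0

flipped = np.rot90(page, 2)  # rotate by 180 degrees

same_image = np.array_equal(page, flipped)
same_score = np.isclose(projection_rms(page), projection_rms(flipped))
print(same_image, same_score)
```

The two images differ but score the same, which is why a separate upside-down test is needed after deskewing.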

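Is there any suggestion for detecting whether an image is upside down using OpenCV and Python?
PS: I tried checking the orientation using the EXIF data, but that did not lead to a solution.

Edit:
Detecting the orientation is possible with Tesseract (pytesseract for Python), but only when the image contains a large number of characters.
For anyone who may need this: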

import cv2
import pytesseract


print(pytesseract.image_to_osd(cv2.imread(file_name)))

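Tesseract can detect the orientation if the document contains enough characters. However, when the image has only a few lines, the orientation suggested by Tesseract is usually wrong. So this cannot be a 100% solution.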
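
Since the OSD result is unreliable on sparse pages, one option is to act on it only when Tesseract reports enough confidence. The sketch below parses the text that `image_to_osd` returns; the helper names and the `min_confidence=2.0` cutoff are assumptions chosen for illustration, not values recommended by Tesseract, and the sample text uses made-up numbers.

```python
import re

def parse_osd(osd_text):
    """Extract rotation (degrees) and orientation confidence from OSD text."""
    rotate = re.search(r"Rotate:\s*(\d+)", osd_text)
    conf = re.search(r"Orientation confidence:\s*([\d.]+)", osd_text)
    if rotate is None or conf is None:
        raise ValueError("unexpected OSD output")
    return int(rotate.group(1)), float(conf.group(1))

def should_rotate(osd_text, min_confidence=2.0):
    """Return the rotation to apply, or 0 when the confidence is too low."""
    angle, confidence = parse_osd(osd_text)
    return angle if confidence >= min_confidence else 0

# Sample text in the layout image_to_osd produces (values are made up).
sample = """Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 4.31
Script: Latin
Script confidence: 1.25
"""
print(should_rotate(sample))
```

In real use the text would come from `pytesseract.image_to_osd(cv2.imread(file_name))`, and pages that fall below the cutoff could be passed to one of the pixel-based checks instead.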

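【Question discussion】:

Not a solution, but another heuristic you could use (assuming you are reading Latin script) is to compare the amount of black in the left and right halves, or in the top and bottom halves. If there is noticeably more black on the right side of the page (where lines break) and/or at the bottom, I would guess it is probably upside down.
Do the documents always have a header? Can you tell whether there is a pattern to follow? I would leave OCR as the last option... it would be easier to detect the white gaps, draw a rectangle around them and measure its size, for example the white gap between the header and the rest.
@singrium Hmm, not sure. If their size is constant, you could use some convolution filter and see whether it matches better upright or upside down (you get more "matches")... otherwise I'm not sure (I don't know that much about CV, to be honest). I mean, you could certainly build a neural network or something else to classify it, but that would take more work.
Well, for the documents with the blue line, you could read the blue channel of the image and threshold it. If blue is detected and it lies below the middle of the document, you can say the document is upside down.
You could preprocess the page to fully grayscale with high contrast and then apply the black/white test suggested by jdehesa. In any case, you will always need to normalize before OCR or any other detection.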
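
The half-page comparison suggested in the first comment above could be sketched roughly as follows. It assumes a binarized page where non-zero pixels are ink, and the way the two differences are combined (a plain sum) is an arbitrary choice for illustration; `ink_balance` and `looks_upside_down` are hypothetical names.

```python
import numpy as np

def ink_balance(binary):
    """Return (top - bottom, left - right) ink counts for a binary page."""
    ink = binary > 0
    h, w = ink.shape
    top, bottom = ink[: h // 2].sum(), ink[h - h // 2:].sum()
    left, right = ink[:, : w // 2].sum(), ink[:, w - w // 2:].sum()
    return int(top) - int(bottom), int(left) - int(right)

def looks_upside_down(binary):
    """Heuristic guess: more ink at the bottom and right than top and left."""
    vertical, horizontal = ink_balance(binary)
    return vertical + horizontal < 0

# Synthetic page: two full lines near the top and a short last line.
page = np.zeros((10, 10), dtype=np.uint8)
page[[1, 3], :] = 255
page[5, :4] = 255

print(looks_upside_down(page), looks_upside_down(np.rot90(page, 2)))
```

As the comment itself says, this is only a heuristic: a full page of justified text gives almost no difference between the halves.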

Tags: python opencv image-rotation skew

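【Solution 1】:

A Python3/OpenCV4 script to align scanned documents.

Rotate the document and sum the rows. When the document is rotated by 0 or 180 degrees, there will be a lot of black pixels in the picture:

Use a score-keeping method. Score each image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code is linked from the original answer.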

# Rotate the image around in a circle
scores = []  # one score per tested angle
angle = 0
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)    
    # Crop the center 1/3rd of the image (roi is filled with text)
    h,w = img.shape
    buffer = min(h, w) - int(min(h,w)/1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # Fewer non-zero rows --> cleaner zebra stripes (lower score is better)
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
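
To make the scoring step concrete, here is a small NumPy-only sketch of the row-sum score on a synthetic page. It assumes an inverted binary image (ink non-zero, background zero); `zebra_score` is a made-up name and is not one of the functions omitted from the answer.

```python
import numpy as np

def zebra_score(binary):
    """Number of rows containing any ink; lower means cleaner stripes."""
    row_sums = binary.sum(axis=1)
    return int(np.count_nonzero(row_sums))

# Synthetic inverted page: four horizontal text lines with blank rows between.
page = np.zeros((9, 9), dtype=np.uint8)
page[[1, 3, 5, 7], 1:8] = 255

aligned = zebra_score(page)             # only the text rows are non-zero
sideways = zebra_score(np.rot90(page))  # text lines now run vertically
flipped = zebra_score(np.rot90(page, 2))
print(aligned, sideways, flipped)
```

The aligned page scores lower than the sideways one, but the 180-degree flip scores exactly the same as the aligned page, so this score alone cannot tell whether the result is upside down.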

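How do you tell whether the document is upside down? Fill in the area from the top of the document down to the first non-black pixel in the image, and measure that area (shown in yellow). The image with the smallest area is the one that is right side up: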

# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
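
The helper `area_to_top_of_text` is not shown in the answer. One plausible reading, sketched here with NumPy only, is to count for each column the blank rows above the first ink pixel and add them up. This is an assumption about what the helper measures, not the author's code, and `area_above_text` and `pick_upright` are hypothetical names.

```python
import numpy as np

def area_above_text(binary):
    """Sum, over columns, of the blank rows above the first ink pixel."""
    ink = binary > 0
    h, _ = ink.shape
    first_ink = np.argmax(ink, axis=0)   # row index of first ink per column
    first_ink[~ink.any(axis=0)] = h      # empty columns count in full
    return int(first_ink.sum())

def pick_upright(binary):
    """Return the page or its 180-degree flip, whichever has less top area."""
    flipped = np.rot90(binary, 2)
    if area_above_text(binary) < area_above_text(flipped):
        return binary
    return flipped

# Synthetic inverted page with equal margins and a short last line.
page = np.zeros((11, 10), dtype=np.uint8)
page[[2, 4, 6], :] = 255
page[8, :5] = 255

print(area_above_text(page), area_above_text(np.rot90(page, 2)))
```

The short last line is what makes the flipped page score worse here, which also hints at why the test can fail on pages that end or begin unusually.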

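【Discussion】:

This is a good answer. But the upside-down detection may fail on, for example, the last page of each chapter. I guess you could also do a similar analysis for the left and right margins, since paragraph endings are on average darker than paragraph beginnings.
I would suggest summing the non-black pixels from the top and from the left, since English text starts at the top left.
For the upside-down detection, could you exploit the fact that there is a halo effect caused by capital letters and by the frequency of letters such as t, h and k? In your still image, the upper halo lies below the white band. That is, the sum over the region halfway between the white bands should be larger on the top side.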

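【Solution 2】:

Assuming you have already corrected the angle of the image, you can try the following to determine whether it is flipped:

1. Project the corrected image onto the y-axis, so that you get a "peak" for each line. Important: there are actually almost always two sub-peaks!
2. Smooth this projection by convolving it with a Gaussian, to remove fine structure, noise and so on.
3. For each peak, check whether the stronger sub-peak is at the top or at the bottom.
4. Calculate the fraction of peaks that have the sub-peak on the bottom side. This is the scalar value that tells you how confident you can be that the image is oriented correctly.

The peak finding in step 3 is done by finding the sections with above-average values. The sub-peaks are then found via argmax.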

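Here is a figure to illustrate the approach, showing a few lines of your example image:

Blue: the original projection
Orange: the smoothed projection
Horizontal line: the average of the smoothed projection over the whole image.

Here is some code that does this: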

import cv2
import numpy as np

# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]

# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)

# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]

# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line)/2:
        lower_peaks += 1

print(lower_peaks / len(line_starts))

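This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.

Note that this approach may break badly if the page contains images or anything that is not organized in lines (math or pictures, for instance). Another problem is having too few lines, which results in poor statistics.

Also, different fonts may lead to different distributions. You can try this on a few images and see whether the approach works. I don't have enough data.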

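【Discussion】:

This answer needs to say why it takes this approach and why it somewhat works. The two main peaks are due to the "o-ness" of letters such as o, b, q, e and others. By smoothing, you lose reliability here. Ignore the two main peaks and concentrate on the two sub-peaks above and below them, which come from capital letters and the frequency of letters such as t, h, l and d. In your Gaussian image, the sub-peaks show clearly that the image is upside down.
What you say is correct in an ideal world. However, detecting the small peaks requires a more sensitive detection, which is more prone to irregularities in the scan (for example, the vertical black line at the edge of the example scan). That is why I smoothed the projection.
The main peaks contain mostly noise; the sub-peaks contain the signal. I strongly disagree that smoothing, i.e. averaging noise and signal together, is better, even in the real world.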

【Solution 3】:

You can use the Alyn module. To install it:

pip install alyn

Then use it to deskew the image (taken from the homepage):

from alyn import Deskew
d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offset_angle_in_degrees_to_control_orientation')
d.run()

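Note that Alyn is only meant for deskewing text.

【Discussion】:

Have you tried the code you posted? When I run it, I get this error: ImportError: cannot import name 'Deskew'
If you change Deskew to lowercase, it works, but then another error appears. It seems it does not work with Python 3.7 (?)
@L.C. -- No, it does not work on Python 3, but it only needs minor changes.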