【Posted】: 2019-08-30 04:27:48
【Question】:
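I am trying to use the EAST model in OpenCV to detect text in an image. Running the image through the network works and I get output, but I am having a hard time understanding how the decode function I use works. I know the model gives me 5 numbers per location, which I think are the distances from a point to the top, bottom, left and right of the rectangle, followed by the rotation angle. I am not sure how the decode function turns these into the bounding boxes of the text regions.
I know why the offsets are multiplied by 4 (the image is downscaled by a factor of 4 as it goes through the model). I know why h and w are computed the way they are. After that I am lost.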
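To make those two known steps concrete, here is a minimal sketch of the scaling and the size computation. The helper names `cell_to_pixel` and `box_size` are made up for illustration, and the channel order (top, right, bottom, left) is an assumption inferred from the fact that decode adds channels 0 and 2 for h and channels 1 and 3 for w.

```python
# Assumed stride between output-map cells and input pixels (the 4.0 in decode).
STRIDE = 4.0

def cell_to_pixel(x, y, stride=STRIDE):
    """Map an output-map cell index back to input-image pixel coordinates."""
    return x * stride, y * stride

def box_size(d_top, d_right, d_bottom, d_left):
    """Box width and height from the four distances (assumed channel order)."""
    h = d_top + d_bottom    # x0_data[x] + x2_data[x] in decode
    w = d_right + d_left    # x1_data[x] + x3_data[x] in decode
    return w, h

print(cell_to_pixel(10, 5))            # (40.0, 20.0)
print(box_size(3.0, 7.0, 5.0, 2.0))    # (9.0, 8.0)
```

The point at cell (10, 5) therefore sits at pixel (40, 20), and the distances around it add up to a 9 x 8 box.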
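scores holds the confidence score for each region; geometry holds the geometry values for each region (the 5 numbers I mentioned); scoreThresh is just the score threshold applied before non-maximum suppression.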
import math

def decode(scores, geometry, scoreThresh):
    detections = []
    confidences = []

    ############ CHECK DIMENSIONS AND SHAPES OF geometry AND scores ############
    assert len(scores.shape) == 4, "Incorrect dimensions of scores"
    assert len(geometry.shape) == 4, "Incorrect dimensions of geometry"
    assert scores.shape[0] == 1, "Invalid dimensions of scores"
    assert geometry.shape[0] == 1, "Invalid dimensions of geometry"
    assert scores.shape[1] == 1, "Invalid dimensions of scores"
    assert geometry.shape[1] == 5, "Invalid dimensions of geometry"
    assert scores.shape[2] == geometry.shape[2], "Invalid dimensions of scores and geometry"
    assert scores.shape[3] == geometry.shape[3], "Invalid dimensions of scores and geometry"
    height = scores.shape[2]
    width = scores.shape[3]
    for y in range(0, height):
        # Extract this row of scores and of the five geometry channels
        scoresData = scores[0][0][y]
        x0_data = geometry[0][0][y]
        x1_data = geometry[0][1][y]
        x2_data = geometry[0][2][y]
        x3_data = geometry[0][3][y]
        anglesData = geometry[0][4][y]
        for x in range(0, width):
            score = scoresData[x]

            # If score is lower than threshold score, move to next x
            if score < scoreThresh:
                continue

            # Scale the cell index back to input-image coordinates
            offsetX = x * 4.0
            offsetY = y * 4.0
            angle = anglesData[x]

            # Calculate cos and sin of angle
            cosA = math.cos(angle)
            sinA = math.sin(angle)
            h = x0_data[x] + x2_data[x]
            w = x1_data[x] + x3_data[x]

            # Shift the scaled point by x1 and x2, rotated by the angle
            offset = ([offsetX + cosA * x1_data[x] + sinA * x2_data[x], offsetY - sinA * x1_data[x] + cosA * x2_data[x]])

            # Find points for rectangle
            p1 = (-sinA * h + offset[0], -cosA * h + offset[1])
            p3 = (-cosA * w + offset[0], sinA * w + offset[1])
            center = (0.5*(p1[0]+p3[0]), 0.5*(p1[1]+p3[1]))
            detections.append((center, (w,h), -1*angle * 180.0 / math.pi))
            confidences.append(float(score))

    # Return detections and confidences
    return [detections, confidences]
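One way to see what the part after h and w does is to run the same arithmetic for a single cell and look at the intermediate points. The sketch below is only a restatement of the posted formulas, not an official description of EAST; the function name `decode_cell` and the argument names are hypothetical, and the channel order (top, right, bottom, left) is again an assumption. With angle = 0 the point called `offset` in decode comes out as the bottom-right corner, p1 as the top-right corner and p3 as the bottom-left corner, so their midpoint is the box center.

```python
import math

def decode_cell(x, y, d_top, d_right, d_bottom, d_left, angle):
    """Repeat the per-cell arithmetic of decode for one output-map cell."""
    offset_x, offset_y = x * 4.0, y * 4.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    h = d_top + d_bottom
    w = d_right + d_left
    # 'offset' in decode: the scaled point shifted by d_right and d_bottom,
    # rotated by the angle. For angle = 0 this is the bottom-right corner.
    corner = (offset_x + cos_a * d_right + sin_a * d_bottom,
              offset_y - sin_a * d_right + cos_a * d_bottom)
    # p1 lies at distance h from corner, p3 at distance w, at right angles.
    p1 = (-sin_a * h + corner[0], -cos_a * h + corner[1])
    p3 = (-cos_a * w + corner[0], sin_a * w + corner[1])
    # p1 and p3 are opposite corners, so their midpoint is the box center.
    center = (0.5 * (p1[0] + p3[0]), 0.5 * (p1[1] + p3[1]))
    return corner, p1, p3, (center, (w, h), -angle * 180.0 / math.pi)

corner, p1, p3, box = decode_cell(10, 5, 3.0, 7.0, 5.0, 2.0, angle=0.0)
print(corner)   # (47.0, 25.0)
print(p1)       # (47.0, 17.0)
print(p3)       # (38.0, 25.0)
print(box[0])   # (42.5, 21.0)
```

For a non-zero angle the same relations hold: p1 and p3 stay at distances h and w from the corner and the two edges stay perpendicular, so the tuple (center, (w, h), angle in degrees) describes a rotated rectangle.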
【Comments】:
-
So what exactly is your question?
-
Basically, why does the decode function do what it does?
Tags: python opencv deep-learning