【Question title】: Get correct image orientation by Google Cloud Vision api (TEXT_DETECTION)
【Posted】: 2017-05-08 05:19:03
【Question description】:

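I tried the Google Cloud Vision api (TEXT_DETECTION) on an image rotated by 90 degrees. It still returned the recognized text correctly. (See the image below.)

This means the engine can recognize text even when the image is rotated by 90, 180, or 270 degrees.

However, the response does not include any information about the correct image orientation. (Documentation: EntityAnnotation)

Is there a way to get not only the recognized text but also the orientation?
Could Google support this in a way similar to FaceAnnotation (getRollAngle)?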

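【Question discussion】:

Tags: ocr google-cloud-platform google-cloud-vision


【Solution 1】:

You can use the fact that we know the order of the characters within a word to infer the word's orientation, as follows (the logic is obviously slightly different for non-LTR languages):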

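import numpy as np

# Assumed threshold; the answer does not say which value it uses.
MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE = 4

# `annotation` is assumed to be an iterable of pages,
# e.g. response.full_text_annotation.pages
for page in annotation:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                if len(word.symbols) < MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE:
                    continue
                first_char = word.symbols[0]
                last_char = word.symbols[-1]
                first_char_center = (np.mean([v.x for v in first_char.bounding_box.vertices]),
                                     np.mean([v.y for v in first_char.bounding_box.vertices]))
                last_char_center = (np.mean([v.x for v in last_char.bounding_box.vertices]),
                                    np.mean([v.y for v in last_char.bounding_box.vertices]))

                # Assumed definitions (missing from the snippet): right edge of the last character
                top_right = last_char.bounding_box.vertices[1]
                bottom_right = last_char.bounding_box.vertices[2]

                # upright or upside down
                if np.abs(first_char_center[1] - last_char_center[1]) < np.abs(top_right.y - bottom_right.y):
                    if first_char_center[0] <= last_char_center[0]:  # upright
                        print(0)
                    else:  # upside down
                        print(180)
                else:  # sideways
                    if first_char_center[1] <= last_char_center[1]:
                        print(90)
                    else:
                        print(270)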

You can then use the orientations of the individual words to infer the orientation of the whole document.
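
One way to combine the per-word results is a simple majority vote. The sketch below is only an illustration: the function name `infer_document_orientation` is hypothetical, and it assumes the per-word orientations (0, 90, 180 or 270) have already been collected into a list instead of being printed.

```python
from collections import Counter


def infer_document_orientation(word_orientations):
    """Return the most frequent per-word orientation, or None for an empty list.

    `word_orientations` is assumed to be a list of values from
    {0, 90, 180, 270}, one per word that was long enough to be classified.
    """
    if not word_orientations:
        return None
    counts = Counter(word_orientations)
    # most_common(1) yields a single (value, count) pair for the winner
    return counts.most_common(1)[0][0]
```

A vote tolerates a few misclassified words (for example very short words or vertically stacked labels), which a single-word check does not.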

【Discussion】:

  • What value do you use for MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE?
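【Solution 2】:

As stated in the Public Issue Tracker, our engineering team is now aware of this feature request, and there is currently no ETA for its implementation.

Note that your image's metadata may already contain orientation information. An example of how to extract the metadata can be seen in this Third-party library.

A broad workaround is to check the "vertices" of the "boundingPoly" returned for each entry in "textAnnotations". By computing the width and height of each detected word's rectangle, you can tell that the image is not right side up when the rectangle's "height" is greater than its "width" (that is, the image is sideways).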
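
The width/height check can be sketched as follows. This is a minimal illustration, not an official recipe: the function names are hypothetical, the input is assumed to be the `textAnnotations` list from the JSON response parsed into Python dicts, and a missing `x` or `y` key is treated as 0 as an assumption. Note that this check can only tell "sideways" from "not sideways"; it cannot distinguish 0 from 180 degrees or 90 from 270 degrees.

```python
def box_size(vertices):
    """Return (width, height) of the axis-aligned extent of a boundingPoly.

    `vertices` is a list of dicts such as {"x": 10, "y": 20}; a missing key
    is treated as 0 (an assumption made for this sketch).
    """
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return max(xs) - min(xs), max(ys) - min(ys)


def looks_sideways(text_annotations):
    """Return True if most word boxes are taller than they are wide."""
    words = text_annotations[1:]  # entry 0 covers the whole detected text
    if not words:
        return False
    tall = 0
    for word in words:
        width, height = box_size(word["boundingPoly"]["vertices"])
        if height > width:
            tall += 1
    return tall > len(words) / 2
```

Voting over all words rather than checking a single box avoids being misled by one-letter words, whose boxes can be taller than wide even in an upright image.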

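【Discussion】:

  • I would love to know how Google Cloud Vision gets the correct text from an image even when the image is not horizontally aligned and needs to be rotated. How does the API know how much the image has to be rotated? If that information is not in the image's metadata, how does the Cloud API find it?
  • What you have solved, as described in cloud.google.com/vision/docs/reference/rest/v1p4beta1/…, only lets you distinguish rotations of 0, 90, 180, and 270 degrees (and only after some math). You already have all the information needed to correct the image, so why not return it?
【Solution 3】:

I am posting my workaround, which does work for images rotated by 90, 180, and 270 degrees. Please see the code below.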

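// EXIF orientation values returned by GetExifOrientation (taken from the inline comments below)
static final int EXIF_ORIENTATION_NORMAL = 1;
static final int EXIF_ORIENTATION_180_DEGREE = 3;
static final int EXIF_ORIENTATION_270_DEGREE = 6;
static final int EXIF_ORIENTATION_90_DEGREE = 8;

// Usage: pass any text annotation except the first one
int orientation = GetExifOrientation(annotateImageResponse.getTextAnnotations().get(1));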
/**
 *
 * @param ea  The input EntityAnnotation. It must NOT be the first EntityAnnotation of
 *            annotateImageResponse.getTextAnnotations(), because that one is not
 *            affected by image orientation.
 * @return Exif orientation (1 or 3 or 6 or 8)
 */
public static int GetExifOrientation(EntityAnnotation ea) {
    List<Vertex> vertexList = ea.getBoundingPoly().getVertices();
    // Calculate the center
    float centerX = 0, centerY = 0;
    for (int i = 0; i < 4; i++) {
        centerX += vertexList.get(i).getX();
        centerY += vertexList.get(i).getY();
    }
    centerX /= 4;
    centerY /= 4;

    int x0 = vertexList.get(0).getX();
    int y0 = vertexList.get(0).getY();

    if (x0 < centerX) {
        if (y0 < centerY) {
            //       0 -------- 1
            //       |          |
            //       3 -------- 2
            return EXIF_ORIENTATION_NORMAL; // 1
        } else {
            //       1 -------- 2
            //       |          |
            //       0 -------- 3
            return EXIF_ORIENTATION_270_DEGREE; // 6
        }
    } else {
        if (y0 < centerY) {
            //       3 -------- 0
            //       |          |
            //       2 -------- 1
            return EXIF_ORIENTATION_90_DEGREE; // 8
        } else {
            //       2 -------- 3
            //       |          |
            //       1 -------- 0
            return EXIF_ORIENTATION_180_DEGREE; // 3
        }
    }
}

More information
I found that I had to add a language hint to make annotateImageResponse.getTextAnnotations().get(1) always follow this rule.

Sample code for adding a language hint

ImageContext imageContext = new ImageContext();
String [] languages = { "zh-TW" };
imageContext.setLanguageHints(Arrays.asList(languages));
annotateImageRequest.setImageContext(imageContext);

【Discussion】:

    【Solution 4】:

    Jack Fan's answer worked for me. Here is my VanillaJS version.

    /**
     *
     * @param gOCR  The Google Vision response
     * @return orientation (0, 90, 180 or 270)
     */
    function getOrientation(gOCR) {
        var vertexList = gOCR.responses[0].textAnnotations[1].boundingPoly.vertices;
    
        const ORIENTATION_NORMAL = 0;
        const ORIENTATION_270_DEGREE = 270;
        const ORIENTATION_90_DEGREE = 90;
        const ORIENTATION_180_DEGREE = 180;
    
        var centerX = 0, centerY = 0;
        for (var i = 0; i < 4; i++) {
            centerX += vertexList[i].x;
            centerY += vertexList[i].y;
        }
        centerX /= 4;
        centerY /= 4;
    
        var x0 = vertexList[0].x;
        var y0 = vertexList[0].y;
    
        if (x0 < centerX) {
            if (y0 < centerY) {
    
                return ORIENTATION_NORMAL;
            } else {
                return ORIENTATION_270_DEGREE;
            }
        } else {
            if (y0 < centerY) {
                return ORIENTATION_90_DEGREE;
            } else {
                return ORIENTATION_180_DEGREE;
            }
        }
    }
    

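    【Discussion】:

      【Solution 5】:

      Sometimes the orientation cannot be obtained from the metadata, for example when the user takes the photo with a mobile device's camera held in the wrong orientation. My solution is based on Jack Fan's answer and on google-api-services-vision (available through Maven).

      My TextUnit class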

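        public class TextUnit {
              private String text;

              //    X of lower-left point
              private float llx;

              //    Y of lower-left point
              private float lly;

              //    X of upper-right point
              private float urx;

              //    Y of upper-right point
              private float ury;

              //    Constructor used by extractData below
              public TextUnit(String text, float llx, float lly, float urx, float ury) {
                  this.text = text;
                  this.llx = llx;
                  this.lly = lly;
                  this.urx = urx;
                  this.ury = ury;
              }
          }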
      

      Main method:

       List<TextUnit> extractData(BatchAnnotateImagesResponse response) throws AnnotateImageResponseException {
                  List<TextUnit> data = new ArrayList<>();
      
                  for (AnnotateImageResponse res : response.getResponses()) {
                      if (null != res.getError()) {
                          String errorMessage = res.getError().getMessage();
                          logger.log(Level.WARNING, "AnnotateImageResponse ERROR: " + errorMessage);
                          throw new AnnotateImageResponseException("AnnotateImageResponse ERROR: " + errorMessage);
                      } else {
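                          // Use the response currently being iterated, and require at least two
                          // annotations because index 1 is read below.
                          List<EntityAnnotation> texts = res.getTextAnnotations();
                          if (texts != null && texts.size() > 1) {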
      
                              //get orientation
                              EntityAnnotation first_word = texts.get(1);
                              int orientation;
                              try {
                                  orientation = getExifOrientation(first_word);
                              } catch (NullPointerException e) {
                                  try {
                                      orientation = getExifOrientation(texts.get(2));
                                  } catch (NullPointerException e1) {
                                      orientation = EXIF_ORIENTATION_NORMAL;
                                  }
                              }
                              logger.log(Level.INFO, "orientation: " + orientation);
      
                              // Calculate the center
                              float centerX = 0, centerY = 0;
                              for (Vertex vertex : first_word.getBoundingPoly().getVertices()) {
                                  if (vertex.getX() != null) {
                                      centerX += vertex.getX();
                                  }
                                  if (vertex.getY() != null) {
                                      centerY += vertex.getY();
                                  }
                              }
                              centerX /= 4;
                              centerY /= 4;
      
      
                              for (int i = 1; i < texts.size(); i++) {//exclude first text - it contains all text of the page
      
                                  String blockText = texts.get(i).getDescription();
                                  BoundingPoly poly = texts.get(i).getBoundingPoly();
      
                                  try {
                                      float llx = 0;
                                      float lly = 0;
                                      float urx = 0;
                                      float ury = 0;
                                      if (orientation == EXIF_ORIENTATION_NORMAL) {
                                          poly = invertSymmetricallyBy0X(centerY, poly);
                                          llx = getLlx(poly);
                                          lly = getLly(poly);
                                          urx = getUrx(poly);
                                          ury = getUry(poly);
                                      } else if (orientation == EXIF_ORIENTATION_90_DEGREE) {
                                          //invert by x
                                          poly = rotate(centerX, centerY, poly, Math.toRadians(-90));
                                          poly = invertSymmetricallyBy0Y(centerX, poly);
                                          llx = getLlx(poly);
                                          lly = getLly(poly);
                                          urx = getUrx(poly);
                                          ury = getUry(poly);
                                      } else if (orientation == EXIF_ORIENTATION_180_DEGREE) {
                                          poly = rotate(centerX, centerY, poly, Math.toRadians(-180));
                                          poly = invertSymmetricallyBy0Y(centerX, poly);
                                          llx = getLlx(poly);
                                          lly = getLly(poly);
                                          urx = getUrx(poly);
                                          ury = getUry(poly);
                                      }else if (orientation == EXIF_ORIENTATION_270_DEGREE){
                                          //invert by x
                                          poly = rotate(centerX, centerY, poly, Math.toRadians(-270));
                                          poly = invertSymmetricallyBy0Y(centerX, poly);
                                          llx = getLlx(poly);
                                          lly = getLly(poly);
                                          urx = getUrx(poly);
                                          ury = getUry(poly);
                                      }
      
      
                                      data.add(new TextUnit(blockText, llx, lly, urx, ury));
                                  } catch (NullPointerException e) {
                                      //ignore - some polys have no X or Y coordinate if the text is located close to the image bounds.
                                  }
                              }
                          }
                      }
                  }
                  return data;
              }
      

      Helper methods:

      private float getLlx(BoundingPoly poly) {
              try {
                  List<Vertex> vertices = poly.getVertices();
      
                  ArrayList<Float> xs = new ArrayList<>();
                  for (Vertex v : vertices) {
                      float x = 0;
                      if (v.getX() != null) {
                          x = v.getX();
                      }
                      xs.add(x);
                  }
      
                  Collections.sort(xs);
                  float llx = (xs.get(0) + xs.get(1)) / 2;
                  return llx;
              } catch (Exception e) {
                  return 0;
              }
          }
      
          private float getLly(BoundingPoly poly) {
              try {
                  List<Vertex> vertices = poly.getVertices();
      
                  ArrayList<Float> ys = new ArrayList<>();
                  for (Vertex v : vertices) {
                      float y = 0;
                      if (v.getY() != null) {
                          y = v.getY();
                      }
                      ys.add(y);
                  }
      
                  Collections.sort(ys);
                  float lly = (ys.get(0) + ys.get(1)) / 2;
                  return lly;
              } catch (Exception e) {
                  return 0;
              }
          }
      
          private float getUrx(BoundingPoly poly) {
              try {
                  List<Vertex> vertices = poly.getVertices();
      
                  ArrayList<Float> xs = new ArrayList<>();
                  for (Vertex v : vertices) {
                      float x = 0;
                      if (v.getX() != null) {
                          x = v.getX();
                      }
                      xs.add(x);
                  }
      
                  Collections.sort(xs);
                  float urx = (xs.get(xs.size()-1) + xs.get(xs.size()-2)) / 2;
                  return urx;
              } catch (Exception e) {
                  return 0;
              }
          }
      
          private float getUry(BoundingPoly poly) {
              try {
                  List<Vertex> vertices = poly.getVertices();
      
                  ArrayList<Float> ys = new ArrayList<>();
                  for (Vertex v : vertices) {
                      float y = 0;
                      if (v.getY() != null) {
                          y = v.getY();
                      }
                      ys.add(y);
                  }
      
                  Collections.sort(ys);
                  float ury = (ys.get(ys.size()-1) +ys.get(ys.size()-2)) / 2;
                  return ury;
              } catch (Exception e) {
                  return 0;
              }
          }
      
          /**
           * Rotates the rectangle clockwise around (centerX, centerY).
           *
           * @param poly
           * @param theta the angle of rotation in radians
           * @return
           */
          public BoundingPoly rotate(float centerX, float centerY, BoundingPoly poly, double theta) {
      
              List<Vertex> vertexList = poly.getVertices();
      
              //rotate all vertices in poly
              for (Vertex vertex : vertexList) {
                  float tempX = vertex.getX() - centerX;
                  float tempY = vertex.getY() - centerY;
      
                  // now apply rotation
                  float rotatedX = (float) (centerX - tempX * cos(theta) + tempY * sin(theta));
                  // Y is offset from centerY (the original used centerX here, which looks like a typo)
                  float rotatedY = (float) (centerY - tempX * sin(theta) - tempY * cos(theta));
      
                  vertex.setX((int) rotatedX);
                  vertex.setY((int) rotatedY);
              }
              return poly;
          }
      
          /**
           * The Google Vision API returns boundingPolys in a coordinate system whose origin is
           * the top-left corner, but iText uses a coordinate system whose origin is the
           * bottom-left corner, so the result must be inverted before it is used with iText.
           *
           * @return the poly mirrored vertically about the horizontal line y = centerY.
           */
          private BoundingPoly invertSymmetricallyBy0X(float centerY, BoundingPoly poly) {
      
              List<Vertex> vertices = poly.getVertices();
              for (Vertex v : vertices) {
                  if (v.getY() != null) {
                      v.setY((int) (centerY + (centerY - v.getY())));
                  }
              }
              return poly;
          }
      
          /**
           *
           * @param centerX
           * @param poly
           * @return  text units inverted symmetrically by 0Y coordinates.
           */
          private BoundingPoly invertSymmetricallyBy0Y(float centerX, BoundingPoly poly) {
              List<Vertex> vertices = poly.getVertices();
              for (Vertex v : vertices) {
                  if (v.getX() != null) {
                      v.setX((int) (centerX + (centerX - v.getX())));
                  }
              }
              return poly;
          }
      

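      【Discussion】:

        【Solution 6】:

        Usually we need to know the actual rotation angle of the text in the photo. The coordinate information provided by the API is already complete enough for that. Just compute the angle of the line from vertex 0 (xy0) to vertex 1 (xy1) to get the rotation angle.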

        // reset
        self.transform = CGAffineTransformIdentity;
        
        CGFloat x_0 = viewData.bounds[0].x;
        CGFloat y_0 = viewData.bounds[0].y;
        
        CGFloat x_1 = viewData.bounds[1].x;
        CGFloat y_1 = viewData.bounds[1].y;
        
        CGFloat x_3 = viewData.bounds[3].x;
        CGFloat y_3 = viewData.bounds[3].y;
        
        // distance
        CGFloat width = sqrt(pow(x_0 - x_1, 2) + pow(y_0 - y_1, 2));
        CGFloat height = sqrt(pow(x_0 - x_3, 2) + pow(y_0 - y_3, 2));
        self.size = CGSizeMake(width, height);
        
        // angle
        CGFloat angle = atan2((y_1 - y_0), (x_1 - x_0));
        // rotation
        self.transform = CGAffineTransformRotate(CGAffineTransformIdentity, angle);
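
The same angle computation can be sketched in Python. This is an illustration under stated assumptions: the function names are hypothetical, and `vertices` is assumed to be a list of `(x, y)` tuples in the order the API returns them, so that vertex 0 to vertex 1 runs along the top edge of the text in reading direction. Image coordinates have y pointing down, so a positive angle here means a clockwise rotation on screen.

```python
import math


def text_rotation_degrees(vertices):
    """Angle of the edge from vertex 0 to vertex 1, in degrees within [0, 360)."""
    x0, y0 = vertices[0]
    x1, y1 = vertices[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360


def snap_to_quadrant(angle):
    """Round an angle in degrees to the nearest of 0, 90, 180 or 270."""
    return (round(angle / 90) % 4) * 90
```

`snap_to_quadrant` is only needed when a coarse 0/90/180/270 answer is wanted, as in the other answers; the raw angle also covers slightly skewed text.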
        

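        【Discussion】:

        • Hi, answers should be in English. I translated it for you with an online translator - please confirm that it still makes sense.
        【Solution 7】:

        The response from the v1 REST endpoint already includes orientationDegrees: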

        https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#Page

        Unfortunately, google-cloud-vision 3.2.0 does not have this yet: https://github.com/googleapis/python-vision/issues/156

        【Discussion】:
