I am using AWS Textract StartDocumentTextDetectionCommand and GetDocumentTextDetectionCommand. I want only lines to be returned, not the single words

【Question title】: I am using AWS Textract StartDocumentTextDetectionCommand and GetDocumentTextDetectionCommand. I want only lines to be returned, not the single words
【Posted】: 2022-09-24 00:14:25
【Question description】:

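I am building an internal OCR tool with AWS Textract and Node.js to detect text in scanned PDFs, specifically with StartDocumentTextDetectionCommand and GetDocumentTextDetectionCommand. The response currently comes back as a list of Block objects: the lines come first, and then every word is returned individually. Is there a parameter or some other option I can add so that only the lines are returned, rather than every single word in the PDF?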

Tags: amazon-web-services ocr text-extraction amazon-textract

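【Solution 1】:

No, that is not possible. Textract returns several block types (PAGE, LINE, WORD), and each LINE block is linked to its WORD blocks through relationships.

Is there any reason you can't simply select only the block type you are interested in (LINE)?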
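As a rough illustration of that suggestion, the filtering can be done on the client side after each response arrives. This is only a sketch: `extractLines` and `collectAllLines` are hypothetical helper names, and `fetchPage` is an assumed callback that wraps your own call to `GetDocumentTextDetectionCommand` and returns an object with `Blocks` and `NextToken`.

```javascript
// Keep only LINE blocks from one page of Textract results.
// `blocks` is the `Blocks` array of a GetDocumentTextDetection response.
function extractLines(blocks) {
  return (blocks ?? [])
    .filter((block) => block.BlockType === "LINE")
    .map((block) => ({ page: block.Page, text: block.Text }));
}

// Hypothetical helper: follows NextToken until all result pages are read.
// `fetchPage(nextToken)` is an assumed callback, for example:
//   (NextToken) => client.send(
//     new GetDocumentTextDetectionCommand({ JobId: jobId, NextToken })
//   )
async function collectAllLines(fetchPage) {
  const lines = [];
  let nextToken;
  do {
    const page = await fetchPage(nextToken);
    lines.push(...extractLines(page.Blocks));
    nextToken = page.NextToken;
  } while (nextToken);
  return lines;
}
```

The WORD blocks are still returned by the service and billed as usual; the filter simply ignores them. If you later need the individual words of a line, they can be looked up through the line's `Relationships` entries rather than requested separately.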
