【Posted】: 2021-06-19 16:31:44
【Question】:
From the Textract documentation: Documents for synchronous operations can be in PNG or JPEG format. Documents for asynchronous operations can also be in PDF format.
I have a Node.js application in which I use asynchronous Textract to read PDF files. My code looks like this:
import * as AWS from 'aws-sdk';

const textract = new AWS.Textract({ region: '<REGION>' });

export const callTextract = (file: File, uuid: string): Promise<any> => {
  return new Promise<any>((resolve, reject) => {
    const params = {
      Document: {
        Bytes: file,
      },
    };
    textract.detectDocumentText(params, (err, data) => {
      ....
      resolve(data);
    });
  });
};
The file here has already been read from the operating system and is a Buffer. From its first 4 bytes (see "Detecting file type from buffer in node js?"), I can confirm that it is a PDF file:
<Buffer 25 50 44 46 ... >
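The first-four-bytes check above can be written as a small helper. This is only a sketch: `isPdf` and `PDF_MAGIC` are hypothetical names, not part of the original code. It compares the leading bytes with `25 50 44 46`, which is `%PDF` in ASCII. A Node.js `Buffer` is a `Uint8Array`, so it can be passed in directly.

```typescript
// "%PDF" in ASCII: the bytes 25 50 44 46 shown in the Buffer dump above.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46];

// Hypothetical helper: returns true when the data starts with the PDF signature.
// Accepts any Uint8Array, which includes Node.js Buffer instances.
function isPdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((expected, i) => bytes[i] === expected);
}
```

This only inspects the signature; it does not check that the rest of the file is a valid PDF.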
The error I receive is UnsupportedDocumentException.
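Going by the documentation quoted at the top, `detectDocumentText` is one of the synchronous operations, which may be why a PDF is rejected. The asynchronous route in aws-sdk v2 is `startDocumentTextDetection` followed by `getDocumentTextDetection`, and it reads the document from S3 rather than from raw bytes. The following is a rough sketch under those assumptions, not a verified fix: `detectPdfText`, `TextractLike`, `bucket`, `key`, and `pollMs` are hypothetical names, and the PDF is assumed to have been uploaded to S3 already.

```typescript
// Minimal shape of the two aws-sdk v2 Textract calls used below, declared here so
// the sketch type-checks on its own. Assumption: `new AWS.Textract(...)` fits it.
interface TextractLike {
  startDocumentTextDetection(params: {
    DocumentLocation: { S3Object: { Bucket: string; Name: string } };
  }): { promise(): Promise<{ JobId?: string }> };
  getDocumentTextDetection(params: { JobId: string; NextToken?: string }): {
    promise(): Promise<{ JobStatus?: string; Blocks?: unknown[]; NextToken?: string }>;
  };
}

// Hypothetical helper: starts an asynchronous text-detection job for a PDF stored
// in S3, polls until the job finishes, and collects the blocks from every page.
async function detectPdfText(
  client: TextractLike,
  bucket: string, // hypothetical: S3 bucket that holds the uploaded PDF
  key: string, // hypothetical: object key of the PDF
  pollMs = 5000,
): Promise<unknown[]> {
  const { JobId } = await client
    .startDocumentTextDetection({
      DocumentLocation: { S3Object: { Bucket: bucket, Name: key } },
    })
    .promise();
  if (!JobId) throw new Error('Textract did not return a JobId');

  const blocks: unknown[] = [];
  let nextToken: string | undefined;
  for (;;) {
    const page = await client
      .getDocumentTextDetection({ JobId, NextToken: nextToken })
      .promise();
    if (page.JobStatus === 'IN_PROGRESS') {
      // Job not finished yet: wait, then ask again.
      await new Promise((resolve) => setTimeout(resolve, pollMs));
      continue;
    }
    if (page.JobStatus !== 'SUCCEEDED') {
      throw new Error(`Textract job ended with status ${page.JobStatus}`);
    }
    blocks.push(...(page.Blocks ?? []));
    if (!page.NextToken) return blocks;
    nextToken = page.NextToken; // more result pages to fetch
  }
}
```

Under these assumptions it would be called as `detectPdfText(textract, '<BUCKET>', '<KEY>')`. Polling keeps the sketch short; it treats any final status other than `SUCCEEDED` as an error.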
【Discussion】:
Tags: node.js typescript amazon-web-services pdf amazon-textract