PDF ingesting through KIBANA - Answers

【Question title】: PDF ingesting through KIBANA
【Posted】: 2021-08-18 12:53:46
【Question description】:

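I am new to Elasticsearch and have a requirement to ingest and index PDFs using Kibana. I found that we have to create a pipeline for this purpose, but I don't know which processors to use or how I should configure them. I also found that the nodes of my Elasticsearch cluster have the ingest attachment plugin installed. The version I am using is Elasticsearch 7.14. Any help is appreciated.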

【Question discussion】:

Tags: elasticsearch kibana

【Solution 1】:

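This may be useful to you: the ingest attachment processor plugin takes the file as base64 and extracts and ingests the PDF's data from it. You need to base64-encode the file and send the encoded string through the pipeline. For example: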

import base64

# data holds the raw bytes of the file that you are parsing
encoded_data = base64.b64encode(data).decode('utf-8')

# Select the target document, then store the base64 payload in its file_data field
body = {
    'query': {
        'bool': {
            'filter': [
                {'ids': {'values': [contentDocumentId]}},
                {'term': {'contentVersionId': contentVersionId}}
            ]
        }
    },
    'script': {
        'source': 'ctx._source["file_data"] = params._file_data',
        'params': {'_file_data': encoded_data}
    }
}
# The 'attachment' pipeline must already exist; it processes file_data during the update
response = client.update_by_query(conflicts='proceed', index=_index, pipeline='attachment', body=body)

I am using update by query for my use case; you can check whether update or update by query suits yours.
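
If the document does not exist yet, update by query has nothing to match. A minimal sketch of another route, indexing a new document directly through the pipeline, might look like the following. It assumes the elasticsearch-py 7.x client and an existing pipeline named 'attachment' that reads the file_data field; build_attachment_doc and index_pdf are hypothetical helper names, not part of any library.

```python
import base64


def build_attachment_doc(pdf_bytes: bytes) -> dict:
    """Return a document whose file_data field holds the base64-encoded PDF."""
    return {"file_data": base64.b64encode(pdf_bytes).decode("utf-8")}


def index_pdf(client, index: str, doc_id: str, pdf_bytes: bytes):
    """Index one PDF through the 'attachment' ingest pipeline.

    client is assumed to be an elasticsearch.Elasticsearch instance (7.x);
    the pipeline name 'attachment' is an assumption taken from the answer above.
    """
    return client.index(
        index=index,
        id=doc_id,
        pipeline="attachment",
        body=build_attachment_doc(pdf_bytes),
    )


if __name__ == "__main__":
    # Only builds the document locally; no cluster is contacted here.
    doc = build_attachment_doc(b"%PDF-1.4 sample bytes")
    print(sorted(doc))
```

With a real client, the call would be something like index_pdf(client, _index, contentDocumentId, data), where data is the raw bytes read from the PDF file.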

【Comments】:

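First of all, which processor should I use to create the pipeline for PDF ingestion? There are many processors available, such as append, enrich, dissect, etc.
There is an ingest attachment processor plugin dedicated to this use case. The code snippet above is an example that uses that same plugin. elastic.co/guide/en/elasticsearch/plugins/master/…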
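
As a rough sketch of what such a pipeline could look like, the snippet below builds a definition with a single attachment processor and prints it as a request that can be pasted into the Kibana Dev Tools console. The pipeline id 'attachment' and the source field file_data are assumptions taken from the answer's code; the function names are hypothetical, and register_pipeline assumes the elasticsearch-py 7.x client.

```python
import json


def build_attachment_pipeline(field: str = "file_data") -> dict:
    """Pipeline body with one attachment processor reading base64 from `field`."""
    return {
        "description": "Extract text and metadata from base64-encoded files",
        "processors": [{"attachment": {"field": field}}],
    }


def as_console_request(pipeline_id: str, body: dict) -> str:
    """Render the pipeline as a Kibana Dev Tools console request."""
    return f"PUT _ingest/pipeline/{pipeline_id}\n{json.dumps(body, indent=2)}"


def register_pipeline(client, pipeline_id: str = "attachment"):
    """Create the pipeline from Python instead of the console.

    client is assumed to be an elasticsearch.Elasticsearch instance (7.x).
    """
    return client.ingest.put_pipeline(id=pipeline_id, body=build_attachment_pipeline())


if __name__ == "__main__":
    print(as_console_request("attachment", build_attachment_pipeline()))
```

By default the attachment processor writes what it extracts under a field named attachment in the indexed document, so the original base64 string in file_data stays in place unless another processor removes it.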