Json to Elasticsearch via API: Answers

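【Question title】: Json to Elasticsearch via API
【Posted】: 2019-01-08 18:30:07
【Question】:

I am trying to add a JSON file to Elasticsearch. The file has about 30,000 lines and is not properly formatted. I am trying to upload it through the Bulk API, but I can't find a way to format it correctly so that it actually works. I am using Ubuntu 16.04 LTS.

This is the format of the JSON: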

{
    "rt": "2018-11-20T12:57:32.292Z",
    "source_info": { "ip": "0.0.60.50" },
    "end": "2018-11-20T12:57:32.284Z",
    "severity": "low",
    "duid": "5b8d0a48ba59941314e8a97f",
    "dhost": "004678",
    "endpoint_type": "computer",
    "endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
    "suser": "Katerina",
    "group": "PERIPHERALS",
    "customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
    "type": "Event::Endpoint::Device::AlertedOnly",
    "id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
    "name": "Peripheral allowed: Samsung Galaxy S7 edge"
}

I know that the Bulk API format requires {"index":{"_id":*}} before each JSON object in the file, which would look like this:

{"index":{"_id":1}}

{
    "rt": "2018-11-20T12:57:32.292Z",
    "source_info": { "ip": "0.0.60.50" },
    "end": "2018-11-20T12:57:32.284Z",
    "severity": "low",
    "duid": "5b8d0a48ba59941314e8a97f",
    "dhost": "004678",
    "endpoint_type": "computer",
    "endpoint_id": "8e7e2806-eaee-9436-6ab5-078361576290",
    "suser": "Katerina",
    "group": "PERIPHERALS",
    "customer_id": "a263f4c8-942f-d4f4-5938-7c37013c03be",
    "type": "Event::Endpoint::Device::AlertedOnly",
    "id": "83d63d48-f040-2485-49b9-b4ff2ac4fad4",
    "name": "Peripheral allowed: Samsung Galaxy S7 edge"
}

If I insert the index id manually and then run curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/ivc/default/_bulk?pretty --data-binary @results.json, the file uploads without errors.

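My question is: how can I add the index ID line {"index":{"_id":*}} to each line of the JSON so that it is ready for upload? Obviously the index id has to increase by 1 for each line. Is there a way to do this from the CLI?

Sorry if this post does not look right. I have read millions of posts on Stack Overflow, but this is my first one! #desperate

Thank you very much in advance!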

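【Comments】:

This answer might help: stackoverflow.com/a/45604500/4604579
I don't know whether you can do this from the CLI, but take a look at Logstash; it should be quick.
Thank you, guys! I'm going to try Val's solution!
Unfortunately, this solution with jq does not work. :( It puts an "index" after every field of every JSON object. I want an "index" after each whole JSON object, so that the Bulk API accepts it. Apparently it does not take this format. :/

Tags: json elasticsearch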

【Solution 1】:

Your problem is that Elasticsearch expects each document to be valid JSON on ONE line, like this:

{"index":{"_id":1}}
{"rt":"2018-11-20T12:57:32.292Z","source_info":{"ip":"0.0.60.50"},"end":"2018-11-20T12:57:32.284Z","severity":"low","duid":"5b8d0a48ba59941314e8a97f","dhost":"004678","endpoint_type":"computer","endpoint_id":"8e7e2806-eaee-9436-6ab5-078361576290","suser":"Katerina","group":"PERIPHERALS","customer_id":"a263f4c8-942f-d4f4-5938-7c37013c03be","type":"Event::Endpoint::Device::AlertedOnly","id":"83d63d48-f040-2485-49b9-b4ff2ac4fad4","name":"Peripheral allowed: Samsung Galaxy S7 edge"}

You have to find a way to transform your input file so that there is one document per line; after that you will be good to go with Val's solution.
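The transformation described above can be sketched with jq, which the comments already mention. This is a minimal sketch, not the method from the linked answer: it assumes jq 1.5 or later is installed and that the input is a stream of top-level JSON objects rather than one big array; sample.json and bulk.ndjson are hypothetical file names. If a filter iterates with .[] over such a stream, jq walks the field values of each object, which would be consistent with the "index after every field" symptom reported in the comments, although the thread does not confirm that this was the cause.

```shell
#!/bin/bash
# Sketch: turn a stream of (possibly pretty-printed) JSON objects into
# Bulk API NDJSON. Requires jq 1.5+; file names are hypothetical.

# Sample input: two top-level objects spread over several lines.
cat > sample.json <<'EOF'
{
    "severity": "low",
    "source_info": { "ip": "0.0.60.50" }
}
{
    "severity": "high",
    "source_info": { "ip": "0.0.60.51" }
}
EOF

# -n: do not read input implicitly; `inputs` then yields every object.
# foreach keeps a counter (starting at 0, +1 per document) and emits an
# action line followed by the document itself.
# -c prints each emitted value on a single line.
jq -c -n 'foreach inputs as $doc (0; . + 1; {"index": {"_id": .}}, $doc)' \
    sample.json > bulk.ndjson

cat bulk.ndjson
```

The resulting bulk.ndjson can be posted with the same curl command used in the question, passing --data-binary @bulk.ndjson. Replacing {"index": {"_id": .}} with {"index": {}} leaves ID generation to Elasticsearch.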

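【Comments】:

Thanks for your reply, Christophe! In fact, when I open the file in Pluma, it shows only two lines: the first line is the index and the second is the body of the JSON, all on a single line. Sorry, I am quite lost.

【Solution 2】:

Thank you for all the answers; they really helped me move in the right direction.

I made a bash script that automatically downloads the logs, formats them, and uploads them to Elasticsearch: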

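#!/bin/bash

echo "Downloading logs from Sophos Central. Please wait."

# Stop if the directory is missing, so rm never runs somewhere else
cd /home/user/ELK/Sophos-Central-SIEM-Integration/log || exit 1

# Delete the previous batch of results (-f: no error if the file is missing)
rm -f result.json
cd ..

# Run the script that downloads a new batch of logs from Sophos
./siem.py
cd /home/user/ELK/Sophos-Central-SIEM-Integration/log || exit 1

# Insert a bulk action line before the first document
sed -i '1 i\{"index":{}}' result.json

# Put an action line at the start of every second line from line 3 onward
# (first~step is a GNU sed address). This yields valid NDJSON only if those
# lines are empty, i.e. the documents are separated by blank lines.
sed -i '3~2s/^/{"index":{}}/' result.json

# Upload the file to Elasticsearch through the Bulk API
# (the URL is quoted so the shell does not treat ? as a glob character)
curl -s -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/ivc/default/_bulk?pretty' --data-binary @result.json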

This is how I got it working. There may be simpler options, but this one works for me. I hope it is useful to others!

Thanks again, everyone! :D
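As a possible alternative to the two sed commands in the script, the action lines can be added with awk, which also makes it easy to number them as the question asked. This is only a sketch under assumptions: each document is taken to sit on a single line already, blank lines between documents are dropped rather than reused, and result_sample.json and bulk_sample.ndjson are hypothetical file names.

```shell
#!/bin/bash
# Sketch: add a numbered Bulk API action line before every non-empty line.
# Assumes each document is already on a single line; blank lines are dropped.

# Hypothetical sample: one document per line, separated by blank lines.
printf '%s\n\n%s\n\n' '{"severity":"low"}' '{"severity":"high"}' > result_sample.json

# NF is 0 on empty or whitespace-only lines, so those are skipped.
# n counts documents and becomes the _id of the action line.
awk 'NF { n++; printf "{\"index\":{\"_id\":%d}}\n", n; print }' \
    result_sample.json > bulk_sample.ndjson

cat bulk_sample.ndjson
```

Because awk writes to a new file instead of editing in place, the downloaded file stays untouched and the command can be rerun safely.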
