【Question Title】: Bulk index document from JSON file into ElasticSearch
【Posted】: 2017-12-22 04:55:26
【Question】:

I have a sample.json like this:

{"id":921,"car_make":"Chevrolet","car_model":"Traverse","car_year":2009,"car_color":"Yellow","made_in":"Guinea-Bissau"},
{"id":922,"car_make":"Mitsubishi","car_model":"Eclipse","car_year":1996,"car_color":"Khaki","made_in":"Luxembourg"},
{"id":923,"car_make":"Ford","car_model":"Lightning","car_year":1994,"car_color":"Teal","made_in":"China"},
{"id":924,"car_make":"Mercedes-Benz","car_model":"Sprinter 2500","car_year":2012,"car_color":"Yellow","made_in":"Colombia"},
{"id":925,"car_make":"Nissan","car_model":"Maxima","car_year":2002,"car_color":"Yellow","made_in":"Kazakhstan"},
{"id":926,"car_make":"Chrysler","car_model":"Pacifica","car_year":2006,"car_color":"Crimson","made_in":"China"}

What command should I use to index each line into Elasticsearch? So far I have tried the following, but it doesn't work.

>> curl -XGET 'localhost:9200/car/car' -d @sample.json 
{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}

I also tried:

curl -XGET 'localhost:9200/car/inventory/_bulk' -H 'Content-Type: application/json' -d @sample.json 
{"_index":"car","_type":"inventory","_id":"_bulk","found":false}

【Discussion】:

    Tags: elasticsearch


    【Answer 1】:

    You'll want to use the Bulk API.

    The documentation explains everything well, but note the following:

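    • Your file should be newline-delimited JSON (NDJSON), sent with application/x-ndjson as the Content-Type. That means each JSON object sits on its own line, with no commas at the ends of lines.
    • Each record takes 2 lines: an "action/metadata" line, followed by the source JSON line.
    • Your file must end with a newline.
    • When using curl, be sure to use --data-binary so that the newlines are preserved (plain -d strips them).
    • The URL path doesn't need to specify the index or type, just _bulk, but in that case you must include the index and type in each record's metadata line. If you specify the index and type in the URL, the metadata lines don't need the _index and _type fields.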
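The rules above are mechanical enough to check before sending anything. Below is a minimal Python sketch; check_bulk_body is just an illustrative name, not part of any Elasticsearch client. It assumes every action is index, so records always come in action/source pairs; delete actions, which have no source line, are not handled.

```python
import json


def check_bulk_body(body):
    """Return a list of problems in an index-only _bulk body (empty list if none)."""
    problems = []
    if not body.endswith("\n"):
        problems.append("body must end with a newline")
    lines = body.split("\n")
    if lines[-1] == "":
        lines.pop()  # the final newline terminates the last line; it isn't a record
    if len(lines) % 2 != 0:
        problems.append("expected action/source line pairs (even line count)")
    for i, line in enumerate(lines, start=1):
        try:
            obj = json.loads(line)  # a trailing comma makes this fail ("Extra data")
        except json.JSONDecodeError:
            problems.append(f"line {i} is not a standalone JSON value")
            continue
        # Odd lines (1, 3, 5, ...) must be action/metadata lines.
        if i % 2 == 1 and not (isinstance(obj, dict) and "index" in obj):
            problems.append(f'line {i} should be an action line like {{"index": {{...}}}}')
    return problems
```

Run against the original sample.json, it reports the trailing commas as invalid JSON and complains that the source lines have no action lines in front of them.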

    Using your example, the file would look like this:

    { "index" : { "_index" : "car", "_type" : "car", "_id" : "921" } }
    {"id":921,"car_make":"Chevrolet","car_model":"Traverse","car_year":2009,"car_color":"Yellow","made_in":"Guinea-Bissau"}
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "922" } }
    {"id":922,"car_make":"Mitsubishi","car_model":"Eclipse","car_year":1996,"car_color":"Khaki","made_in":"Luxembourg"}
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "923" } }
    {"id":923,"car_make":"Ford","car_model":"Lightning","car_year":1994,"car_color":"Teal","made_in":"China"}
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "924" } }
    {"id":924,"car_make":"Mercedes-Benz","car_model":"Sprinter 2500","car_year":2012,"car_color":"Yellow","made_in":"Colombia"}
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "925" } }
    {"id":925,"car_make":"Nissan","car_model":"Maxima","car_year":2002,"car_color":"Yellow","made_in":"Kazakhstan"}
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "926" } }
    {"id":926,"car_make":"Chrysler","car_model":"Pacifica","car_year":2006,"car_color":"Crimson","made_in":"China"}
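Rather than writing the action lines by hand, a short script can generate this file from the original sample.json. This is a sketch that assumes one object per line with an optional trailing comma (as in the question) and that each object's id field should become the document _id; to_bulk_ndjson and convert_file are hypothetical helper names. Passing doc_type=None drops _type, which newer clusters need, since mapping types were removed in Elasticsearch 8.0.

```python
import json


def to_bulk_ndjson(text, index="car", doc_type="car"):
    """Turn one-object-per-line JSON (trailing commas allowed) into a _bulk body.

    Each object becomes an "index" action/metadata line followed by the object
    itself; the object's "id" field is used as the document _id.
    Pass doc_type=None to leave out _type.
    """
    lines = []
    for raw in text.splitlines():
        line = raw.strip().rstrip(",")  # drop the comma that separates objects
        if not line:
            continue  # ignore blank lines
        doc = json.loads(line)
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type
        meta["_id"] = str(doc["id"])
        lines.append(json.dumps({"index": meta}, separators=(",", ":")))
        lines.append(json.dumps(doc, separators=(",", ":")))
    # NDJSON: one JSON value per line, and the body must end with a newline.
    return "".join(line + "\n" for line in lines)


def convert_file(src_path, dst_path, **kwargs):
    """Read src_path (e.g. sample.json) and write the bulk body to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        body = to_bulk_ndjson(src.read(), **kwargs)
    # newline="\n" keeps LF line endings even on Windows.
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write(body)
```

For example, convert_file("sample.json", "bulk.ndjson") writes a bulk.ndjson that can then be posted with curl using --data-binary @bulk.ndjson.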
    
    

    Then, of course, the curl command should set the Content-Type header to application/x-ndjson, like this:

    curl -XPOST -H "Content-Type: application/x-ndjson" localhost:9200/_bulk --data-binary @sample.json 
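If you'd rather not shell out to curl, the same request can be made with Python's standard library. This is a minimal sketch assuming Elasticsearch listens on http://localhost:9200 without authentication; make_bulk_request and send_bulk are illustrative names. Encoding the body to bytes unchanged plays the role of --data-binary.

```python
import json
import urllib.request


def make_bulk_request(body, url="http://localhost:9200/_bulk"):
    """Build a POST request equivalent to the curl command above."""
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),  # raw bytes, newlines kept intact
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def send_bulk(body, url="http://localhost:9200/_bulk"):
    """POST the bulk body and return the decoded JSON response."""
    with urllib.request.urlopen(make_bulk_request(body, url)) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    # _bulk answers 200 even when some items fail; "errors" flags that case.
    if result.get("errors"):
        print("some documents failed to index; see result['items']")
    return result
```

Because a 200 status alone doesn't mean every document was indexed, it's worth checking the errors flag and the per-item results under items in the response.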
    

    【Discussion】:

    • _index and _type don't need to be specified for each document if they are already in the URL, e.g. /car/car/_bulk.
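In the URL-scoped form from that comment, each metadata line shrinks to just the _id. Below is a minimal Python sketch. It assumes each document's id field should become its _id, and scoped_bulk is a hypothetical helper name. It returns the path to post to along with the body. Passing doc_type=None yields /car/_bulk instead, which is the form that clusters without mapping types expect.

```python
import json


def scoped_bulk(docs, index="car", doc_type="car"):
    """Return (path, body) for a bulk request scoped to one index/type in the URL.

    Because the URL names the index (and type), each action line carries only _id.
    """
    path = f"/{index}/{doc_type}/_bulk" if doc_type else f"/{index}/_bulk"
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": str(doc["id"])}}, separators=(",", ":")))
        lines.append(json.dumps(doc, separators=(",", ":")))
    return path, "".join(line + "\n" for line in lines)  # trailing newline required
```

Posting the returned body to http://localhost:9200 plus the returned path, with Content-Type: application/x-ndjson, would index the documents into car/car.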