【Question title】: Elasticsearch: reindexing a multi-type parent/child index (v5.0) to a join-type index (v6.2)
【Posted】: 2018-08-28 03:17:40
【Question】:

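I am reindexing my data from ES 5.0 (parent/child) to ES 6.2 (join type).

In the ES 5.0 index, the data is stored as parent and child documents in different types. For the reindex, I created a new index and mapping based on 6.2 in a new cluster.

The parent documents reindex into the new index without problems, but the child documents throw the following error: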

{
  "index": "index_two",
  "type": "_doc",
  "id": "AVpisCkMuwDYFnQZiFXl",
  "cause": {
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "[routing] is missing for join field [field_relationship]"
    }
  },
  "status": 400
}

The request I use to reindex the data:

  {
  "source": {
    "remote": {
      "host": "http://myescluster.com:9200",
      "socket_timeout": "1m",
      "connect_timeout": "20s"
    },
    "index": "index_two",
    "type": ["actions"],
    "size": 5000,
    "query":{
        "bool":{
            "must":[
                {"term": {"client_id.raw": "cl14ous0ydao"}}
            ]
        }
    }
  },
  "dest": {
    "index": "index_two",
    "type": "_doc"
  },
  "script": {
    "params": {
        "jdata": {
            "name": "actions"
        }
    },
    "source": "ctx._routing=ctx._routing;ctx.remove('_parent');params.jdata.parent=ctx._source.user_id;ctx._source.field_relationship=params.jdata"
  }
}

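I pass the routing field in the Painless script because the documents coming from the source index are dynamic.

Mapping of the destination index: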

{
  "index_two": {
    "mappings": {
      "_doc": {
        "dynamic_templates": [
          {
            "template_actions": {
              "match_mapping_type": "string",
              "mapping": {
                "fields": {
                  "raw": {
                    "index": true,
                    "ignore_above": 256,
                    "type": "keyword"
                  }
                },
                "type": "text"
              }
            }
          }
        ],
        "date_detection": false,
        "properties": {
          "attributes": {
            "type": "nested"
          },
          "cl_other_params": {
            "type": "nested"
          },
          "cl_triggered_ts": {
            "type": "date"
          },
          "cl_utm_params": {
            "type": "nested"
          },
          "end_ts": {
            "type": "date"
          },
          "field_relationship": {
            "type": "join",
            "eager_global_ordinals": true,
            "relations": {
              "users": [
                "actions",
                "segments"
              ]
            }
          },
          "ip_address": {
            "type": "ip"
          },
          "location": {
            "type": "geo_point"
          },
          "processed_ts": {
            "type": "date"
          },
          "processing_time": {
            "type": "date"
          },
          "products": {
            "type": "nested",
            "properties": {
              "traits": {
                "type": "nested"
              }
            }
          },
          "segment_id": {
            "type": "integer"
          },
          "start_ts": {
            "type": "date"
          }
        }
      }
    }
  }
}

A sample source document:

    {
    "_index": "index_two",
    "_type": "actions",
    "_id": "AVvKUYcceQCc2OyLKWZ9",
    "_score": 7.4023576,
    "_routing": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
    "_parent": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
    "_source": {
      "user_id": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
      "client_id": "cl14ous0ydao",
      "session_id": "CL-e0ec3941-6dad-4d2d-bc9b",
      "source": "betalist",
      "action": "pageview",
      "action_type": "pageview",
      "device": "Desktop",
      "ip_address": "49.35.14.224",
      "location": "20.7333 , 77",
      "attributes": [
        {
          "key": "url",
          "value": "https://www.google.com/",
          "type": "string"
        }
      ],
      "products": []
    }
  }
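
To make the target shape concrete, here is a minimal Python sketch (not part of the original post) of the per-document conversion the reindex script is attempting, applied to a trimmed copy of the sample document above. The helper name `to_join_doc` and its default arguments are hypothetical. It assumes the parent ID is available in `_parent`, falling back to `_source.user_id` as the script in the question does.

```python
import copy


def to_join_doc(hit, relation_name="actions", join_field="field_relationship"):
    """Turn a 5.x child hit into (routing, source) for a 6.x join-field index.

    Hypothetical helper for illustration only.
    """
    # Parent ID: prefer the 5.x _parent metadata, else the user_id in the source.
    parent_id = hit.get("_parent") or hit["_source"]["user_id"]

    # Copy the source so the original hit is left unchanged.
    source = copy.deepcopy(hit["_source"])
    source[join_field] = {"name": relation_name, "parent": parent_id}

    # Keep the original routing if the hit carries one, else route by parent ID.
    routing = hit.get("_routing") or parent_id
    return routing, source


# Trimmed copy of the sample document from the question.
hit = {
    "_index": "index_two",
    "_type": "actions",
    "_id": "AVvKUYcceQCc2OyLKWZ9",
    "_routing": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
    "_parent": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
    "_source": {
        "user_id": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
        "client_id": "cl14ous0ydao",
        "action": "pageview",
    },
}

routing, source = to_join_doc(hit)
print(routing)
print(source["field_relationship"])
```

The returned routing value is what has to reach the destination index together with the document; the error in the question is raised when a document with a join field that names a parent arrives without one.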

【Comments】:

  • I ended up using the bulk API to transfer the data. No luck with the reindex API.
  • Would you mind sharing the bulk request? Thanks.
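
The bulk request itself was not shared in this thread. As a rough sketch of what such a payload could look like, the Python below builds Bulk API NDJSON lines for child documents, putting the routing value into the `routing` key of the action metadata and the parent ID into the join field. The function name `bulk_lines` and its defaults are hypothetical, and this is not the commenter's actual request.

```python
import json


def bulk_lines(hits, dest_index="index_two", relation_name="actions"):
    """Build an NDJSON bulk body for child documents (illustrative sketch)."""
    lines = []
    for hit in hits:
        parent_id = hit["_parent"]
        action = {
            "index": {
                "_index": dest_index,
                "_type": "_doc",
                "_id": hit["_id"],
                # Reuse the source routing when present, else route by parent ID.
                "routing": hit.get("_routing", parent_id),
            }
        }
        source = dict(hit["_source"])
        source["field_relationship"] = {"name": relation_name, "parent": parent_id}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


sample_hit = {
    "_id": "AVvKUYcceQCc2OyLKWZ9",
    "_routing": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
    "_parent": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
    "_source": {"user_id": "cl14ous0ydaob71ab2a1-837c-4904-a755-11e13410fb94",
                "action": "pageview"},
}

payload = bulk_lines([sample_hit])
print(payload)
```

The resulting string would be sent as the body of a `POST _bulk` request with the `application/x-ndjson` content type.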

Tags: elasticsearch elasticsearch-5 elasticsearch-painless


【Solution 1】:

I ran into the same problem. Searching the Elasticsearch discussion forum, I found that this works:

POST _reindex

{
    "source": {
        "index": "old_index",
        "type": "actions"
    },
    "dest": {
        "index": "index_two"
    },
    "script": {
        "source": """

            ctx._type = "_doc";

            String  routingCode = ctx._source.user_id;
            Map join = new HashMap();
            join.put('name', 'actions');
            join.put('parent', routingCode);

            ctx._source.put('field_relationship', join);

            ctx._parent = null;

            ctx._routing = new StringBuffer(routingCode)"""
    }
}

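Hope this helps :).

【Comments】:

【Solution 2】:

I would like to point out that a join field does not usually require routing, but if you create the children before their parents you will run into this problem.

It is advisable to reindex all the parents first, and then the children.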
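
The parents-first ordering suggested here can be derived from the `relations` block of the join field mapping. The following Python sketch is only an illustration: `reindex_order` is a hypothetical helper, and it assumes the type names in the source index match the relation names, which the question confirms only for `actions`.

```python
def reindex_order(relations):
    """Return relation names so that every parent comes before its children."""
    # A join mapping may give children as a single string or as a list.
    children_of = {
        parent: [kids] if isinstance(kids, str) else list(kids)
        for parent, kids in relations.items()
    }
    all_children = {c for kids in children_of.values() for c in kids}
    # Roots are relation names that never appear as a child.
    queue = [p for p in children_of if p not in all_children]

    order = []
    while queue:
        name = queue.pop(0)
        order.append(name)
        queue.extend(children_of.get(name, []))
    return order


# Relations taken from the destination mapping in the question.
relations = {"users": ["actions", "segments"]}
print(reindex_order(relations))  # ['users', 'actions', 'segments']
```

One `_reindex` request per name, submitted in this order, would then copy the parents before any of their children.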

【Comments】:
