Backup and restore some records of an Elasticsearch index: answers

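【Question title】: Backup and restore some records of an elasticsearch index
【Posted】: 2019-10-14 21:25:41
【Question】:

I want to back up only some records of an Elasticsearch index (for example, just the latest 1 million records) and restore that backup on another machine. It would be better if this could be done with available/built-in Elasticsearch features.

I have tried Elasticsearch snapshot and restore (code below), but it appears to back up the whole index rather than a selected set of records.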

    curl -H 'Content-Type: application/json'  -X PUT "localhost:9200/_snapshot/es_data_dump?pretty=true" -d '
    {
      "type": "fs",
      "settings": {
        "compress" : true,
        "location": "es_data_dump"
      }
    }'

    curl -H 'Content-Type: application/json'  -X PUT "localhost:9200/_snapshot/es_data_dump/snapshot1?wait_for_completion=true&pretty=true" -d '
    {
      "indices" : "index_name"
    }'

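The backup can be in any format, as long as it can be restored successfully on a different machine.

【Comments】:

Tags: elasticsearch

【Solution 1】:

You can use the _reindex API. It accepts any query. After reindexing, you have a new index that serves as the backup and contains only the requested records. You can then easily copy that index wherever you need it.

Full details are here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html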
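
As a rough illustration of the idea above, the sketch below builds a `_reindex` request body that copies only documents matching a range query into a separate index. The destination index name `index_name_backup`, the `@timestamp` field, and the `now-7d/d` cutoff are assumptions made for the example and do not come from the original answer. `max_docs` caps the number of copied documents, but on its own it does not guarantee that the most recent ones are chosen, so the range query is what narrows the selection here.

```python
import json


def build_reindex_body(source_index, dest_index, timestamp_field, newer_than, max_docs=None):
    """Build a _reindex request body that copies only documents matching a range query."""
    body = {
        "source": {
            "index": source_index,
            # Only documents whose timestamp is at or after the cutoff are copied.
            "query": {"range": {timestamp_field: {"gte": newer_than}}},
        },
        "dest": {"index": dest_index},
    }
    if max_docs is not None:
        # Upper bound on the number of documents copied.
        body["max_docs"] = max_docs
    return body


# Hypothetical names: adjust the indices, field and cutoff to your own data.
body = build_reindex_body("index_name", "index_name_backup", "@timestamp", "now-7d/d", max_docs=1000000)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent with `POST /_reindex`, for instance through curl in the same way as the snapshot requests in the question. The resulting smaller index can then be snapshotted on its own by listing it under `"indices"`.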

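【Comments】:

【Solution 2】:

In the end, I fetched the required data with the Python client, because it was the simplest approach I found for this use case.

To do this, I ran an Elasticsearch query and stored its response in a file in newline-delimited format, then used another Python script to restore the data from that file. Each request returns at most 10,000 hits along with a scroll ID, which is used to fetch the next 10,000 hits, and so on.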

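    from elasticsearch import Elasticsearch

    _query = {'match_all': {}}  # placeholder, not from the original answer: use your own query
    es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
    # First page: up to 10000 hits plus a scroll ID that is kept alive for 5 minutes.
    page = es.search(index=['ct_analytics'], body={'size': 10000, 'query': _query, 'stored_fields': '*'}, scroll='5m')
    while len(page['hits']['hits']) > 0:
        es_data = page['hits']['hits']  # Store this as you like
        scrollId = page['_scroll_id']
        # Fetch the next page using the scroll ID from the previous response.
        page = es.scroll(scroll_id=scrollId, scroll='5m')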
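
The answer does not show how the hits were stored or restored, so the following is only a minimal sketch of one way to do it. The helper names `dump_hits` and `load_actions` and the file name `dump.ndjson` are hypothetical. The sketch also assumes each hit carries a `_source` field, which may require requesting `_source` in the search rather than only `stored_fields`.

```python
import json


def dump_hits(hits, path):
    """Append each hit to a newline-delimited JSON file, one document per line."""
    with open(path, "a", encoding="utf-8") as f:
        for hit in hits:
            record = {"_id": hit["_id"], "_source": hit.get("_source", {})}
            f.write(json.dumps(record) + "\n")


def load_actions(path, index):
    """Yield one bulk-index action per line of the dump file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                doc = json.loads(line)
                yield {"_index": index, "_id": doc["_id"], "_source": doc["_source"]}
```

Inside the scroll loop, `dump_hits(es_data, "dump.ndjson")` would write each page to the file. On the target machine, the generator can be passed to the client's bulk helper, for example `elasticsearch.helpers.bulk(es, load_actions("dump.ndjson", "ct_analytics"))`.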

【Comments】: