[Question title]: ElasticSearch constraints when fetching more than 50k documents using java api
[Posted]: 2018-07-12 15:51:50
[Question]:

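I am querying an Elasticsearch index using the Java API's SearchSourceBuilder. My index holds more than 100k documents, and I have raised index.max_result_window to 120000. When I then try to fetch 120k documents from my Java code, the line below throws a NullPointerException.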

SearchHit[] searchHits = searchResponse.getHits().getHits();

If I reduce the SearchSourceBuilder size to 50k it works fine, but then I can only fetch 50k documents.

Please find my code below:

RestHighLevelClient restHighLevelClient = null;
    Document doc=new Document();

    logger.info("Started Indexing the Document.....");

    try {
        restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }


    //Fetching Id, FilePath & FileName from Document Index. 
    SearchRequest searchRequest = new SearchRequest(INDEX); 
    searchRequest.types(TYPE);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    QueryBuilder qb = QueryBuilders.matchAllQuery();
    searchSourceBuilder.query(qb);
    searchSourceBuilder.size(120000); 
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = null;
    try {
         searchResponse = restHighLevelClient.search(searchRequest);
    } catch (IOException e) {
        e.getLocalizedMessage();
    }

    SearchHit[] searchHits = searchResponse.getHits().getHits(); // Getting a NullPointerException after processing some documents. The count is not constant.
    long totalHits=searchResponse.getHits().totalHits;
    logger.info("Total Hits --->"+totalHits);
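
One possible reading of the failure, offered as a hedged aside rather than a confirmed diagnosis: if the search call throws an IOException, the catch block above discards it, searchResponse stays null, and the next line fails with a NullPointerException. The sketch below reproduces that pattern with the standard library only. The `fetch` method and its 50000 threshold are hypothetical stand-ins for the client call, chosen to mirror the behaviour described in the question.

```java
import java.io.IOException;

public class NullResponseSketch {

    /** Hypothetical stand-in for the client call; fails when too much is requested at once. */
    static String fetch(int size) throws IOException {
        if (size > 50000) { // assumed threshold, for illustration only
            throw new IOException("response too large for size=" + size);
        }
        return "response with " + size + " hits";
    }

    /** Same try/catch shape as the question, but the failure is reported and null is guarded. */
    static String fetchOrExplain(int size) {
        String response = null;
        try {
            response = fetch(size);
        } catch (IOException e) {
            // Surface the real cause instead of calling e.getLocalizedMessage() and dropping it.
            System.err.println("Search failed: " + e.getMessage());
        }
        if (response == null) {
            // Without this guard, dereferencing response would throw a NullPointerException.
            return "no response";
        }
        return response;
    }

    public static void main(String[] args) {
        System.out.println(fetchOrExplain(50000));
        System.out.println(fetchOrExplain(120000));
    }
}
```

Logging the caught exception in the original code would show whether this is what actually happens there.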

Please see my index settings below:

{
  "document_attachment": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "document_attachment",
        "max_result_window": "150000",
        "creation_date": "1531402811016",
        "analysis": {
          "analyzer": {
            "custom_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            },
            "product_catalog_keywords_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "UBRQAkg-Su-FfeAtBTGFIw",
        "version": {
          "created": "6020399"
        }
      }
    }
  }
}

[Comments]:

    Tags: java elasticsearch elastic-stack


    [Solution 1]:

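    You need to use a scroll search instead of trying to fetch everything at once. This lets you page through the results.

    With scrolling you can retrieve as many results as you like; there is no upper limit. You will not get ranked results, but ranking is meaningless for a result set this large anyway.

    See the documentation for how to do this.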
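
The shape of the scroll loop described above can be sketched as follows. This is a standard-library-only illustration, not the Elasticsearch client itself: `FakeIndex`, `Page`, and `fetchAll` are hypothetical names, and the in-memory index simply hands out integers in batches. The point is the control flow: one initial search, then repeated scroll calls with the latest cursor id until a batch comes back empty.

```java
import java.util.ArrayList;
import java.util.List;

public class ScrollSketch {

    /** Hypothetical stand-in for one scroll response: a batch of hits plus a cursor id. */
    static final class Page {
        final String scrollId;
        final List<Integer> hits;

        Page(String scrollId, List<Integer> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /** Hypothetical in-memory "index" that hands out documents batch by batch. */
    static final class FakeIndex {
        private final int totalDocs;
        private final int batchSize;
        private int position = 0;

        FakeIndex(int totalDocs, int batchSize) {
            this.totalDocs = totalDocs;
            this.batchSize = batchSize;
        }

        /** Plays the role of the initial search request that opens the scroll. */
        Page search() {
            position = 0;
            return nextPage();
        }

        /** Plays the role of a follow-up scroll request using the previous scroll id. */
        Page scroll(String scrollId) {
            return nextPage();
        }

        private Page nextPage() {
            List<Integer> hits = new ArrayList<>();
            int end = Math.min(position + batchSize, totalDocs);
            while (position < end) {
                hits.add(position++);
            }
            return new Page("scroll-" + position, hits);
        }
    }

    /** Drains the index one batch at a time; stops when a batch comes back empty. */
    static List<Integer> fetchAll(FakeIndex index) {
        List<Integer> all = new ArrayList<>();
        Page page = index.search();
        while (!page.hits.isEmpty()) {
            all.addAll(page.hits);
            page = index.scroll(page.scrollId);
        }
        return all;
    }

    public static void main(String[] args) {
        List<Integer> docs = fetchAll(new FakeIndex(120000, 5000));
        System.out.println("Fetched " + docs.size() + " documents");
    }
}
```

In the 6.x high-level REST client, the corresponding steps are, to my understanding, a `SearchRequest` with a scroll timeout set, followed by `searchScroll` calls with a `SearchScrollRequest`, and a `clearScroll` call once the loop ends. Check the client documentation for the exact signatures in your version.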

    [Discussion]:
