【Question Title】: Lucene: delete from index, based on multiple fields
【Posted】: 2011-01-31 12:46:18
【Question】:

I need to delete documents from a Lucene search index. The standard approach:

indexReader.deleteDocuments(new Term("field_name", "field value"));

will not work, because I need to delete based on multiple fields. I need something like this:

(pseudo code)
TermAggregator terms = new TermAggregator();
terms.add(new Term("field_name1", "field value 1"));
terms.add(new Term("field_name2", "field value 2"));
indexReader.deleteDocuments(terms.toTerm());

Is there any construct for this?

【Question Discussion】:

    Tags: java lucene


    【Solution 1】:

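    IndexWriter has methods that allow more powerful deletions, such as IndexWriter.deleteDocuments(Query). You can build a BooleanQuery that is the conjunction of the terms you want to delete by (one TermQuery per field, each added with BooleanClause.Occur.MUST) and pass it to that method.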
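To make "conjunction" concrete, here is a small stdlib-only Java model. It is not Lucene code: documents are plain field-to-value maps, and the hypothetical `ConjunctiveDeleteModel` class only mimics which documents a BooleanQuery with all-MUST clauses would select for deletion, namely those matching every term rather than any one of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model (NOT Lucene's API): a "document" is a map of
// field name -> value, and the "terms" are (field, value) pairs.
class ConjunctiveDeleteModel {

    // True only if the document matches EVERY term, like a BooleanQuery
    // whose clauses all use BooleanClause.Occur.MUST.
    static boolean matchesAll(Map<String, String> doc, Map<String, String> terms) {
        for (Map.Entry<String, String> t : terms.entrySet()) {
            if (!t.getValue().equals(doc.get(t.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Removes every document that matches all terms; returns how many were removed.
    static int deleteMatchingAll(List<Map<String, String>> index, Map<String, String> terms) {
        int before = index.size();
        index.removeIf(d -> matchesAll(d, terms));
        return before - index.size();
    }

    // Convenience factory for a two-field document.
    static Map<String, String> doc(String value1, String value2) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("field_name1", value1);
        d.put("field_name2", value2);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = new ArrayList<>(Arrays.asList(
                doc("field value 1", "field value 2"),  // matches both terms
                doc("field value 1", "other"),          // matches only the first
                doc("other", "field value 2")));        // matches only the second
        Map<String, String> terms = new LinkedHashMap<>();
        terms.put("field_name1", "field value 1");
        terms.put("field_name2", "field value 2");
        System.out.println(deleteMatchingAll(index, terms)); // prints 1
        System.out.println(index.size());                    // prints 2
    }
}
```

Only the first document is removed. Real Lucene matches against indexed terms, so the analyzer used at index time also decides what counts as a match.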

    【Discussion】:

    • @Avi, do you know a way to update a document based on multiple fields? The updateDocument() method only accepts a single Term as its first argument.
    【Solution 2】:

    Choosing an analyzer

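    First, pay attention to which analyzer you use. I was stumped for a while before realizing that StandardAnalyzer filters out common words such as "the" and "a". That is a problem when a field has the value "A": the analyzer drops it, so there is no term left to match. You may want to consider KeywordAnalyzer, which keeps the whole value as a single, unchanged term: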

    See this post about the analyzer.

    // Create an analyzer:
    // NOTE: We want the keyword analyzer so that it doesn't strip or alter any terms:
    // In our example, the Standard Analyzer removes the term 'A' because it is a common English word.
    // https://stackoverflow.com/a/9071806/231860
    KeywordAnalyzer analyzer = new KeywordAnalyzer();
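A rough, stdlib-only illustration of the problem described above. The `standardLike` method is a hypothetical stand-in for a StandardAnalyzer-style chain (split, lowercase, drop stop words) using a deliberately tiny stop list; it is not Lucene's tokenizer. `keywordLike` mirrors what KeywordAnalyzer does: the whole value becomes one unchanged token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model (NOT Lucene's implementation) of why stop-word filtering can make
// a value such as "A" unsearchable while a keyword-style analyzer keeps it.
class AnalyzerModel {

    // Small illustrative subset of English stop words; Lucene's real list is longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "and", "of"));

    // Hypothetical StandardAnalyzer-like chain: split on non-alphanumerics,
    // lowercase each token, and drop stop words.
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{Alnum}]+")) {
            String t = raw.toLowerCase(Locale.ROOT);
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // KeywordAnalyzer-like behavior: the entire input is one unchanged token.
    static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        System.out.println(standardLike("A"));             // prints []
        System.out.println(keywordLike("A"));              // prints [A]
        System.out.println(standardLike("field value 1")); // prints [field, value, 1]
    }
}
```

Note that the keyword route is also case-sensitive and whitespace-sensitive, so the same analyzer (or none) should be used when indexing and when querying.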
    

    Query parser

    Next, you can create the query with a QueryParser:

    See this post about overriding the default operator.

    // Create a query parser without a default field in this example (the first argument):
    QueryParser queryParser = new QueryParser("", analyzer);
    
    // Optionally, set the default operator to AND (we keep the default, OR, because the query below spells out AND explicitly):
    // https://stackoverflow.com/a/9084178/231860
    // queryParser.setDefaultOperator(QueryParser.Operator.AND);
    
    // Parse the query:
    Query multiTermQuery = queryParser.parse("field_name1:\"field value 1\" AND field_name2:\"field value 2\"");
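If the field values come from user data, a value containing a double quote or backslash would break the query string above. A minimal sketch of a hypothetical helper (not part of Lucene) that escapes those two characters inside each quoted phrase, which is what the classic QueryParser syntax requires inside quotes, and joins the clauses with an explicit AND. It assumes plain field names that need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a classic-QueryParser string such as
//   field_name1:"field value 1" AND field_name2:"field value 2"
// from a field -> value map. Assumes field names need no escaping.
class ConjunctionQueryString {

    // Wraps a value in double quotes, backslash-escaping '"' and '\'.
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Joins field:"value" clauses with an explicit AND so every clause is required.
    static String build(Map<String, String> fieldValues) {
        StringJoiner joiner = new StringJoiner(" AND ");
        for (Map.Entry<String, String> e : fieldValues.entrySet()) {
            joiner.add(e.getKey() + ":" + quote(e.getValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fv = new LinkedHashMap<>();
        fv.put("field_name1", "field value 1");
        fv.put("field_name2", "say \"hi\"");
        // prints: field_name1:"field value 1" AND field_name2:"say \"hi\""
        System.out.println(build(fv));
    }
}
```

Building the query through the API, as the next part of this answer does, sidesteps escaping altogether.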
    

    Query API

    Alternatively, you can build the same query yourself with Lucene's query API:

    See this tutorial about creating a BooleanQuery.

    BooleanQuery multiTermQuery = new BooleanQuery();
    multiTermQuery.add(new TermQuery(new Term("field_name1", "field value 1")), BooleanClause.Occur.MUST);
    multiTermQuery.add(new TermQuery(new Term("field_name2", "field value 2")), BooleanClause.Occur.MUST);
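The clause type is what makes this a conjunction. The stdlib-only model below is hypothetical (its `Occur` enum only borrows the names of BooleanClause.Occur and is not Lucene's class) and shows how MUST, SHOULD and MUST_NOT clauses combine under Lucene's usual rules, ignoring minimum-should-match and FILTER clauses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of BooleanQuery clause combination (NOT Lucene's classes).
class BooleanMatchModel {

    enum Occur { MUST, SHOULD, MUST_NOT }

    // One clause: an exact field == value test plus how it occurs.
    static final class Clause {
        final String field;
        final String value;
        final Occur occur;

        Clause(String field, String value, Occur occur) {
            this.field = field;
            this.value = value;
            this.occur = occur;
        }

        boolean test(Map<String, String> doc) {
            return value.equals(doc.get(field));
        }
    }

    // All MUST clauses must match and no MUST_NOT clause may match; without any
    // MUST clause, at least one SHOULD must match. Only-MUST_NOT matches nothing.
    static boolean matches(List<Clause> clauses, Map<String, String> doc) {
        boolean hasMust = false;
        boolean shouldMatched = false;
        for (Clause c : clauses) {
            boolean hit = c.test(doc);
            switch (c.occur) {
                case MUST:
                    hasMust = true;
                    if (!hit) return false;
                    break;
                case MUST_NOT:
                    if (hit) return false;
                    break;
                case SHOULD:
                    if (hit) shouldMatched = true;
                    break;
            }
        }
        return hasMust || shouldMatched;
    }

    // The two clauses from the snippet above, all with the same occurrence.
    static List<Clause> twoClauses(Occur occur) {
        List<Clause> clauses = new ArrayList<>();
        clauses.add(new Clause("field_name1", "field value 1", occur));
        clauses.add(new Clause("field_name2", "field value 2", occur));
        return clauses;
    }

    public static void main(String[] args) {
        // A document that matches only the first field.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("field_name1", "field value 1");
        doc.put("field_name2", "something else");
        System.out.println(matches(twoClauses(Occur.MUST), doc));   // prints false
        System.out.println(matches(twoClauses(Occur.SHOULD), doc)); // prints true
    }
}
```

With two SHOULD clauses the partial document still matches. That is why the QueryParser variant in this answer spells out AND: under the default OR operator each clause becomes optional, and deleteDocuments would also remove documents that match only one of the fields.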
    

    Numeric field queries (Int, etc.)

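    When the key fields are numeric (e.g. IntField), you cannot match them with a TermQuery on the number's string form, because numeric fields are indexed as encoded trie terms. Use a NumericRangeQuery instead, with equal lower and upper bounds and both ends inclusive: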

    See the answer to this question.

    // NOTE: For IntFields, we need NumericRangeQueries:
    // https://stackoverflow.com/a/14076439/231860
    BooleanQuery multiTermQuery = new BooleanQuery();
    multiTermQuery.add(NumericRangeQuery.newIntRange("field_name1", 1, 1, true, true), BooleanClause.Occur.MUST);
    multiTermQuery.add(NumericRangeQuery.newIntRange("field_name2", 2, 2, true, true), BooleanClause.Occur.MUST);
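For the exact-match trick above, the two bounds must be equal and both inclusive flags must be true. The stdlib model below is a hypothetical illustration of those bound semantics only (null meaning an open-ended bound, as newIntRange allows); it does not reproduce Lucene's trie-encoded range matching.

```java
// Hypothetical stdlib model of the bound semantics of
// NumericRangeQuery.newIntRange(field, min, max, minInclusive, maxInclusive).
// NOT Lucene's implementation; a null bound means "open-ended".
class IntRangeModel {

    static boolean inRange(int value, Integer min, Integer max,
                           boolean minInclusive, boolean maxInclusive) {
        if (min != null) {
            boolean aboveMin = minInclusive ? value >= min : value > min;
            if (!aboveMin) return false;
        }
        if (max != null) {
            boolean belowMax = maxInclusive ? value <= max : value < max;
            if (!belowMax) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(inRange(1, 1, 1, true, true));    // prints true: exact match
        System.out.println(inRange(2, 1, 1, true, true));    // prints false
        System.out.println(inRange(1, 1, 1, false, true));   // prints false: empty range
        System.out.println(inRange(5, 1, null, true, true)); // prints true: open upper bound
    }
}
```

Passing false for either flag with equal bounds yields an empty range, so a delete built that way would silently remove nothing.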
    

    Deleting the documents that match the query

    Finally, pass the query to the writer to delete every document that matches it:

    See the answer to this question.

    // Remove the document by using a multi key query:
    // http://www.avajava.com/tutorials/lessons/how-do-i-combine-queries-with-a-boolean-query.html
    writer.deleteDocuments(multiTermQuery);
    

    【Discussion】:
