【Question title】: Lucene: what can I use instead of the iterator in this method?
【Posted】: 2021-06-28 13:57:52
【Question】:

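I have an idea:

  1. Look for a pattern in the text.
  2. If I find the pattern, get its position in the text.

Part 1 is done.

Part 2 also works, but it uses an iterator, which means walking through all the terms before reaching the one I need. How can I get my term and its position in the text directly?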

My code:

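// Imports for Lucene 7.4.0; the method below belongs inside a class.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

public void methodFromStack() throws Exception {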
        
    Directory directory = new RAMDirectory();
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new StandardAnalyzer());
    IndexWriter writer = new IndexWriter(directory, indexWriterConfig);

    Document doc = new Document();
    // Rough equivalent of the old Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS
    FieldType type = new FieldType();
    type.setStoreTermVectors(true);
    type.setStoreTermVectorPositions(true);
    type.setStoreTermVectorOffsets(true);
    type.setStored(true);
    type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    Field fieldStore = new Field("tags", "Kite good world.", type);
    doc.add(fieldStore);
    writer.addDocument(doc);
    writer.close();
    
    DirectoryReader reader = DirectoryReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(reader);
    
    // Phrase search with slop (allowed gap between the words)
    QueryParser queryParser = new QueryParser("tags", new StandardAnalyzer());
    Query query = queryParser.parse("\"Kite World\"~1");
    TopDocs results = searcher.search(query, 1);
    
    for ( ScoreDoc scoreDoc : results.scoreDocs) {

        Fields termVs = reader.getTermVectors(scoreDoc.doc);
        Terms f = termVs.terms("tags");

        TermsEnum te = f.iterator();
        PostingsEnum docsAndPosEnum = null;
        BytesRef bytesRef;

        // The iterator visits every term, but I need only my matched term and its position
        while ((bytesRef = te.next()) != null) {
            docsAndPosEnum = te.postings(docsAndPosEnum, PostingsEnum.ALL);
            // for each term (iterator next) in this field (field)
            // iterate over the docs (should only be one)
            int nextDoc = docsAndPosEnum.nextDoc();
            assert nextDoc != DocIdSetIterator.NO_MORE_DOCS;
            final int fr = docsAndPosEnum.freq();
            final int p = docsAndPosEnum.nextPosition();
            final int o = docsAndPosEnum.startOffset();
            
            System.out.println("Word: " + bytesRef.utf8ToString());
            System.out.println("Position: " + p + ", startOffset: " + o + " length: "
                    + bytesRef.length + " Freq: " + fr);

            // Print the remaining positions when the term occurs more than once
            for (int iter = 1; iter < fr; iter++) {
                System.out.println("Position: " + docsAndPosEnum.nextPosition());
            }


        }
    }
}

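(I know that older versions of Lucene had the classes TermFreqVector and TermPositionVector, but the API changed in the move from version 3 to 4. Since then, the only approach I have found is to use an iterator.

Environment: Windows + NetBeans + Maven + Lucene 7.4.0)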

【Question discussion】:

    Tags: java lucene


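    【Solution 1】:

    To solve this, use the seekExact method, which positions the TermsEnum directly on the given term. You can test it with this code: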

            // Term to look up. StandardAnalyzer lowercases tokens at index time,
            // so the lookup term must be lowercase too ("kite" is an assumed example).
            BytesRef ref = new BytesRef("kite");
            TermsEnum te = f.iterator();
            PostingsEnum docsAndPosEnum = null;
            // seekExact jumps straight to the term instead of iterating over all terms
            if (te.seekExact(ref)) {
                docsAndPosEnum = te.postings(docsAndPosEnum, PostingsEnum.ALL);
                int nextDoc = docsAndPosEnum.nextDoc();
                assert nextDoc != DocIdSetIterator.NO_MORE_DOCS;
                final int freq = docsAndPosEnum.freq();
                final int pos = docsAndPosEnum.nextPosition();
                final int o = docsAndPosEnum.startOffset();

                System.out.println("Word: " + ref.utf8ToString());
                System.out.println("Position: " + pos + ", startOffset: " + o + " length: " + ref.length + " Freq: " + freq);
            }
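
To see why seeking helps, here is a small Lucene-free sketch of the idea. It is only an analogy: the class TermSeekSketch and its methods buildVector, scan and seek are hypothetical names, and a TreeMap stands in for the sorted term dictionary of a term vector. It also shows why the lookup term has to be normalized (lowercased) the same way as the indexed tokens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermSeekSketch {

    /** Builds a toy "term vector": lowercased term -> token positions, sorted by term. */
    static TreeMap<String, List<Integer>> buildVector(String text) {
        TreeMap<String, List<Integer>> vector = new TreeMap<>();
        int position = 0;
        // Split on anything that is not a letter or digit (a crude stand-in for an analyzer)
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            String term = token.toLowerCase(Locale.ROOT);
            vector.computeIfAbsent(term, k -> new ArrayList<>()).add(position);
            position++;
        }
        return vector;
    }

    /** Scan: walks the terms in order until the wanted one turns up (like the loop in the question). */
    static List<Integer> scan(TreeMap<String, List<Integer>> vector, String term) {
        for (Map.Entry<String, List<Integer>> entry : vector.entrySet()) {
            if (entry.getKey().equals(term)) {
                return entry.getValue();
            }
        }
        return Collections.emptyList();
    }

    /** Seek: goes straight to the term in the sorted map (the role seekExact plays). */
    static List<Integer> seek(TreeMap<String, List<Integer>> vector, String term) {
        return vector.getOrDefault(term, Collections.emptyList());
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> vector = buildVector("Kite good world.");
        System.out.println(seek(vector, "world")); // prints [2]
        System.out.println(seek(vector, "World")); // prints [] because terms were lowercased
    }
}
```

Both methods return the same positions; the difference is that seek does not visit the other terms first. In real Lucene code the same normalization caveat applies when building the BytesRef passed to seekExact.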
    

    【Discussion】:
