【Question title】: Lucene full-text search only works for labels that match the search string exactly
【Posted】: 2018-09-17 12:59:29
【Question】:

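I'm having some trouble with full-text search using Apache Lucene. When I type the whole label I can retrieve names, e.g. "cat", but typing "c" returns no results. I'm using RDF4J. This is the SPARQL query I use: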

PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>

SELECT DISTINCT ?e2 ?altLabel ?label ?description WHERE
{
    {
        ?e2 search:matches ?match .
        ?match search:query ?string ;
               search:property ?labelIri ;
               search:snippet ?altLabel .
    }
    ?e2 ?labelIri ?label .
}
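The query leaves `?string` to be supplied by the caller, and the question does not show how that happens. As a minimal sketch, assuming the search text is spliced into the query text as a SPARQL string literal, the hypothetical helper `SearchQueryBuilder.build` below declares the `search:` prefix (the LuceneSail namespace that appears in the algebra below) and escapes the literal so that quotes or backslashes in user input cannot break the query. `?description` is left out because nothing in the query binds it.

```java
public final class SearchQueryBuilder {

    private SearchQueryBuilder() {
    }

    /** Escapes a Java string for use as a double-quoted SPARQL literal. */
    static String toSparqlLiteral(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds the question's query with the Lucene query string inlined (hypothetical helper). */
    static String build(String luceneQuery) {
        return "PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>\n"
            + "SELECT DISTINCT ?e2 ?altLabel ?label WHERE {\n"
            + "  {\n"
            + "    ?e2 search:matches ?match .\n"
            + "    ?match search:query " + toSparqlLiteral(luceneQuery) + " ;\n"
            + "           search:property ?labelIri ;\n"
            + "           search:snippet ?altLabel .\n"
            + "  }\n"
            + "  ?e2 ?labelIri ?label .\n"
            + "}";
    }

    public static void main(String[] args) {
        // Passing "c*" rather than "c" asks Lucene for a prefix match
        System.out.println(build("c*"));
    }
}
```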

LuceneSailConnection then translates it into the following query algebra:

Distinct
   Projection
      ProjectionElemList
         ProjectionElem "e2"
         ProjectionElem "label"
         ProjectionElem "description"
      Extension
         ExtensionElem (description)
            Var (name=description)
         Join
            Join
               Join
                  StatementPattern
                     Var (name=e2)
                     Var (name=_const_232d65d1_uri, value=http://www.openrdf.org/contrib/lucenesail#matches, anonymous)
                     Var (name=match)
                  StatementPattern
                     Var (name=match)
                     Var (name=_const_802884e6_uri, value=http://www.openrdf.org/contrib/lucenesail#query, anonymous)
                     Var (name=string)
               StatementPattern
                  Var (name=match)
                  Var (name=_const_f59a94f7_uri, value=http://www.openrdf.org/contrib/lucenesail#property, anonymous)
                  Var (name=labelIri)
            StatementPattern
               Var (name=e2)
               Var (name=labelIri)
               Var (name=label)

This is the code I use to index the concepts in the knowledge base together with their labels:

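import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.RepositoryResult;

// KnowledgeBase, RdfUtils, getConnection(), luceneIndexDir and log belong to the
// surrounding project/class and are not shown here.

@Override
public void indexLocalKb(KnowledgeBase aKb) throws IOException
{
    // try-with-resources closes every resource even if indexing throws
    try (Analyzer analyzer = new StandardAnalyzer();
            Directory directory = FSDirectory
                .open(new File(luceneIndexDir, aKb.getRepositoryId()).toPath());
            IndexWriter indexWriter = new IndexWriter(directory,
                new IndexWriterConfig(analyzer));
            RepositoryConnection conn = getConnection(aKb);
            RepositoryResult<Statement> stmts = RdfUtils
                .getStatementsSparql(conn, null, aKb.getLabelIri(), null,
                    Integer.MAX_VALUE, false, null)) {
        while (stmts.hasNext()) {
            Statement stmt = stmts.next();
            String id = stmt.getSubject().stringValue();
            String label = stmt.getObject().stringValue();
            String predicate = stmt.getPredicate().stringValue();
            indexEntity(id, label, predicate, indexWriter);
        }
        // Commit once after all documents are added instead of once per document
        indexWriter.commit();
    }
}

private void indexEntity(String aId, String aLabel, String aPredicate,
    IndexWriter aIndexWriter)
{
    final String FIELD_ID = "id";
    final String FIELD_CONTENT = "label";
    try {
        Document doc = new Document();
        doc.add(new StringField(FIELD_ID, aId, Field.Store.YES));
        // StringField indexes the whole label as a single, untokenized term
        doc.add(new StringField(FIELD_CONTENT, aLabel, Field.Store.YES));
        aIndexWriter.addDocument(doc);

        log.info("Entity indexed with id [{}] and label [{}], predicate [{}]",
            aId, aLabel, aPredicate);
    }
    catch (IOException e) {
        log.error("Could not index entity with id [{}] and label [{}]", aId, aLabel, e);
    }
}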
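Why a query for "c" finds nothing while "cat" works can be illustrated with a toy inverted index. The hypothetical `ToyIndex` class below is not Lucene code: it lowercases labels and splits them on non-alphanumeric characters as a rough stand-in for `StandardAnalyzer` (with the `StringField` above, the whole label would instead be one term). A plain term query looks up one exact term, so "c" only matches a document containing the term "c". A prefix query such as `c*` walks the sorted term dictionary and collects every term starting with the prefix, which is conceptually what Lucene's `PrefixQuery` does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public final class ToyIndex {

    // Sorted term dictionary: term -> ids of the documents containing it
    private final NavigableMap<String, Set<String>> terms = new TreeMap<>();

    /** Crude analyzer: lowercase, then split on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    void add(String id, String label) {
        for (String term : analyze(label)) {
            terms.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Exact term lookup: returns a copy of the postings for exactly this term. */
    Set<String> termQuery(String term) {
        return new TreeSet<>(terms.getOrDefault(term.toLowerCase(), new TreeSet<>()));
    }

    /** Prefix lookup: scan sorted terms from the prefix until they stop matching. */
    Set<String> prefixQuery(String prefix) {
        String p = prefix.toLowerCase();
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : terms.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("ex:Cat", "cat");
        index.add("ex:Car", "Car");
        index.add("ex:Dog", "dog");
        System.out.println(index.termQuery("c"));   // []
        System.out.println(index.termQuery("cat")); // [ex:Cat]
        System.out.println(index.prefixQuery("c")); // [ex:Car, ex:Cat]
    }
}
```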

【Discussion】:

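  • You should at least mention which API you are using, since Lucene (or full-text search in general) is not part of the SPARQL standard. (I guess Sesame or RDF4J.)
  • I'd say that if you want to search for something starting with c, then according to the Lucene query syntax the query has to be c*. Cf. docs.rdf4j.org/programming, section 5.1.2 Full text search.
  • @rec That sounds like the right answer to me; do you want to post it as an answer?
  • @rec Thanks, that worked.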

Tags: java lucene sparql rdf4j


【Solution 1】:

You have to use the Lucene query syntax: search for c* instead of c. See http://www.lucenetutorial.com/lucene-query-syntax.html
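In practice, what the user types still has to be turned into such a query string. A minimal sketch, assuming the user enters plain text rather than Lucene syntax: the hypothetical helper `PrefixQueries.toPrefixQuery` backslash-escapes the characters that the classic Lucene query parser treats as syntax (intended to mirror the set handled by `QueryParser.escape`) and appends `*` to each whitespace-separated word, so `c` becomes `c*`. It also lowercases the input, assuming the index side uses a lowercasing analyzer such as `StandardAnalyzer`. How several words are combined (AND vs OR) depends on the parser's default operator, which this sketch does not set.

```java
import java.util.Locale;

public final class PrefixQueries {

    private PrefixQueries() {
    }

    // Characters the classic Lucene query parser treats as syntax
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes query-syntax characters so they are matched literally. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Turns plain user input such as "c" into a prefix query such as "c*". */
    static String toPrefixQuery(String input) {
        StringBuilder sb = new StringBuilder();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Assumes a lowercasing analyzer (e.g. StandardAnalyzer) on the index side
            sb.append(escape(word.toLowerCase(Locale.ROOT))).append('*');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPrefixQuery("c"));       // c*
        System.out.println(toPrefixQuery("Black C")); // black* c*
        System.out.println(toPrefixQuery("C++"));     // c\+\+*
    }
}
```

The resulting string is what goes into `search:query` in the SPARQL query.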
