【Question title】: How to highlight only results of PrefixQuery in Lucene and not whole words?
【Posted】: 2022-10-18 14:01:04
【Question】:

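I'm new to Lucene and may be doing something wrong, so please correct me if so. I've been looking for an answer for several days and don't know where to start.

The goal is to use Lucene.NET to search user names by partial match (like StartsWith) and to highlight only the matched part. For example, if I search for abc in the list ['a', 'ab', 'abc', 'abcd', 'abcde'], it should return only the last three: ['<b>abc</b>', '<b>abc</b>d', '<b>abc</b>de'].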
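
To make the desired output concrete, here is a minimal sketch in plain C# with no Lucene involved. The names `PrefixHighlight` and `HighlightAll` are hypothetical and exist only for this illustration; the sketch shows the behaviour I'd like to get from the highlighter, not how Lucene.NET implements highlighting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixHighlight
{
    // Hypothetical helper (not part of Lucene.NET): keeps only the values that
    // start with `prefix` and wraps just that leading part in <b> tags.
    // The comparison is case-insensitive, which is an assumption made here
    // because StandardAnalyzer lowercases tokens at index time.
    public static IEnumerable<string> HighlightAll(IEnumerable<string> values, string prefix)
    {
        return values
            .Where(v => v.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .Select(v => "<b>" + v.Substring(0, prefix.Length) + "</b>" + v.Substring(prefix.Length));
    }
}
```

Calling `PrefixHighlight.HighlightAll(new[] { "a", "ab", "abc", "abcd", "abcde" }, "abc")` yields `<b>abc</b>`, `<b>abc</b>d` and `<b>abc</b>de`, which is the output I'm after.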

Here is my approach.

First, I create the index:

using var indexDir = FSDirectory.Open(Path.Combine(IndexDirectory, IndexName));
using var standardAnalyzer = new StandardAnalyzer(CurrentVersion);

var indexConfig = new IndexWriterConfig(CurrentVersion, standardAnalyzer);
indexConfig.OpenMode = OpenMode.CREATE_OR_APPEND;

using var indexWriter = new IndexWriter(indexDir, indexConfig);
if (indexWriter.NumDocs == 0)
{
    //fill the index with Documents
}

Documents are created like this:

static Document BuildClientDocument(int id, string surname, string name)
{
    var document = new Document()
    {
        new StringField("Id", id.ToString(), Field.Store.YES),

        new TextField("Surname", surname, Field.Store.YES),
        new TextField("Surname_sort", surname.ToLower(), Field.Store.NO),

        new TextField("Name", name, Field.Store.YES),
        new TextField("Name_sort", name.ToLower(), Field.Store.NO),
    };
    
    return document;
}

The search is done like this:

using var multiReader = new MultiReader(indexWriter.GetReader(true)); //the plan was to use multiple indexes per entity types
var indexSearcher = new IndexSearcher(multiReader);

var queryString = "abc"; //just as a sample
var queryWords = queryString.SplitWords();

var query = new BooleanQuery();
queryWords
    .Process((word, index) =>
    {
        var boolean = new BooleanQuery()
        {
            { new PrefixQuery(new Term("Surname", word)) { Boost = 100 }, Occur.SHOULD }, //surnames are most important to match
            { new PrefixQuery(new Term("Name", word)) { Boost = 50 }, Occur.SHOULD }, //names are less important
        };
        boolean.Boost = (queryWords.Count() - index); //first words in a search query are more important than others
        
        query.Add(boolean, Occur.MUST);
    })
;

var topDocs = indexSearcher.Search(query, 50, new Sort( //sort by relevance and then in lexicographical order
    SortField.FIELD_SCORE,
    new SortField("Surname_sort", SortFieldType.STRING),
    new SortField("Name_sort", SortFieldType.STRING)
));

And the highlighting:

var htmlFormatter = new SimpleHTMLFormatter();
var queryScorer = new QueryScorer(query);
var highlighter = new Highlighter(htmlFormatter, queryScorer);
foreach (var found in topDocs.ScoreDocs)
{
    var document = indexSearcher.Doc(found.Doc);
    var surname = document.Get("Surname"); //just for simplicity
    var surnameFragment = highlighter.GetBestFragment(standardAnalyzer, "Surname", surname);
    Console.WriteLine(surnameFragment);
}

The problem is that the highlighter returns results like this:

<b>abc</b>
<b>abcd</b>
<b>abcde</b>
<b>abcdef</b>

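So even though I'm searching for a partial word, it "highlights" the whole word. Explain kept returning NON-MATCH, so I'm not sure whether it is useful here.

Is it possible to highlight only the searched part, as in my example?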

【Comments】:

  • If I understand correctly, you are looking for something like <b>abc</b> <b>abc</b>d <b>abc</b>de <b>abc</b>def. Is that right?
  • Yes, exactly that. But all I get is <b>abc</b> <b>abcd</b> <b>abcde</b> <b>abcdef</b>

Tags: c# lucene lucene.net


