【Question title】: Solr Suggester component doesn't return hits for non-English words
【Posted】: 2013-02-25 00:33:41
【Question description】:

I have defined a suggest component like this:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>

        <str name="field">autosuggest_general</str>
        <float name="threshold">0.005</float>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.onlymorepopular">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

The autosuggest_general field is defined like this:

<field name="autosuggest_general" type="autosuggest_type" indexed="true" stored="true" multiValued="true" />
<fieldType name="autosuggest_type" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

The suggester component does not return matches for any non-English words.
For example, I want to autocomplete the word Marcos.

When I call http://localhost:8983/solr/mycore/suggest?q=mar, I get the following response:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">2</int>
    </lst>
    <lst name="spellcheck">
        <lst name="suggestions"/>
    </lst>
</response>

A regular search returns 10 hits:
http://localhost:8983/solr/mycore/select?q=autosuggest_general:marcos

For de, I get the following response:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
    </lst>
    <lst name="spellcheck">
        <lst name="suggestions">
            <lst name="de">
                <int name="numFound">3</int>
                <int name="startOffset">0</int>
                <int name="endOffset">2</int>
                <arr name="suggestion">
                    <str>design</str>
                    <str>developer</str>
                    <str>development</str>
                </arr>
            </lst>
            <str name="collation">design</str>
        </lst>
    </lst>
</response>

design, developer, and development are fine, but I don't see dejan among the suggestions, even though that word does exist in the autosuggest_general field.

http://localhost:8983/solr/mycore/select?q=autosuggest_general:dejan returns:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
            <str name="q">autosuggest_general:dejan</str>
        </lst>
    </lst>
    <result name="response" numFound="7" start="0">
    ...
    </result>
</response>

I am using Solr 4.1.

Any help would be greatly appreciated!

【Question discussion】:

    Tags: solr


    【Solution 1】:

    This may be the problem:

    <float name="threshold">0.005</float>
    

    https://wiki.apache.org/solr/Suggester says:

    threshold - threshold is a value in [0..1] representing the minimum fraction of documents (of the total) where a term should appear, in order to be added to the lookup dictionary.
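
A rough worked calculation shows why a threshold of 0.005 can hide these words. It reads the wiki definition as a plain document-frequency ratio, takes the hit counts from the select queries in the question (10 for marcos, 7 for dejan) as the document frequency df(t), and writes N for the total number of documents in the core, which the question does not state. The exact comparison and rounding that Solr applies are assumptions here.

```latex
\[
\frac{\mathrm{df}(t)}{N} \ge \text{threshold}
\quad\Longleftrightarrow\quad
N \le \frac{\mathrm{df}(t)}{\text{threshold}}
\]
\[
\text{marcos: } N \le \frac{10}{0.005} = 2000,
\qquad
\text{dejan: } N \le \frac{7}{0.005} = 1400
\]
```

Under that reading, marcos would be left out of the dictionary once the core holds more than about 2000 documents, and dejan once it holds more than about 1400, while more frequent terms such as design still pass.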

    Try lowering it and see whether the missing words start to match.
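
As a minimal sketch of that change, the component from the question could be rewritten as below. Everything except the threshold value is copied from the question; the value 0.0 is an assumption chosen to keep every indexed term, and a small non-zero value may suit an index where rare or misspelled terms should stay out of the suggestions.

```xml
<searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">autosuggest_general</str>
        <!-- Assumption: 0.0 disables the frequency cutoff, so rare terms are kept too -->
        <float name="threshold">0.0</float>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>
```

With buildOnCommit set to true, the dictionary should be rebuilt on the next commit. A rebuild can most likely also be forced by adding the spellcheck.build=true request parameter, for example http://localhost:8983/solr/mycore/suggest?q=mar&spellcheck.build=true.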

    【Discussion】:

    • It works correctly when threshold is set to 0. Thanks!