【发布时间】:2012-03-18 20:58:40
【问题描述】:
我在我的应用程序中使用了 solr,并且我集成了拼写检查组件,但我遇到了一些问题:
首先: 当我输入一个以空格分隔的术语时,他们会给我每个术语的更正
例如:"watters" => "what term" 但真实的是 watters
第二: 当我输入一些带有错误术语的短语时。尽管其他术语是正确的,但它们对所有术语都应用了该咒语。
Eg : "Difreences in lankuage 使用约定" => "语言差异使用conversions".
真实的是“语言使用约定的差异”
这是我在 solrconfig.xml 中的配置:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">spell</str>
<str name="spellcheckIndexDir">spellchecker</str>
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
Schema.xml:
字段类型:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.SnowballPorterFilterFactory" language="English"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.SnowballPorterFilterFactory" language="English"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
</analyzer>
<analyzer type="multiterm" >
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!--<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
字段:
<field name="title" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
复制字段
<copyField source="title" dest="spell"/>
感谢您的帮助
干杯
【问题讨论】:
-
很好的问题...您是否有任何教程可以用来了解有关 lucene 的更多信息...除了 Solr 页面的官方文档吗?泰
-
@Sebastian:基础教程:Solr in 5 minutes,Apache Lucene quick-start guide。 高级教程: Dzone Solr Tutorial。 示例: solr Drupal for Drupal、Apache Solr for WordPress、Solr Php Manual。祝你好运,我希望它有所帮助;)
标签: solr autocorrect spell-checking