【Question title】: Solr not searching in multiple languages
【Posted】: 2017-01-20 06:52:49
【Question】:

Here is my schema.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!-- multi language in single core R&D Pallav Jha  -->
<schema name="Pallav" version="1.14">
  <uniqueKey>SolrId</uniqueKey>
  <defaultSearchField>Name</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="date" class="solr.TrieDateField" positionIncrementGap="0" precisionStep="6"/>
  <fieldType name="float" class="solr.TrieFloatField" positionIncrementGap="0" precisionStep="0"/>
  <fieldType name="int" class="solr.TrieIntField" omitNorms="true" positionIncrementGap="0" precisionStep="0"/>
  <fieldType name="long" class="solr.TrieLongField" positionIncrementGap="0" precisionStep="0"/>
  <fieldType name="nGramAttributes" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="!!.*?!!" replacement=""/>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.FrenchLightStemFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French" />
      <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.FrenchLightStemFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French" />
      <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="nGramtext" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" minGramSize="3"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="true"/>
      <filter class="solr.FrenchLightStemFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French" />
      <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="true"/>
      <filter class="solr.FrenchLightStemFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French" />
      <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="string" class="solr.StrField" omitNorms="true" sortMissingLast="true"/>
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="true"/>
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="true"/>
    </analyzer>
  </fieldType>
  <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="SolrId" type="string" indexed="true" required="true" stored="true"/>
  <field name="Name" type="string" indexed="true" required="true" stored="true"/> 

  <field name="en_Name" type="string" indexed="true" required="false" stored="true"/>
  <field name="nl_Name" type="string" indexed="true" required="false" stored="true"/>
  <field name="fr_Name" type="string" indexed="true" required="false" stored="true"/>
  <field name="hi_Name" type="string" indexed="true" required="false" stored="true"/> 

  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="nGramContent" type="nGramtext" multiValued="true" indexed="true" required="false" stored="false"/>
  <dynamicField name="CDO_*" type="int" indexed="true" required="false" stored="true"/>
  <dynamicField name="MDO_*" type="int" indexed="true" required="false" stored="true"/>
  <dynamicField name="pa_*" type="string" multiValued="true" indexed="true" required="false" stored="true"/>
  <dynamicField name="cp_*" type="string" indexed="true" required="false" stored="true"/>
  <dynamicField name="f_*" type="string" multiValued="true" indexed="true" required="false" stored="true"/>
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <!-- <copyField source="Name" dest="SpellContent"/> -->

</schema>

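I am trying to implement multi-language search, starting with French just for testing. It is not working: I get no results. Can anyone help? What am I doing wrong?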

Here is my result for French. [image: solr french search result]

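【Comments】:

  • OK, in the English example all you are doing is returning everything. Can you try the same thing for French?
  • This is the query I use to search in French: localhost:8983/solr/MultiLang/select?fq=fr_Name:A*&indent=on&q=*:*&wt=json and it gives me no results.
  • I assume you have some data where the name starts with A, right? :)
  • Actually I want the record that I stored in French. My query is localhost:8983/solr/MultiLang/…*:*&wt=json
  • Please update the question with the expected query.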

Tags: apache solr full-text-search


【Answer 1】:

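The problem is that the fr_Name field is of type string, which means it is neither analyzed nor tokenized. If you want to search for a value that contains spaces, such as Apple Mac Book Pro, you need to wrap it in double quotes so that the whole value is matched exactly. So the query "fq":"fr_Name:\"Apple Mac Book Pro\"" should work for you.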

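For reference, from the Solr wiki:

String (UTF-8 encoded string or Unicode). Strings are intended for small fields and are not tokenized or analyzed in any way. They have a hard limit of slightly less than 32K.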
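
A minimal sketch of one way to get word-level matching while keeping the exact-match string field: copy fr_Name into a tokenized field. The field name fr_Name_search is hypothetical (it is not in the question's schema); textSpell is the tokenized type the question's schema already defines. Documents would need to be reindexed after such a change.

```xml
<!-- Exact-match field from the question: the whole value is one term -->
<field name="fr_Name" type="string" indexed="true" required="false" stored="true"/>

<!-- Hypothetical analyzed copy: tokenized and lowercased by the textSpell type -->
<field name="fr_Name_search" type="textSpell" indexed="true" required="false" stored="false"/>

<!-- Populate the analyzed field from the string field at index time -->
<copyField source="fr_Name" dest="fr_Name_search"/>
```

With a field like this, a filter such as fq=fr_Name_search:apple would be expected to match a document whose fr_Name is Apple Mac Book Pro, since both the indexed value and the query are split into words and lowercased.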

【Comments】:

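  • Using this I get all the results back; what I want is the result for fr_Name:Apple Mac Book.
  • Well, that depends on your data. Did you follow what I wrote? A string field is not analyzed and not tokenized, so to get a match you need to write the value exactly, in double quotes.
  • In Solr Admin, you probably do not need to escape the double quotes.
  • Actually I am implementing multi-language search for French and Dutch, so it is not giving me the expected results.
  • Your question is incomplete, because you never mentioned the expected results.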
【Answer 2】:

Add this field type:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ElisionFilterFactory" articles="lang/contractions_fr.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_fr.txt" ignoreCase="true"/>
    <filter class="solr.FrenchLightStemFilterFactory"/>
  </analyzer>
</fieldType>

It worked for me.
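
A field type only takes effect once a field is declared with it. The sketch below assumes the fr_Name field from the question is switched from string to text_fr. Because the asker also mentions Dutch, it adds a Dutch counterpart: text_nl is an assumed name modeled on Solr's sample configuration, and it presumes lang/stopwords_nl.txt exists in the core's conf directory.

```xml
<!-- Assumed Dutch analogue of text_fr; requires conf/lang/stopwords_nl.txt -->
<fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_nl.txt" ignoreCase="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>

<!-- Language-specific fields now use analyzed types instead of string -->
<field name="fr_Name" type="text_fr" indexed="true" required="false" stored="true"/>
<field name="nl_Name" type="text_nl" indexed="true" required="false" stored="true"/>
```

Existing documents would have to be reindexed for the new analysis to apply, since the terms already in the index were written under the old string type.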
