【Title】: How to configure synonyms_path in Elasticsearch
【Posted】: 2013-08-26 16:18:01
【Question】:

I'm new to Elasticsearch and want to use synonyms. I added these lines to the configuration file:

index :
    analysis :
        analyzer : 
            synonym :
                type : custom
                tokenizer : whitespace
                filter : [synonym]
        filter :
            synonym :
                type : synonym
                synonyms_path: synonyms.txt

Then I created an index named test with this mapping:

"mappings" : {
  "test" : {
     "properties" : {
        "text_1" : {
           "type" : "string",
           "analyzer" : "synonym"
        },
        "text_2" : {
           "search_analyzer" : "standard",
           "index_analyzer" : "synonym",
           "type" : "string"
        },
        "text_3" : {
           "type" : "string",
           "analyzer" : "synonym"
        }
     }
  }

}

and indexed a document of type test with this data:

{
"text_3" : "foo dog cat",
"text_2" : "foo dog cat",
"text_1" : "foo dog cat"
}

synonyms.txt contains "foo,bar,baz". Searching for foo returns the result I expect, but searching for baz or bar returns zero results:

{
"query":{
"query_string":{
    "query" : "bar",
    "fields" : [ "text_1"],
    "use_dis_max" : true,
    "boost" : 1.0
}}} 

Result:

{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":0,
"max_score":null,
"hits":[
]
}
}

【Comments】:

    Tags: search elasticsearch search-engine


    【Solution 1】:

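    I'm not sure whether your problem comes from how the synonyms for "bar" are defined. Since you say you are new to this, I'll walk through an example similar to yours to show how Elasticsearch handles synonyms at index time and at search time. I hope it helps.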

    First, create the synonyms file:

    foo => foo bar, baz
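
To make the rule above easier to read, here is a minimal Python sketch of how a Solr-style synonym line can be interpreted. It is a simplified reading of the format, not Elasticsearch's actual parser, and `parse_synonym_rule` is a hypothetical helper name. It assumes that a comma-separated line without `=>` expands every entry to the whole group.

```python
def parse_synonym_rule(line):
    """Parse one Solr-style synonym rule into {input phrase: [replacement phrases]}.

    Phrases are tuples of whitespace-separated tokens. Simplified illustration only.
    """
    def phrases(part):
        return [tuple(p.split()) for p in part.split(",") if p.strip()]

    if "=>" in line:
        # Explicit mapping: each left-hand phrase is replaced by all right-hand phrases.
        left, right = line.split("=>", 1)
        return {src: phrases(right) for src in phrases(left)}
    # Equivalent synonyms: every phrase expands to the whole group.
    group = phrases(line)
    return {src: list(group) for src in group}


print(parse_synonym_rule("foo => foo bar, baz"))
# {('foo',): [('foo', 'bar'), ('baz',)]}
print(parse_synonym_rule("foo,bar,baz"))
```

Under this reading, the rule in the answer maps only foo, and maps it to the two-word phrase "foo bar" plus the single word "baz". The question's file "foo,bar,baz" instead treats all three words as interchangeable.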
    

    Now I create the index with the specific settings you are trying to test:

    curl -XPUT 'http://localhost:9200/test/' -d '{
      "settings": {
        "index": {
          "analysis": {
            "analyzer": {
              "synonym": {
                "tokenizer": "whitespace",
                "filter": ["synonym"]
              }
            },
            "filter" : {
              "synonym" : {
                  "type" : "synonym",
                  "synonyms_path" : "synonyms.txt"
              }
            }
          }
        }
      },
      "mappings": {
    
        "test" : {
          "properties" : {
            "text_1" : {
               "type" : "string",
               "analyzer" : "synonym"
            },
            "text_2" : {
               "search_analyzer" : "standard",
               "index_analyzer" : "standard",
               "type" : "string"
            },
            "text_3" : {
               "type" : "string",
               "search_analyzer" : "synonym",
               "index_analyzer" : "standard"
            }
          }
        }
      }
    }'
    

    Note that synonyms.txt must be in the same directory as the configuration file, because the path is resolved relative to the config directory.

    Now index a document:

    curl -XPUT 'http://localhost:9200/test/test/1' -d '{
      "text_3": "baz dog cat",
      "text_2": "foo dog cat",
      "text_1": "foo dog cat"
    }'
    

    Now search.

    Searching in field text_1:

    curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'
    {
      "took": 3,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [
          {
            "_index": "test",
            "_type": "test",
            "_id": "1",
            "_score": 0.15342641,
            "_source": {
              "text_3": "baz dog cat",
              "text_2": "foo dog cat",
              "text_1": "foo dog cat"
            }
          }
        ]
      }
    }
    

    You get the document because baz is a synonym of foo, and at index time foo is expanded with its synonyms.

    Searching in field text_2:

    curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'
    

    Result:

    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
      }
    }
    

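    I get no hits because synonyms were not expanded at index time (the field uses the standard analyzer). Since I'm searching for baz and baz is not in the text, nothing matches.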

    Searching in field text_3:

    curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'
    {
      "took": 3,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [
          {
            "_index": "test",
            "_type": "test",
            "_id": "1",
            "_score": 0.15342641,
            "_source": {
              "text_3": "baz dog cat",
              "text_2": "foo dog cat",
              "text_1": "foo dog cat"
            }
          }
        ]
      }
    }
    

    Note: text_3 is "baz dog cat".

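    text_3 is indexed without synonym expansion. Because I'm searching for foo, which has "baz" as one of its synonyms, the query is expanded at search time and I get the document.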
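
The three searches can be reproduced with a toy model in Python. This is only a sketch of the idea, not how Elasticsearch works internally: `SYNONYMS`, `standard`, `synonym` and `matches` are hypothetical stand-ins, token positions are ignored, and the stand-in for the standard analyzer just lowercases and splits on whitespace.

```python
# Hypothetical stand-in for the synonym filter built from "foo => foo bar, baz":
# the token foo is replaced by foo, bar and baz. Positions are ignored here.
SYNONYMS = {"foo": ["foo", "bar", "baz"]}


def standard(text):
    """Stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def synonym(text):
    """Stand-in for the custom analyzer: whitespace tokenizer plus synonym filter."""
    tokens = []
    for token in text.split():
        tokens.extend(SYNONYMS.get(token, [token]))
    return tokens


# (index analyzer, search analyzer) per field, mirroring the mapping in the answer.
FIELDS = {
    "text_1": (synonym, synonym),
    "text_2": (standard, standard),
    "text_3": (standard, synonym),
}

DOC = {"text_1": "foo dog cat", "text_2": "foo dog cat", "text_3": "baz dog cat"}


def matches(field, query):
    """True if any analyzed query term is among the terms indexed for the field."""
    index_analyzer, search_analyzer = FIELDS[field]
    indexed = set(index_analyzer(DOC[field]))
    return any(term in indexed for term in search_analyzer(query))


for field, query in [("text_1", "baz"), ("text_2", "baz"), ("text_3", "foo")]:
    print(field, query, matches(field, query))
# text_1 baz True
# text_2 baz False
# text_3 foo True
```

The point of the model is that a match needs the expansion to happen on at least one side: text_1 expands when indexing, text_3 expands when searching, and text_2 expands on neither.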

    If you want to debug, you can use the _analyze endpoint, for example:

    curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'
    

    Result:

    {
      "tokens": [
        {
          "token": "foo",
          "start_offset": 0,
          "end_offset": 3,
          "type": "SYNONYM",
          "position": 1
        },
        {
          "token": "baz",
          "start_offset": 0,
          "end_offset": 3,
          "type": "SYNONYM",
          "position": 1
        },
        {
          "token": "bar",
          "start_offset": 0,
          "end_offset": 3,
          "type": "SYNONYM",
          "position": 2
        }
      ]
    }
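
When reading _analyze output, it can help to group the tokens by position. The following Python sketch does that for a trimmed copy of the response in the answer. `tokens_by_position` is a hypothetical helper, and the offsets and types are left out for brevity.

```python
import json
from collections import defaultdict


def tokens_by_position(analyze_response):
    """Group token texts from an _analyze response (JSON string) by position."""
    grouped = defaultdict(list)
    for token in json.loads(analyze_response)["tokens"]:
        grouped[token["position"]].append(token["token"])
    return dict(grouped)


# Trimmed copy of the response from the answer (offsets and types omitted).
response = """
{"tokens": [
  {"token": "foo", "position": 1},
  {"token": "baz", "position": 1},
  {"token": "bar", "position": 2}
]}
"""
print(tokens_by_position(response))
# {1: ['foo', 'baz'], 2: ['bar']}
```

Here foo and baz share position 1 while bar sits at position 2, which is consistent with "foo bar" being a two-word phrase in the synonym rule.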
    

    【Comments】:

    • How can I provide a user-defined path for the synonyms file?