【Question title】: Partial search returns zero hits
【Posted】: 2018-07-18 10:31:36
【Question】:

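I have managed to do exact searches with Elasticsearch (v6.1.3). But when I try partial or case-insensitive searches (e.g. {"query": {"match": {"demodata": "Hello"}}} or {"query": {"match": {"demodata": "ell"}}}), I get zero hits. I don't know why. I set up my analyzers based on the following hint: Partial search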

from elasticsearch import Elasticsearch
es = Elasticsearch()
settings = {
    "mappings": {
        "my-type": {
            "properties": {
                "demodata": {
                    "type": "string",
                    "search_analyzer": "search_ngram",
                    "index_analyzer": "index_ngram"
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 8
                }
            },
            "analyzer": {
                "index_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["ngram_filter", "lowercase"]
                },
                "search_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                }
            }
        }
    }
}
es.indices.create(index="my-index", body=settings, ignore=400)
docs=[
    { "demodata": "hello" },
    { "demodata": "hi" },
    { "demodata": "bye" },
    { "demodata": "HelLo WoRld!" }
]
for doc in docs:
    res = es.index(index="my-index", doc_type="my-type", body=doc)

res = es.search(index="my-index", body={"query": {"match": {"demodata": "Hello"}}})
print("Got %d Hits:" % res["hits"]["total"])
print (res)
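For comparison, this is roughly what the analyzer chain above would produce if it were actually applied. The Python below is a simplified emulation, not Elasticsearch code, and the helper names are made up for illustration. With keyword + ngram(3..8) + lowercase at index time and keyword + lowercase at search time, both "hello" and "ell" end up among the indexed grams, so the analyzers themselves look capable of matching. A more likely culprit is that the index was never created with these settings: Elasticsearch 6.x rejects the `string` type and the `index_analyzer` parameter, `ignore=400` swallows that error, and the documents are then indexed with a dynamic mapping. The search also runs right after indexing without a refresh, so the new documents may not be searchable yet.

```python
# Plain-Python approximation of the question's first analyzer chain
# (keyword tokenizer -> ngram token filter 3..8 -> lowercase).
# Simplified model for illustration only, not Elasticsearch itself.


def keyword_tokenizer(text):
    # The keyword tokenizer emits the whole input as a single token.
    return [text]


def ngram_filter(tokens, min_gram=3, max_gram=8):
    # Emit every substring of each token whose length is in [min_gram, max_gram].
    grams = []
    for tok in tokens:
        for start in range(len(tok)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(tok):
                    break
                grams.append(tok[start:start + size])
    return grams


def lowercase(tokens):
    return [t.lower() for t in tokens]


def index_ngram(text):
    # Mirrors "index_ngram": keyword -> ngram_filter -> lowercase.
    return lowercase(ngram_filter(keyword_tokenizer(text)))


def search_ngram(text):
    # Mirrors "search_ngram": keyword -> lowercase (no grams at query time).
    return lowercase(keyword_tokenizer(text))


docs = ["hello", "hi", "bye", "HelLo WoRld!"]
for query in ["Hello", "ell"]:
    query_terms = set(search_ngram(query))
    # A match query (default OR) hits a doc if any query term was indexed for it.
    hits = [d for d in docs if query_terms & set(index_ngram(d))]
    print(query, "->", hits)
```

Both queries print ['hello', 'HelLo WoRld!'] in this model; the order of grams and the absence of scoring are simplifications.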

Updated the code based on Piotr Pradzynski's input, but it still doesn't work!

from elasticsearch import Elasticsearch
es = Elasticsearch()
if not es.indices.exists(index="my-index"):
    customset={
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "my_tokenizer"
                    }
                },
                "tokenizer": {
                    "my_tokenizer": {
                        "type": "ngram",
                        "min_gram": 3,
                        "max_gram": 20,
                        "token_chars": [
                            "letter",
                            "digit"
                        ]
                    }
                }
            }
        }
    }


    es.indices.create(index="my-index", body=customset, ignore=400)
    docs=[
        { "demodata": "hELLO" },
        { "demodata": "hi" },
        { "demodata": "bye" },
        { "demodata": "HeLlo WoRld!" },
        { "demodata": "xyz@abc.com" }
    ]
    for doc in docs:
        res = es.index(index="my-index", doc_type="my-type", body=doc)



es.indices.refresh(index="my-index")
res = es.search(index="my-index", body={"query": {"match": {"demodata":{"query":"ell","analyzer": "my_analyzer"}}}})

#print res
print("Got %d Hits:" % res["hits"]["total"])
print (res)
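A likely reason this version still returns nothing: `my_analyzer` is defined in the settings, but no mapping assigns it to `demodata`, so the field is probably mapped dynamically as `text` with the standard analyzer. Passing "analyzer": "my_analyzer" in the match query only changes how the query string "ell" is analyzed, not what was indexed. Also, if `my-index` was left over from the first attempt, the `if not es.indices.exists(...)` guard skips creation entirely, and `my_analyzer` has no lowercase filter. The sketch below models both sides in plain Python; `simple_standard` and `my_tokenizer` are hypothetical, ASCII-only stand-ins, not the real analyzers.

```python
import re

# Plain-Python approximation of the updated attempt. Not Elasticsearch code.


def simple_standard(text):
    # Hypothetical stand-in for the standard analyzer behind a dynamic "text"
    # mapping: lowercase, split on non-alphanumerics (the real standard
    # tokenizer follows Unicode word-break rules).
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def my_tokenizer(text, min_gram=3, max_gram=20):
    # Approximates the "ngram" tokenizer with token_chars [letter, digit]:
    # split on other characters, then emit 3..20-char grams of each piece.
    # my_analyzer has no lowercase filter, so case is preserved here.
    grams = []
    for piece in re.split(r"[^0-9A-Za-z]+", text):
        for start in range(len(piece)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(piece):
                    break
                grams.append(piece[start:start + size])
    return grams


docs = ["hELLO", "hi", "bye", "HeLlo WoRld!", "xyz@abc.com"]

# Index side: my_analyzer is not mapped to demodata, so whole words are indexed.
indexed = {d: set(simple_standard(d)) for d in docs}
# Query side: "analyzer": "my_analyzer" only affects how "ell" is analyzed.
query_terms = set(my_tokenizer("ell"))
hits = [d for d in docs if indexed[d] & query_terms]
print(query_terms, hits)  # {'ell'} []

# If demodata itself were indexed as lowercased grams, "ell" would be found.
indexed_grams = {d: {g.lower() for g in my_tokenizer(d)} for d in docs}
gram_hits = [d for d in docs if indexed_grams[d] & query_terms]
print(gram_hits)  # ['hELLO', 'HeLlo WoRld!']
```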

【Discussion】:

    Tags: python elasticsearch search n-gram elasticsearch-analyzers


    【Answer 1】:

    I think you should use the NGram Tokenizer instead of the NGram Token Filter, and add a multi-field that uses this tokenizer.

    Something like this:

    PUT my-index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "ngram_analyzer": {
              "tokenizer": "ngram_tokenizer",
              "filter": [
                "lowercase",
                "asciifolding"
              ]
            }
          },
          "tokenizer": {
            "ngram_tokenizer": {
              "type": "ngram",
              "min_gram": 3,
              "max_gram": 15,
              "token_chars": [
                "letter",
                "digit"
              ]
            }
          }
        }
      },
      "mappings": {
        "my-type": {
          "properties": {
            "demodata": {
              "type": "text",
              "fields": {
                "ngram": {
                  "type": "text",
                  "analyzer": "ngram_analyzer",
                  "search_analyzer": "standard"
                }
              }
            }
          }
        }
      }
    }
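To see why the index analyzer and the search analyzer differ in this mapping, here is a rough plain-Python model of demodata.ngram (not Elasticsearch code; asciifolding is not modeled, and `standard_like` is a simplified stand-in for the standard analyzer). Indexed values are split on non-letter/digit characters, cut into 3..15-char grams and lowercased, while the query is analyzed with "standard", so "Hello" has to appear as one whole gram. The extra "help" document is hypothetical, added only to show what would happen if the query were also split into grams. Hits are any shared term, which mirrors a match query's default OR, and scoring is ignored.

```python
import re


def ngram_analyzer(text, min_gram=3, max_gram=15):
    # Approximation of ngram_analyzer: ngram tokenizer with token_chars
    # [letter, digit] (ASCII only here), then the lowercase filter.
    grams = []
    for piece in re.split(r"[^0-9A-Za-z]+", text):
        for start in range(len(piece)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(piece):
                    break
                grams.append(piece[start:start + size].lower())
    return grams


def standard_like(text):
    # Simplified stand-in for the "standard" search analyzer.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# "help" is a hypothetical extra document, not part of the question's data.
docs = ["hELLO", "hi", "bye", "HeLlo WoRld!", "xyz@abc.com", "help"]
index = {d: set(ngram_analyzer(d)) for d in docs}


def search(query, analyzer):
    terms = set(analyzer(query))
    return [d for d in docs if index[d] & terms]


print(search("ell", standard_like))     # ['hELLO', 'HeLlo WoRld!']
print(search("Hello", standard_like))   # ['hELLO', 'HeLlo WoRld!']
print(search("Hello", ngram_analyzer))  # ['hELLO', 'HeLlo WoRld!', 'help']
```

Keeping the query whole is what stops "Hello" from matching "help" through the shared gram "hel".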
    

    Then you have to search in the added multi-field demodata.ngram:

    res = es.search(index="my-index", body={"query": {"match": {"demodata.ngram": "Hello"}}})
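Since the question uses the Python client, here is a sketch of the same answer in elasticsearch-py, assuming client and server are both 6.x as in the question (on 7.x, mapping types are gone and this wide min_gram/max_gram gap requires raising index.max_ngram_diff). The request bodies are the answer's; `partial_search` and its steps are a hypothetical helper. It deletes and recreates the index so leftover settings from earlier attempts cannot interfere, does not pass ignore=400 so mapping errors surface, and refreshes before searching. Note that deleting the index is destructive.

```python
# Sketch assuming elasticsearch-py 6.x talking to an Elasticsearch 6.x node.
# Request bodies come from the answer; partial_search is a hypothetical helper.

NGRAM_INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase", "asciifolding"],
                }
            },
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 15,
                    "token_chars": ["letter", "digit"],
                }
            },
        }
    },
    "mappings": {
        "my-type": {
            "properties": {
                "demodata": {
                    "type": "text",
                    "fields": {
                        "ngram": {
                            "type": "text",
                            "analyzer": "ngram_analyzer",
                            "search_analyzer": "standard",
                        }
                    },
                }
            }
        }
    },
}

DOCS = [
    {"demodata": "hELLO"},
    {"demodata": "hi"},
    {"demodata": "bye"},
    {"demodata": "HeLlo WoRld!"},
    {"demodata": "xyz@abc.com"},
]


def partial_search(es, text, index="my-index"):
    """Recreate `index` with the ngram multi-field, load DOCS, search demodata.ngram."""
    # Drop a leftover index so the new settings and mapping actually apply.
    es.indices.delete(index=index, ignore=[404])
    # No ignore=400 here: a rejected mapping should raise, not fail silently.
    es.indices.create(index=index, body=NGRAM_INDEX_BODY)
    for doc in DOCS:
        es.index(index=index, doc_type="my-type", body=doc)
    # Make the freshly indexed documents visible to search.
    es.indices.refresh(index=index)
    return es.search(index=index, body={"query": {"match": {"demodata.ngram": text}}})
```

It would be called as partial_search(Elasticsearch(), "ell") after from elasticsearch import Elasticsearch; with this mapping, "ell" and "Hello" should both hit the two hello documents.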
    

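    【Comments】:

    • Thanks for your input. I tested your code on my existing code, but it still returns zero hits. Please see my updated code.
    • @Paul85 You have to add a multi-field that uses this tokenizer and search in that field. Take a look at my answer; I've updated the code.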
    【Answer 2】:

    What you need is a query_string search.

    https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

    {
      "query":{
        "query_string":{
          "query":"demodata: *ell*"
        }
      }
    }
    

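    【Comments】:

    • Hi, thanks for your input. I'll let you know once I've tested it. As far as I know, wildcards can be slow if you need to search through large files, so do you know how to do this with ngrams? That's what I'm trying to do (see my code).
    • Oops, sorry, I didn't read that part; I just assumed it was the index mapping. Yes, you're right, wildcards hurt performance. Sorry, I have no experience with ngrams. :/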
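The performance trade-off discussed in these comments can be pictured with a toy model in plain Python (not Lucene; the term list is a hypothetical lowercased term dictionary for demodata). A pattern like *ell* starts with a wildcard, so it cannot narrow the search to a prefix of the sorted terms, and in the worst case every term has to be tested. With ngrams the substrings are computed once at index time, so finding "ell" becomes a direct lookup, at the cost of a larger index.

```python
from fnmatch import fnmatchcase

# Hypothetical term dictionary for demodata (lowercased whole words).
terms = ["bye", "hello", "hi", "world"]


def wildcard_lookup(pattern, terms):
    # A leading * gives no prefix to seek to, so each term is tested.
    return [t for t in terms if fnmatchcase(t, pattern)]


def build_gram_index(terms, min_gram=3, max_gram=15):
    # Ngram approach: precompute every 3..15-char substring at index time,
    # mapping each gram to the terms that contain it.
    gram_index = {}
    for t in terms:
        for start in range(len(t)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(t):
                    break
                gram_index.setdefault(t[start:start + size], set()).add(t)
    return gram_index


gram_index = build_gram_index(terms)
print(wildcard_lookup("*ell*", terms))        # ['hello']
print(sorted(gram_index.get("ell", set())))   # ['hello']
```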