ElasticSearch-分词器analyzer

analyzer

分词器使用的两个情形：
1，Index time analysis. 创建或者更新文档时，会对文档进行分词
2，Search time analysis. 查询时，对查询语句分词

指定查询时使用哪个分词器的方式有：

　　- 查询时通过analyzer指定分词器

GET test_index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "lin",
        "analyzer": "standard"
      }
    }
  }
}

- 创建index mapping时指定search_analyzer

PUT test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title":{
          "type": "text",
          "analyzer": "whitespace",
          "search_analyzer": "standard"
        }
      }
    }
  }
}

索引时分词是通过配置 Index mapping中的每个字段的参数analyzer指定的

# 不指定分词时，会使用默认的standard
PUT test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title":{
          "type": "text",
          "analyzer": "whitespace"     #指定分词器，es内置有多种analyzer
        }
      }
    }}}

注意：

明确字段是否需要分词，不需要分词的字段将type设置为keyword，可以节省空间和提高写性能。

_analyzer api

GET _analyze
{
  "analyzer": "standard",
  "text": "this is a test"
}
# 可以查看text的内容使用standard分词后的结果

{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 5,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 8,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "test",
      "start_offset": 10,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}

View Code