【Question title】: Fetching unique data in Elasticsearch
【Posted】: 2017-07-18 09:50:58
【Question】:

I have the following data:

ID: 1, fldname: pawan
ID: 1, fldname: pawan1
ID: 1, fldname: pawan2
ID: 2, fldname: pawan3
ID: 3, fldname: pawan4
ID: 4, fldname: pawan5

I am trying to fetch unique data based on the ID field, similar to what we get in MySQL when we run a group by with the following query:

select * from table_name where fldname like 'pawan%' group by ID

This returns unique values, one row per ID. The same works in Sphinx search when we use its group-by function.
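
To make the expected result concrete, here is a minimal pure-Python sketch of what that group by produces on the sample rows. It does not talk to MySQL or Elasticsearch. The helper name `group_by_id` is hypothetical, and keeping the first row seen per ID is an assumption: MySQL does not guarantee which row of a group it returns for non-aggregated columns.

```python
# Sample rows copied from the question: (ID, fldname).
rows = [
    (1, "pawan"),
    (1, "pawan1"),
    (1, "pawan2"),
    (2, "pawan3"),
    (3, "pawan4"),
    (4, "pawan5"),
]


def group_by_id(rows, prefix):
    """Keep one row per ID among the rows whose name starts with prefix.

    Assumption: the first matching row per ID wins.
    """
    first_per_id = {}
    for row_id, name in rows:
        if name.startswith(prefix):  # stands in for: fldname like 'pawan%'
            # setdefault keeps an existing entry, so later duplicates are ignored.
            first_per_id.setdefault(row_id, name)
    return sorted(first_per_id.items())


unique_rows = group_by_id(rows, "pawan")
print(unique_rows)
```

Six input rows collapse to four, one for each distinct ID.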

Is there a way to fetch unique values in Elasticsearch?

Here is my sample mapping:

"mappings": {
    "my_type": {
      "properties": {
        "docid": {
          "type": "keyword"
        },
        "flgname": {
          "type": "text"
        }
      }
    }
  }

【Comments】:

Tags: elasticsearch


【Answer 1】:

I suggest changing your mapping slightly:

{
  "record" : {
    "dynamic" : "false",
    "_all" : {
      "enabled" : false
    },
    "properties" : {
      "docid" : {
        "type" : "long"
      },
      "flgname" : {
        "type" : "text"
      }
    }
  }
}

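This makes docid a long.

You can then filter with a fuzzy match query and add aggregations. For example, the following retrieves the number of distinct values (cardinality), average, minimum, and maximum of docid: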

{
  "from" : 0,
  "size" : 10,
  "_source" : true,
  "query" : {
    "bool" : {
      "must" : [ {
        "match" : {
          "flgname" : {
            "query" : "pawan",
            "operator" : "OR",
            "fuzziness" : "1",
            "prefix_length" : 1,
            "max_expansions" : 50,
            "fuzzy_transpositions" : true,
            "lenient" : false,
            "zero_terms_query" : "NONE",
            "boost" : 1.0
          }
        }
      } ]
    }
  },
  "aggs" : {
    "my_cardinality" : {
      "cardinality" : {
        "field" : "docid"
      }
    },
    "my_avg" : {
      "avg" : {
        "field" : "docid"
      }
    },
    "my_min" : {
      "min" : {
        "field" : "docid"
      }
    },
    "my_max" : {
      "max" : {
        "field" : "docid"
      }
    }
  }
}

For reference, here is the result of the query above on the data you posted:

{
  "took" : 47,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 0.9808292,
    "hits" : [ {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "40b5eac0-743b-4a6a-a06d-3ae4d56f4aca",
      "_score" : 0.9808292,
      "_source" : {
        "docid" : "1",
        "flgname" : "pawan"
      }
    }, {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "27821c39-e722-4361-bc07-0dcd5181a1ad",
      "_score" : 0.7846634,
      "_source" : {
        "docid" : "2",
        "flgname" : "pawan3"
      }
    }, {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "86fcd9c1-a688-4a6a-9c45-e91791a8b902",
      "_score" : 0.7846634,
      "_source" : {
        "docid" : "4",
        "flgname" : "pawan5"
      }
    }, {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "fb00a3cc-f1b8-4073-8808-f2ddbc4979e2",
      "_score" : 0.55451775,
      "_source" : {
        "docid" : "1",
        "flgname" : "pawan1"
      }
    }, {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "18e5e20d-17a7-4d59-b2f1-7bf325a4c4df",
      "_score" : 0.55451775,
      "_source" : {
        "docid" : "3",
        "flgname" : "pawan4"
      }
    }, {
      "_index" : "stack_overflow1",
      "_type" : "record",
      "_id" : "fbf49af6-f574-4ad2-8686-cbbedc5e70c4",
      "_score" : 0.23014566,
      "_source" : {
        "docid" : "1",
        "flgname" : "pawan2"
      }
    } ]
  },
  "aggregations" : {
    "my_cardinality" : {
      "value" : 4
    },
    "my_max" : {
      "value" : 4.0
    },
    "my_avg" : {
      "value" : 2.0
    },
    "my_min" : {
      "value" : 1.0
    }
  }
}
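
As a sanity check, the four aggregation values in that response can be reproduced by hand from the six docid values. This is a small pure-Python sketch, not an Elasticsearch call. Note that the real `cardinality` aggregation is approximate (it is based on HyperLogLog++), although it is exact for a data set this small.

```python
# docid values of the six sample documents, as numbers.
docids = [1, 1, 1, 2, 3, 4]

summary = {
    "my_cardinality": len(set(docids)),    # number of distinct docid values
    "my_avg": sum(docids) / len(docids),   # arithmetic mean over all six hits
    "my_min": min(docids),
    "my_max": max(docids),
}
print(summary)
```

These metrics only summarize docid. They report how many unique IDs matched, but the `hits` section still contains all six documents rather than one document per ID.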

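【Comments】:

    【Answer 2】:

    If you also make flgname a keyword, you can run a terms aggregation on docid with a sub-aggregation on flgname. The result will be similar to the SQL query you mentioned.

    The query looks like this: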

    {
      "size": 0,
      "query": {
        "regexp": {
          "flgname": "pawa.*"
        }
      },
      "aggs": {
        "docids": {
          "terms": { "field": "docid" },
          "aggs": {
            "flgnam": {
              "terms": { "field": "flgname" }
            }
          }
        }
      }
    }
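
The following is a rough sketch of the bucket structure this query should produce on the sample data. It builds the same request body as a Python dict and then emulates the filter and the nested terms aggregations locally, without contacting a cluster. The helper `emulate_buckets` is hypothetical, and using Python's `re.fullmatch` in place of Lucene's anchored regexp matching is an assumption that holds for a simple pattern like `pawa.*`.

```python
import json
import re
from collections import Counter, defaultdict

# Sample documents, assuming docid and flgname are both indexed as keyword.
docs = [
    {"docid": "1", "flgname": "pawan"},
    {"docid": "1", "flgname": "pawan1"},
    {"docid": "1", "flgname": "pawan2"},
    {"docid": "2", "flgname": "pawan3"},
    {"docid": "3", "flgname": "pawan4"},
    {"docid": "4", "flgname": "pawan5"},
]

# The request body from the answer, expressed as a dict.
body = {
    "size": 0,
    "query": {"regexp": {"flgname": "pawa.*"}},
    "aggs": {
        "docids": {
            "terms": {"field": "docid"},
            "aggs": {"flgnam": {"terms": {"field": "flgname"}}},
        }
    },
}


def emulate_buckets(docs, pattern):
    """Group matching docs by docid, then count flgname values per group."""
    buckets = defaultdict(Counter)
    for doc in docs:
        # fullmatch mirrors the whole-term anchoring of the regexp query.
        if re.fullmatch(pattern, doc["flgname"]):
            buckets[doc["docid"]][doc["flgname"]] += 1
    return {docid: dict(names) for docid, names in sorted(buckets.items())}


buckets = emulate_buckets(docs, body["query"]["regexp"]["flgname"])
print(json.dumps(body, indent=2))
print(buckets)
```

Each top-level key corresponds to one `docids` bucket, so the number of keys is the number of unique docid values, and the inner dict lists the flgname values found under that ID.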

    【Comments】:
