【Posted】: 2015-03-02 14:04:54
【Question】:
I created a river that runs every hour to fetch data from a database (using the JDBC river plugin).
select * from orders
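Rather than selecting every record on each run, I want to select only the rows appended since the last run, based on the primary key. The query would be: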
select * from orders where deviceid > '(Max Id in Elastic search)'
How can I get the maximum _id from Elasticsearch?
【Comments】:
Tags: elasticsearch
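There doesn't seem to be a way to use the "_id" field directly, because ES insists on converting "_id" values to strings. There is a workaround, though.
First I set up a simple index with a few documents, like this: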
PUT /test_index
{
"settings": {
"number_of_shards": 1
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"title":"first doc"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"title":"second doc"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"title":"third doc"}
Then I tried a max aggregation, but got an error because the "_id" values are strings:
POST /test_index/_search?search_type=count
{
"aggs": {
"max_id": {
"max": {
"field": "_id"
}
}
}
}
...
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[bQS7TqO9SfKSPQZYVXQBag][test_index][0]: ClassCastException[org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData cannot be cast to org.elasticsearch.index.fielddata.IndexNumericFieldData]}]",
"status": 500
}
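The exception says the "_id" field data is not numeric. As a side illustration of why string IDs are a poor fit for finding a maximum even outside Elasticsearch, here is a minimal Python sketch. The ID list is hypothetical (it adds a "10" that is not in the test index) and only demonstrates how string comparison behaves.

```python
# Hypothetical IDs, written as strings the way "_id" is returned.
string_ids = ["1", "2", "3", "10"]

# String comparison is lexicographic, so "3" ranks above "10".
lexicographic_max = max(string_ids)

# Converting to integers first gives the maximum an incremental query needs.
numeric_max = max(int(s) for s in string_ids)

print(lexicographic_max)  # 3
print(numeric_max)        # 10
```

A numeric maximum therefore needs the ID to be available as a number.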
So that doesn't work. A small change fixes it: use the "path" parameter of the "_id" field mapping.
I redefined the index as:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"doc": {
"_id": {
"path": "doc_id"
}
}
}
}
Then I indexed the documents, supplying the ID through the "doc_id" path:
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"first doc","doc_id":1}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"second doc","doc_id":2}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"third doc","doc_id":3}
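If the rows come from application code rather than a hand-written request, a bulk body like the one above could be generated along these lines. This is only a sketch: `build_bulk_body` is a hypothetical helper, not part of any Elasticsearch client, and the index and type names are the ones used in this answer.

```python
import json


def build_bulk_body(rows, index="test_index", doc_type="doc"):
    """Return a newline-delimited bulk body: one action line and one source line per row."""
    lines = []
    for row in rows:
        # No "_id" in the action line; the ID is taken from "doc_id" in the source.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(row))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


rows = [
    {"title": "first doc", "doc_id": 1},
    {"title": "second doc", "doc_id": 2},
    {"title": "third doc", "doc_id": 3},
]
body = build_bulk_body(rows)
print(body)
```

Keeping "doc_id" as a JSON number, not a quoted string, is what lets it be used as a numeric field.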
Now if I search, I can see that "_id" is still a string, but "doc_id" is an integer:
POST /test_index/_search
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"title": "first doc",
"doc_id": 1
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1,
"_source": {
"title": "second doc",
"doc_id": 2
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"_source": {
"title": "third doc",
"doc_id": 3
}
}
]
}
}
So now I can easily use the max aggregation to find the largest ID value:
POST /test_index/_search?search_type=count
{
"aggs": {
"max_id": {
"max": {
"field": "doc_id"
}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"max_id": {
"value": 3
}
}
}
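To tie this back to the question, the aggregation result can be turned into the incremental query. The sketch below is an assumption-laden illustration: the response text is copied from the output above, `max_id_from_response` and `next_query` are hypothetical helper names, `deviceid` is assumed to be an integer column, and the `?` placeholder style depends on the database driver. How the JDBC river plugin itself would receive this value is not covered here.

```python
import json

# Aggregation response as shown above (trimmed to the relevant part).
RESPONSE = '{"hits": {"total": 3, "hits": []}, "aggregations": {"max_id": {"value": 3}}}'


def max_id_from_response(response_text, agg_name="max_id"):
    """Extract the max aggregation value as an int, or None if there is no value."""
    value = json.loads(response_text)["aggregations"][agg_name]["value"]
    # Treat a null value as "nothing indexed yet" (assumption for this sketch).
    return None if value is None else int(value)


def next_query(max_id):
    """Build the SQL text and parameters for the next incremental fetch."""
    if max_id is None:
        return "select * from orders", ()
    return "select * from orders where deviceid > ?", (max_id,)


sql, params = next_query(max_id_from_response(RESPONSE))
print(sql, params)
```

Passing the ID as a bound parameter avoids quoting it as a string, which keeps the comparison numeric on the database side as well.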
【Comments】: