【发布时间】:2020-06-09 17:58:44
【问题描述】:
我有一个关于 HBase 查询的问题。我看到很多数据被扫描以进行小型空间查询。我在 OSMNodes 表上触发了地理空间查询。下面是查询和表的详细信息。我看到 HBase 上的读取请求总数 (5,553,421,708) 并看到大多数区域和区域服务器上的请求。我们为什么要为这个查询扫描每个区域(整个表)?
**Query**:
"DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"
**Table Schema:**
geomesa-hbase describe-schema -c atlas -f OSMNodes
INFO Describing attributes of feature 'OSMNodes'
geometry | Point (Spatio-temporally indexed)
ingestionTimestamp | Timestamp (Spatio-temporally indexed)
nextTimestamp | Timestamp
serializerVersion | String
featurePayload | String
User data:
geomesa.index.dtg | ingestionTimestamp
geomesa.indices | z3:6:3:geometry:ingestionTimestamp,id:4:3:
geomesa.stats.enable | true
geomesa.z.splits | 60
**Query Plan (through GeoMesa Cli):**
geomesa-hbase explain -c atlas -f OSMNodes -q "DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"
Planning 'OSMNodes' (DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00
Original filter: (DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31') AND nextTimestamp > '2020-05-27 16:59:31'
Hints: bin[false] arrow[false] density[false] stats[false] sampling[none]
Sort: none
Transforms: none
Strategy selection:
Query processing took 17ms for 1 options
Filter plan: FilterPlan[Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00][nextTimestamp > 2020-05-27T16:59:31+00:00]]
Strategy selection took 1ms for 1 options
Strategy 1 of 1: Z3Index(geometry,ingestionTimestamp)
Strategy filter: Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00][nextTimestamp > 2020-05-27T16:59:31+00:00]
Geometries: FilterValues(List(POLYGON ((-122.3317610175119 47.607282, -122.33177379496394 47.60715226835226, -122.33181163628976 47.607027522218985, -122.33187308726842 47.606912555524126, -122.33195578637329 47.606811786373285, -122.33205655552413 47.60672908726843, -122.33217152221899 47.60666763628976, -122.33229626835225 47.606629794963936, -122.332426 47.606617017511894, -122.33255573164774 47.606629794963936, -122.33268047778101 47.60666763628976, -122.33279544447586 47.60672908726843, -122.33289621362671 47.606811786373285, -122.33297891273158 47.606912555524126, -122.33304036371024 47.607027522218985, -122.33307820503606 47.60715226835226, -122.3330909824881 47.607282, -122.33307820503606 47.60741173164774, -122.33304036371024 47.60753647778101, -122.33297891273158 47.60765144447587, -122.33289621362671 47.60775221362671, -122.33279544447586 47.60783491273157, -122.33268047778101 47.60789636371024, -122.33255573164774 47.60793420503606, -122.332426 47.6079469824881, -122.33229626835225 47.60793420503606, -122.33217152221899 47.60789636371024, -122.33205655552413 47.60783491273157, -122.33195578637329 47.60775221362671, -122.33187308726842 47.60765144447587, -122.33181163628976 47.60753647778101, -122.33177379496394 47.60741173164774, -122.3317610175119 47.607282))),true,false)
Intervals: FilterValues(List((-∞,2020-05-27T16:59:31Z]),true,false)
Plan: ScanPlan
Tables: atlas_OSMNodes_z3_geometry_ingestionTimestamp_v6
Ranges (7440): [%00;%0a;E$A%08;%00;%00;%00;%00;%00;::%00;%0a;E$A%0c;], [%01;%0a;E$A%08;%00;%00;%00;%00;%00;::%01;%0a;E$A%0c;], [%02;%0a;E$A%08;%00;%00;%00;%00;%00;::%02;%0a;E$A%0c;], [%03;%0a;E$A%08;%00;%00;%00;%00;%00;::%03;%0a;E$A%0c;], [%04;%0a;E$A%08;%00;%00;%00;%00;%00;::%04;%0a;E$A%0c;]
Scans (120): ['%0a;ElA%98;%00;%00;%00;%00;%00;::'%0a;Ema%8c;], [:%0a;ElA%98;%00;%00;%00;%00;%00;:::%0a;Ema%8c;], [%14;::%14;%0a;ElA%8c;], [(%0a;ElA%98;%00;%00;%00;%00;%00;::(%0a;Ema%8c;], [%12;%0a;ElA%98;%00;%00;%00;%00;%00;::%12;%0a;Ema%8c;]
Column families: d
Remote filters: MultiRowRangeFilter, Z3HBaseFilter[(epoch,2629:2629),(zt,0:2009670),(zxy,335934:1603233:335941:1603248)], CqlFilter[(DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00]
Plan creation took 135ms
Query planning took 433ms
其他查询 [1]
在进行实验时,我添加了较低的时间戳,它将延迟从 2-3 小时减少到了 10-20 分钟。
geomesa-hbase explain -c atlas -f OSMNodes -q "DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND ingestionTimestamp >= '2019-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"
Planning 'OSMNodes' ((DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND ingestionTimestamp >= 2019-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00
Original filter: ((DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31') AND ingestionTimestamp >= '2019-05-27 16:59:31') AND nextTimestamp > '2020-05-27 16:59:31'
Hints: bin[false] arrow[false] density[false] stats[false] sampling[none]
Sort: none
Transforms: none
Strategy selection:
Query processing took 24ms for 1 options
Filter plan: FilterPlan[Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND (ingestionTimestamp >= 2019-05-27T16:59:31+00:00 AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00)][nextTimestamp > 2020-05-27T16:59:31+00:00]]
Strategy selection took 2ms for 1 options
Strategy 1 of 1: Z3Index(geometry,ingestionTimestamp)
Strategy filter: Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND (ingestionTimestamp >= 2019-05-27T16:59:31+00:00 AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00)][nextTimestamp > 2020-05-27T16:59:31+00:00]
Geometries: FilterValues(List(POLYGON ((-122.3317610175119 47.607282, -122.33177379496394 47.60715226835226, -122.33181163628976 47.607027522218985, -122.33187308726842 47.606912555524126, -122.33195578637329 47.606811786373285, -122.33205655552413 47.60672908726843, -122.33217152221899 47.60666763628976, -122.33229626835225 47.606629794963936, -122.332426 47.606617017511894, -122.33255573164774 47.606629794963936, -122.33268047778101 47.60666763628976, -122.33279544447586 47.60672908726843, -122.33289621362671 47.606811786373285, -122.33297891273158 47.606912555524126, -122.33304036371024 47.607027522218985, -122.33307820503606 47.60715226835226, -122.3330909824881 47.607282, -122.33307820503606 47.60741173164774, -122.33304036371024 47.60753647778101, -122.33297891273158 47.60765144447587, -122.33289621362671 47.60775221362671, -122.33279544447586 47.60783491273157, -122.33268047778101 47.60789636371024, -122.33255573164774 47.60793420503606, -122.332426 47.6079469824881, -122.33229626835225 47.60793420503606, -122.33217152221899 47.60789636371024, -122.33205655552413 47.60783491273157, -122.33195578637329 47.60775221362671, -122.33187308726842 47.60765144447587, -122.33181163628976 47.60753647778101, -122.33177379496394 47.60741173164774, -122.3317610175119 47.607282))),true,false)
Intervals: FilterValues(List([2019-05-27T16:59:31Z,2020-05-27T16:59:31Z]),true,false)
Plan: ScanPlan
Tables: atlas_OSMNodes_z3_geometry_ingestionTimestamp_v6
Ranges (404100): [%00;%0a;4$A%08;%00;%00;%00;%00;%00;::%00;%0a;4$A%0c;], [%01;%0a;4$A%08;%00;%00;%00;%00;%00;::%01;%0a;4$A%0c;], [%02;%0a;4$A%08;%00;%00;%00;%00;%00;::%02;%0a;4$A%0c;], [%03;%0a;4$A%08;%00;%00;%00;%00;%00;::%03;%0a;4$A%0c;], [%04;%0a;4$A%08;%00;%00;%00;%00;%00;::%04;%0a;4$A%0c;]
Scans (4080): [2%0a;/la%08;%00;%00;%00;%00;%00;::2%0a;0da%9c;], [%18;%0a;$mA%08;%00;%00;%00;%00;%00;::%18;%0a;%eA%9c;], [%03;%0a;@%e%08;%00;%00;%00;%00;%00;::%03;%0a;@me%9c;], [%0f;%0a;%eE%08;%00;%00;%00;%00;%00;::%0f;%0a;&-E%9c;], ['%0a;Bda%08;%00;%00;%00;%00;%00;::'%0a;C,a%9c;]
Column families: d
Remote filters: MultiRowRangeFilter, Z3HBaseFilter[(epoch,2577:2629),(zt,1410483:2097151,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0:2009670),(zxy,335934:1603233:335941:1603248)], CqlFilter[(DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND (ingestionTimestamp >= 2019-05-27T16:59:31+00:00 AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00)) AND nextTimestamp > 2020-05-27T16:59:31+00:00]
Plan creation took 475ms
Query planning took 813ms
造成这么大差异的原因是什么?
【问题讨论】: