【Posted at】: 2016-03-24 21:18:49
【Question】:
We are trying to improve the read performance of our HBase setup, accessed through the Apache Phoenix driver, on a dataset of ~11.5 million records.
HBase 0.98
Apache Phoenix 驱动 4.3.1
SQuirreL client 3.2
The table consists of 21 columns; the DDL statement is below:
create table *table_name* (PKEY BIGINT not null primary key,DATE_KEY BIGINT,TIMEOFDAY_KEY BIGINT,GMT_TZ_PKEY BIGINT,FACT_DATE TIMESTAMP,PAGE_KEY BIGINT,FFER_KEY BIGINT,OFFER_TYPE_KEY BIGINT,SESSION_KEY BIGINT,CUSTOMER_KEY BIGINT,VISITS_CNTR BIGINT,ELIGIBLE_CNTR smallint, PRESENTED_CNTR smallint,ACCEPTED smallint, ACCEPTED_CLICK smallint,FIRST_RESPONSE_CNTR smallint,REJECTED_CNTR smallint,IS_FIXED smallint, IGNORED_CNTR smallint,ENGAGED_CNTR smallint,CONVERTED_CNTR smallint)
We salted the table (salt_buckets = 3) and created a secondary index (an immutable index) covering all columns.
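To illustrate what salting with 3 buckets does to the row keys, here is a minimal sketch. It only shows the idea (one salt byte, derived from a hash of the key modulo the bucket count, is prepended to the row key). The hash function (`zlib.crc32`), the plain big-endian key encoding, and the helper name `salted_key` are assumptions made for illustration; they are not Phoenix's actual implementation.

```python
import zlib

SALT_BUCKETS = 3  # value used in the post


def salted_key(pkey: int, buckets: int = SALT_BUCKETS) -> bytes:
    """Prepend one salt byte to an 8-byte key (illustrative only)."""
    # Simplified encoding: plain big-endian signed 64-bit integer.
    raw = pkey.to_bytes(8, "big", signed=True)
    # Stand-in hash; Phoenix uses its own hash function internally.
    salt = zlib.crc32(raw) % buckets
    return bytes([salt]) + raw


# Count how many of 10,000 sequential keys land in each bucket.
counts = [0] * SALT_BUCKETS
for pkey in range(1, 10001):
    counts[salted_key(pkey)[0]] += 1

print(counts)
```

Sequential PKEY values end up spread over the buckets instead of all landing in one region, which is the reason for salting a monotonically increasing key.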
We ran the following queries; the timings reported by the SQuirreL client are listed with each one:
Select count(*) from *table_name* :
Query Time (A) = 0.031 s
Transport time (B) = 2.631 s
Total Execution Time (A+B) = 2.661 s
Execution plan:
PLAN
CLIENT 6-CHUNK PARALLEL 6-WAY FULL SCAN OVER OFR_FCT_IDX_SALTED
SERVER FILTER BY FIRST KEY ONLY
SERVER AGGREGATE INTO SINGLE ROW
CLIENT 100 ROW LIMIT
select MAX(session_key) from *table_name* group by TIMEOFDAY_KEY having count(SESSION_KEY) > 100 order by TIMEOFDAY_KEY : Rows returned 431
Query Time (A) = 0.04 s
Transport time (B) = 11.894 s
Total Execution Time (A+B) = 11.934 s
Execution plan:
PLAN
CLIENT 6-CHUNK PARALLEL 6-WAY FULL SCAN OVER OFR_FCT_IDX_SALTED
SERVER FILTER BY FIRST KEY ONLY
SERVER AGGREGATE INTO DISTINCT ROWS BY ["TIMEOFDAY_KEY"]
CLIENT MERGE SORT
CLIENT FILTER BY COUNT(TO_BIGINT("SESSION_KEY")) > 100
CLIENT SORTED BY ["TIMEOFDAY_KEY"]
As you can see, the query time is short, but the transport time (read/output time) seems quite long.
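For reference, the timings above can be turned into a transport share and a rough scan rate. This is only back-of-the-envelope arithmetic on the numbers SQuirreL reported: the row count is the approximate ~11.5 million from the post, and the helper name `summarize` is made up for this sketch.

```python
ROWS = 11_500_000  # approximate dataset size from the post

# (query time A, transport time B) in seconds, as reported by SQuirreL
timings = {
    "count(*)": (0.031, 2.631),
    "max/group by TIMEOFDAY_KEY": (0.04, 11.894),
}


def summarize(query_s: float, transport_s: float, rows: int = ROWS) -> dict:
    """Total time, share spent in transport, and rows scanned per second."""
    total = query_s + transport_s
    return {
        "total_s": total,
        "transport_share": transport_s / total,
        "rows_per_s": rows / total,
    }


report = {name: summarize(a, b) for name, (a, b) in timings.items()}

for name, r in report.items():
    print(
        f"{name}: total={r['total_s']:.3f}s, "
        f"transport={r['transport_share']:.1%}, "
        f"rows/s={r['rows_per_s']:,.0f}"
    )
```

In both cases transport accounts for more than 98% of the total, and the full scan works out to roughly 4.3 million rows per second for the count and just under 1 million rows per second for the grouped query.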
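My questions are as follows:
- Are these results in line with what we should expect for the dataset described above, considering the latest performance test results: Latest Performance Test
- Can we somehow further improve the transport time (read time)?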
【Discussion】: