Suppose you want to query a few columns of a table. By default, Hive launches a MapReduce job to do this, for example:
hive> SELECT id, money FROM m limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 235105473) is larger than
hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0229, Tracking URL =
Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job
  -kill job_1384246387966_0229
Hadoop job information for Stage-1: number of mappers: 1;
2013-11-13 11:35:16,167 Stage-1 map = 0%, reduce = 0%
2013-11-13 11:35:21,327 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
2013-11-13 11:35:22,377 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
MapReduce Total cumulative CPU time: 1 seconds 260 msec
Ended Job = job_1384246387966_0229
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.26 sec   HDFS Read: 8388865 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 260 msec
Time taken: 16.802 seconds, Fetched: 10 row(s)
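The "Cannot run job locally" message above comes from a simple size comparison: the input (about 235 MB) exceeds hive.exec.mode.local.auto.inputbytes.max (134217728 bytes, i.e. 128 MB, by default). A minimal sketch of that check, assuming a simplified model (the function name is illustrative, not Hive's actual code):

```python
# Sketch of Hive's local-mode size check (illustrative, not Hive source code).
# Local execution is refused when the input size exceeds
# hive.exec.mode.local.auto.inputbytes.max (default 134217728 bytes = 128 MB).

DEFAULT_LOCAL_AUTO_INPUTBYTES_MAX = 134217728  # 128 * 1024 * 1024

def can_run_locally(input_size_bytes, max_bytes=DEFAULT_LOCAL_AUTO_INPUTBYTES_MAX):
    """Return True if the job's input is small enough for local execution."""
    return input_size_bytes <= max_bytes

# The job in the log read 235105473 bytes, so it could not run locally:
print(can_run_locally(235105473))          # False
print(can_run_locally(10 * 1024 * 1024))   # True: a 10 MB input fits
```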
As we all know, launching a MapReduce job incurs system overhead. To address this, starting with Hive 0.10.0, a simple query that needs no aggregation, such as SELECT <col> FROM <table> LIMIT n, can skip the MapReduce job and fetch the data directly through a fetch task. This can be enabled in any of the following ways:
Method 1:

hive> set hive.fetch.task.conversion=more;
hive> SELECT id, money FROM m limit 10;
...
Time taken: 0.138 seconds, Fetched: 10 row(s)
Setting hive.fetch.task.conversion=more enables the fetch task, so a simple column query like the one above no longer launches a MapReduce job.
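The difference (16.802 s with a MapReduce job versus 0.138 s with a fetch task) can be read straight off the CLI output; a small regex helper for extracting it (hypothetical, not part of Hive):

```python
import re

def time_taken(cli_line):
    """Extract the elapsed seconds from a Hive CLI 'Time taken' line.

    Returns None if the line does not contain a timing report.
    """
    m = re.search(r"Time taken:\s*([\d.]+)\s*seconds", cli_line)
    return float(m.group(1)) if m else None

# The two runs shown in this article:
mr_run    = time_taken("Time taken: 16.802 seconds, Fetched: 10 row(s)")
fetch_run = time_taken("Time taken: 0.138 seconds, Fetched: 10 row(s)")
print(mr_run / fetch_run)  # the fetch task is over 100x faster here
```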
Method 2:

bin/hive --hiveconf hive.fetch.task.conversion=more
Method 3:

Both methods above enable the fetch task, but only for the current session. To enable it permanently, add the following to ${HIVE_HOME}/conf/hive-site.xml:
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task
    minimizing latency. Currently the query should be single
    sourced not having any subquery and should not have any
    aggregations or distincts (which incurs RS), lateral views
    and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
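The description above amounts to a small decision rule. A sketch modeling which queries each mode converts, assuming a deliberately simplified view (Hive's planner inspects the actual query plan; these parameter names are illustrative):

```python
# Simplified model of when hive.fetch.task.conversion applies
# (illustrative only; not Hive's real implementation).

def fetch_task_eligible(mode, select_star, filter_on_partition_cols_only,
                        has_aggregation=False, has_join=False,
                        has_subquery=False, has_lateral_view=False):
    """Return True if the query can become a single fetch task under `mode`."""
    # No mode converts queries with aggregations/distincts, joins,
    # subqueries, or lateral views.
    if has_aggregation or has_join or has_subquery or has_lateral_view:
        return False
    if mode == "more":
        # Any plain SELECT / FILTER / LIMIT query qualifies.
        return True
    if mode == "minimal":
        # Only SELECT *, filters on partition columns, and LIMIT qualify.
        return select_star and filter_on_partition_cols_only
    return False  # e.g. mode == "none": always run MapReduce

# SELECT id, money FROM m LIMIT 10 projects specific columns,
# so it qualifies under "more" but not under "minimal":
print(fetch_task_eligible("minimal", select_star=False,
                          filter_on_partition_cols_only=True))  # False
print(fetch_task_eligible("more", select_star=False,
                          filter_on_partition_cols_only=True))  # True
```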
With that in place, the fetch task stays enabled across sessions. Quite handy, so give it a try!