Installing Hive with MySQL as the Metastore
- Install the JDK (covered in an earlier article)
- Install Hadoop (covered in an earlier article)
- Install MySQL (covered in an earlier article)
1. Create the Hive database and user, and grant privileges
# The default MySQL password on my test VM was 123456
# mysql -u root -p
mysql> grant all privileges on *.* to hive@"%" identified by "hive" with grant option;
mysql> flush privileges;
After a default installation on Ubuntu, MySQL only accepts connections from the local machine. To enable remote access, two steps are needed:
# nano /etc/mysql/my.cnf
Find the line bind-address = 127.0.0.1 and comment it out, then restart MySQL (sudo service mysql restart).
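The bind-address edit above can also be scripted. A minimal sketch of the text transformation (the sample config text is illustrative; in practice you would read and write /etc/mysql/my.cnf with root privileges):

```python
# Comment out any bind-address line in a my.cnf-style config text,
# so MySQL no longer binds only to 127.0.0.1.
def allow_remote_access(cnf_text):
    lines = []
    for line in cnf_text.splitlines():
        if line.strip().startswith("bind-address"):
            lines.append("#" + line)  # commented out -> listen on all interfaces
        else:
            lines.append(line)
    return "\n".join(lines)

cnf = "[mysqld]\nuser = mysql\nbind-address = 127.0.0.1\nport = 3306"
print(allow_remote_access(cnf))
```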
2. Install Hive
hadoop@Master:~$ sudo tar xvfz apache-hive-1.1.1-bin.tar.gz
hadoop@Master:~$ sudo cp -R apache-hive-1.1.1-bin /usr/local/hive
hadoop@Master:~$ sudo chmod -R 775 /usr/local/hive/
hadoop@Master:~$ sudo chown -R hadoop:hadoop /usr/local/hive/
# Add the HIVE_HOME variable to /etc/profile
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:/usr/local/hive/lib
# Copy the template files under hive/conf and rename them
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
# In hive-env.sh, set HADOOP_HOME
HADOOP_HOME=/usr/local/hadoop
# In hive-site.xml, set the MySQL JDBC driver, connection URL, user name and password, as shown below
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.1.178:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
Where:
javax.jdo.option.ConnectionURL is the JDBC connection string Hive uses to reach the database;
javax.jdo.option.ConnectionDriverName is the entry class name of the JDBC driver;
javax.jdo.option.ConnectionUserName is the database user name;
javax.jdo.option.ConnectionPassword is the database password.
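As a quick sanity check on the ConnectionURL format: a JDBC MySQL URL is just "jdbc:" in front of an ordinary URL, so its pieces can be pulled apart with the standard library (a sketch for illustration; Hive itself does this inside the JDBC driver):

```python
from urllib.parse import urlparse, parse_qs

# Strip the "jdbc:" prefix and parse the remainder as a normal URL.
jdbc_url = "jdbc:mysql://192.168.1.178:3306/hive?createDatabaseIfNotExist=true"
parsed = urlparse(jdbc_url[len("jdbc:"):])

host, port = parsed.hostname, parsed.port   # MySQL server address and port
database = parsed.path.lstrip("/")          # metastore database name ("hive")
options = parse_qs(parsed.query)            # extra driver options

print(host, port, database, options)
```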
3. Edit hive-config.sh under hive/bin to set JAVA_HOME and HADOOP_HOME
export JAVA_HOME=/usr/lib/jvm
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
4. Download mysql-connector-java-5.1.27-bin.jar and put it under $HIVE_HOME/lib
You can download it from the official MySQL website; note that the download is a tar.gz archive, so be sure to extract it first to get the jar.
5. Create /tmp and /user/hive/warehouse in HDFS and set their permissions
hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
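`g+w` adds the group-write bit on top of whatever mode the directory already has. In octal terms (a small side illustration, not part of the install itself):

```python
import stat

# chmod g+w sets the group-write bit (0o020) without touching any other bits.
def add_group_write(mode):
    return mode | stat.S_IWGRP

print(oct(add_group_write(0o755)))  # 0o775
```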
6. Start Hadoop, then enter the hive shell and run a few commands
hive
show databases;
show tables;
7. You can view the files Hive produces in Hadoop
hadoop dfs -ls /user/hive/warehouse
A Hive Usage Example
Before formally covering HiveQL, it is worth running a few commands at the command line first, both to get a feel for how HiveQL works and to explore a little on your own.
1. Query examples
hive> SHOW TABLES;
OK
testuser
Time taken: 0.707 seconds, Fetched: 1 row(s)
hive> DESC testuser;
OK
id          int
username    string
Time taken: 0.38 seconds, Fetched: 2 row(s)
hive> SELECT * from testuser limit 10;
OK
1    sssss
1    sssss
Time taken: 0.865 seconds, Fetched: 2 row(s)
hive>
hive> select count(1) from testuser;
Query ID = hadoop_20160205004747_9d84aaca-887a-43a0-bad9-eddefe4e2219
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1454604205731_0001, Tracking URL = http://Master:8088/proxy/application_1454604205731_0001/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1454604205731_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-02-05 00:48:11,942 Stage-1 map = 0%, reduce = 0%
2016-02-05 00:48:19,561 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.38 sec
2016-02-05 00:48:28,208 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.77 sec
MapReduce Total cumulative CPU time: 2 seconds 770 msec
Ended Job = job_1454604205731_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 2.77 sec  HDFS Read: 6532  HDFS Write: 2  SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 770 msec
OK
2
Time taken: 35.423 seconds, Fetched: 1 row(s)
From these messages you can see that the query spawned a MapReduce job. The beauty of Hive is that users never need to know MapReduce is there; all they need to care about is a SQL-like language.
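Conceptually, the count(1) above becomes one map stage and one reduce stage. A toy Python sketch of the same logic (illustrative only; Hive generates and runs real MapReduce jobs, not this code):

```python
from functools import reduce

# The two rows of testuser seen in the session above.
rows = [(1, "sssss"), (1, "sssss")]

# Map: emit 1 for every row, regardless of its contents -- this is what count(1) counts.
mapped = [1 for _row in rows]

# Reduce: sum the emitted ones on a single reducer.
total = reduce(lambda a, b: a + b, mapped, 0)

print(total)  # 2, matching the query output above
```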
Repeating an insert like the following many times is one way to load a large amount of data:

hive> insert overwrite table testuser
    > select id, count(id)
    > from testuser
    > group by id;