Installing Hive 1.1.1

Installing Hive with MySQL as the Metastore Database

  • Install the JDK (covered in an earlier article)

  • Install Hadoop (covered in an earlier article)

  • Install MySQL (covered in an earlier article)

1. Create the Hive database and user, and grant privileges

# The MySQL root password on my test VM was 123456

# mysql -u root -p

mysql> grant all privileges on *.* to hive@"%" identified by "hive" with grant option;

mysql> flush privileges;

By default, MySQL installed on Ubuntu only accepts connections from the local machine. To enable remote access, two more steps are needed:

# nano /etc/mysql/my.cnf

Find the line bind-address = 127.0.0.1 and comment it out, then restart MySQL.
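As an aside, the bind-address edit above is easy to sketch programmatically. Below is a minimal Python sketch; the sample config text is illustrative, not taken from a real my.cnf:

```python
import re

def enable_remote_access(conf_text: str) -> str:
    # Comment out any bind-address line so MySQL stops
    # binding only to 127.0.0.1 (i.e. localhost-only access).
    return re.sub(r'(?m)^(\s*bind-address\s*=.*)$', r'#\1', conf_text)

sample = "[mysqld]\nbind-address = 127.0.0.1\nport = 3306\n"
print(enable_remote_access(sample))
```

After editing the real file this way, MySQL must be restarted for the change to take effect.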

2. Install Hive

hadoop@Master:~$ sudo tar xvfz apache-hive-1.1.1-bin.tar.gz

hadoop@Master:~$ sudo cp -R apache-hive-1.1.1-bin /usr/local/hive

hadoop@Master:~$ sudo chmod -R 775 /usr/local/hive/

hadoop@Master:~$ sudo chown -R hadoop:hadoop /usr/local/hive/

# Add the HIVE_HOME variable to /etc/profile

export HIVE_HOME=/usr/local/hive

export PATH=$PATH:$HIVE_HOME/bin

export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:/usr/local/hive/lib

# Copy the template files under hive/conf and rename them

cp hive-env.sh.template hive-env.sh

cp hive-default.xml.template hive-site.xml

# In hive-env.sh, set HADOOP_HOME

HADOOP_HOME=/usr/local/hadoop

# In hive-site.xml, set the MySQL JDBC driver, database name, username, and password; the entries to change are shown below

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:mysql://192.168.1.178:3306/hive?createDatabaseIfNotExist=true</value>

<description>JDBC connect string for a JDBC metastore</description>

</property>

<property>

<name>javax.jdo.option.ConnectionDriverName</name>

<value>com.mysql.jdbc.Driver</value>

<description>Driver class name for a JDBC metastore</description>

</property>

<property>

<name>javax.jdo.option.ConnectionUserName</name>

<value>hive</value>

<description>username to use against metastore database</description>

</property>

<property>

<name>javax.jdo.option.ConnectionPassword</name>

<value>hive</value>

<description>password to use against metastore database</description>

</property>

Where:

the javax.jdo.option.ConnectionURL parameter is the connection string Hive uses to reach the database;

the javax.jdo.option.ConnectionDriverName parameter is the fully qualified class name of the driver;

the javax.jdo.option.ConnectionUserName parameter is the database username;

the javax.jdo.option.ConnectionPassword parameter is the database password.
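For illustration, these four settings can be pulled out of hive-site.xml with a few lines of Python. This is only a sketch for inspecting the config, not part of the installation itself; the XML fragment below is abridged from the example above:

```python
import xml.etree.ElementTree as ET

# Abridged hive-site.xml using the example's own values.
HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.1.178:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>"""

def metastore_settings(xml_text):
    # Collect every <property> element as a name -> value mapping.
    root = ET.fromstring(xml_text)
    return {p.findtext('name'): p.findtext('value')
            for p in root.findall('property')}

settings = metastore_settings(HIVE_SITE)
print(settings['javax.jdo.option.ConnectionDriverName'])  # com.mysql.jdbc.Driver
```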

3. Edit the hive-config.sh file under hive/bin to set JAVA_HOME and HADOOP_HOME

export JAVA_HOME=/usr/lib/jvm

export HADOOP_HOME=/usr/local/hadoop

export HIVE_HOME=/usr/local/hive

4. Download mysql-connector-java-5.1.27-bin.jar and put it in the $HIVE_HOME/lib directory

It can be downloaded from the official MySQL website; note that the download is a tar.gz archive, so be sure to extract it first.
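Since the connector ships inside a tar.gz, here is a small Python sketch of locating the .jar member you would copy into $HIVE_HOME/lib. The archive is built in memory purely for demonstration, with placeholder contents:

```python
import io
import tarfile

# Build a stand-in tar.gz in memory; the real archive comes from
# the MySQL site and contains the connector jar among other files.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w:gz') as tar:
    payload = b'not a real jar'
    info = tarfile.TarInfo('mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar')
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

# Find the .jar member(s) worth extracting.
with tarfile.open(fileobj=buf, mode='r:gz') as tar:
    jars = [m.name for m in tar.getmembers() if m.name.endswith('.jar')]
print(jars)
```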

5. Create /tmp and /user/hive/warehouse in HDFS and set their permissions

hadoop fs -mkdir /tmp

hadoop fs -mkdir /user/hive/warehouse

hadoop fs -chmod g+w /tmp

hadoop fs -chmod g+w /user/hive/warehouse
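The g+w in the chmod commands above adds group write permission so that other group members can write under these directories. As a minimal sketch of its effect on a numeric mode:

```python
import stat

def add_group_write(mode):
    # "g+w" sets the group-write bit and leaves all other bits alone.
    return mode | stat.S_IWGRP

# A 755 (rwxr-xr-x) directory becomes 775 (rwxrwxr-x).
print(oct(add_group_write(0o755)))  # 0o775
```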

6. Start Hadoop, then enter the Hive shell and run a few commands to check the installation

hive

show databases;

show tables;

7. The files Hive generates can be viewed in Hadoop

hadoop dfs -ls /user/hive/warehouse

Hive Usage Examples

Before formally covering HiveQL, it helps to run a few commands at the command line first: you can get a feel for how HiveQL works, and explore a little on your own.

1. Query examples

hive> SHOW TABLES;
OK
testuser
Time taken: 0.707 seconds, Fetched: 1 row(s)

hive> DESC testuser;
OK
id          int
username    string
Time taken: 0.38 seconds, Fetched: 2 row(s)

hive> SELECT * from testuser limit 10;
OK
1    sssss
1    sssss
Time taken: 0.865 seconds, Fetched: 2 row(s)

hive> select count(1) from testuser;
Query ID = hadoop_20160205004747_9d84aaca-887a-43a0-bad9-eddefe4e2219
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1454604205731_0001, Tracking URL = http://Master:8088/proxy/application_1454604205731_0001/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1454604205731_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-02-05 00:48:11,942 Stage-1 map = 0%, reduce = 0%
2016-02-05 00:48:19,561 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.38 sec
2016-02-05 00:48:28,208 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.77 sec
MapReduce Total cumulative CPU time: 2 seconds 770 msec
Ended Job = job_1454604205731_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 2.77 sec  HDFS Read: 6532  HDFS Write: 2  SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 770 msec
OK
2
Time taken: 35.423 seconds, Fetched: 1 row(s)

From these messages you can see that the query launched a MapReduce job. The beauty of Hive is that users do not need to know MapReduce exists at all; all they need to care about is writing queries in a SQL-like language.

Repeating the following statement many times is one way to insert a large amount of data:

hive> insert overwrite table testuser
    > select id, count(id)
    > from testuser
    > group by id;
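As a sketch of what that group-by computes, here is the same aggregation in Python over the two sample rows shown in the SELECT output earlier:

```python
from collections import Counter

# The testuser rows from the earlier SELECT: (id, username).
rows = [(1, 'sssss'), (1, 'sssss')]

# Equivalent of: select id, count(id) from testuser group by id
counts = Counter(row_id for row_id, _ in rows)
print(sorted(counts.items()))  # [(1, 2)]
```

Note that because the statement uses insert overwrite, each run replaces the table's previous contents rather than appending to them.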

 
