Hive Summary 1 (Basic Commands + Data Types)

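Hive is a data warehouse tool that stores semi-structured data (text data). Its metadata is kept in a relational database: Derby by default, or MySQL. Ordinary (non-transactional) tables do not support UPDATE and DELETE statements.

The data itself must be stored on HDFS, so Hive depends on Hadoop.

Hive offers a set of SQL-like statements (close to MySQL's). When such a statement is executed, Hive generates the corresponding MapReduce program.

Problem it solves: writing MapReduce programs in Java is hard; with Hive, anyone who knows SQL can develop MapReduce jobs.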

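Features:

Managed tables: Hive manages the data, which is placed under /user/hive/warehouse/, one directory per table

External tables

Views

JOIN

Indexes

Functions: abs, ceil, explode, avg, count, ...

UDF (User-Defined Function): takes one value and returns one value, e.g. length('Jack') = 4

UDAF (User-Defined Aggregation Function), such as sum and count: takes many rows and returns one value, e.g. count(*) = 10

UDTF (User-Defined Table-Generating Function): takes one row and returns several rows, e.g. explode(array('Jack','Mary'))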
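The difference between the three function kinds is only the shape of input and output. A minimal Python analogy, assuming nothing about Hive's real UDF API (the function names udf_length, udaf_count and udtf_explode are made up for illustration):

```python
# Python analogy only; these are NOT Hive APIs, just the three input/output shapes.

def udf_length(value):
    """UDF shape: one value in, one value out."""
    return len(value)


def udaf_count(rows):
    """UDAF shape: many rows in, one value out."""
    return sum(1 for _ in rows)


def udtf_explode(array_value):
    """UDTF shape: one row holding an array in, one output row per element."""
    for item in array_value:
        yield (item,)


print(udf_length("Jack"))                    # 4
print(udaf_count(["a", "b", "c"]))           # 3
print(list(udtf_explode(["Jack", "Mary"])))  # [('Jack',), ('Mary',)]
```

In Hive itself the built-in length, count and explode play these three roles.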

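1. Data types

Usage: create table sometable(columnName TYPE, ...);

Type (uppercase recommended)	Size / description
tinyint	1 byte, signed integer
smallint	2 bytes, signed integer
int	4 bytes, signed integer
bigint	8 bytes, signed integer
boolean	true | false
float	single-precision floating point
double	double-precision floating point
string	character string
timestamp	point in time; can be given as an integer or a string
binary	byte array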

2. Collection types

Type	Description and example
struct	An object or structure. hive> select struct('Jack','Mary'); OK _c0 {"col1":"Jack","col2":"Mary"}
map	Key/value pairs: map(key0, value0, key1, value1, ...). hive> select map('name','Jack','age',34); OK _c0 {"name":"Jack","age":"34"}
array	An ordered list of values. hive> select array(1,2,3); OK _c0 [1,2,3]

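3. Hive commands

Create a database: hive> create database dbname;

By default a database is stored under the HDFS directory /user/hive/warehouse.

A different location can be specified: hive> create database dbname location '/db';

List databases: hive> show databases;

Exit Hive: hive> quit;

Switch to a database: hive> use dbname;

Create a table: hive> create table stu(id string, name string, age int);

Drop a table: hive> drop table stu;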

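Load data: load data [local] inpath ... With local, the file is read from the local file system; without local, the path refers to data on HDFS.

hive> create table name(text string);

hive> load data local inpath '/home/keys/a.txt' into table name; // loads the contents of a.txt into table name

Query all rows of the table: hive> select * from name;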

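explode takes an array and returns one row per element: hive> select explode(array('Jack','Mary'));

split:

Use split to break 'Jack Mary' on the space, then explode the result: hive> select explode(split('Jack Mary',' '));

Split every row of table name on spaces: hive> select explode(split(text,' ')) from name;

Insert data from table1 into table2. This starts a MapReduce job; every row is split into words, and each word is saved as one row of table2:

hive> insert overwrite table table2 select explode(split(text," ")) from table1;

Group and count (str being the column of table2): hive> select str, count(1) from table2 group by str;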
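The split, explode and group by steps above together form a word count. A small Python sketch of the same data flow, assuming two made-up input rows; it only mimics the logic and does not talk to Hive (note that Hive's split takes a regular expression, while str.split here takes a literal separator):

```python
from collections import Counter


def split_explode(rows, sep=" "):
    """Mimic explode(split(text, ' ')): one output row per word."""
    for text in rows:
        for word in text.split(sep):
            yield word


def group_count(words):
    """Mimic 'select str, count(1) ... group by str', sorted by word."""
    return sorted(Counter(words).items())


table1 = ["Jack Mary", "Mary Rose"]   # hypothetical contents of table1
table2 = list(split_explode(table1))  # what the insert overwrite would store

print(table2)               # ['Jack', 'Mary', 'Mary', 'Rose']
print(group_count(table2))  # [('Jack', 1), ('Mary', 2), ('Rose', 1)]
```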

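Save query results to a local directory:

hive> insert overwrite local directory '${env:HOME}/out001' select text,count(1) from table1 group by text;

For user keys this is equivalent to:

hive> insert overwrite local directory '/home/keys/out001' select text,count(1) from table1 group by text;

Default delimiter: the fields in the output file are separated by Ctrl-A (\001), which cat -A displays as ^A:

$ cat -A out001/000000_0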
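Because the default field delimiter is the non-printing Ctrl-A character, the exported file has to be split on "\x01" rather than on tabs or commas. A minimal Python sketch, assuming a made-up output line (the helper name parse_hive_text_line is hypothetical):

```python
def parse_hive_text_line(line, field_sep="\x01"):
    """Split one line of Hive's default text output into its fields."""
    return line.rstrip("\n").split(field_sep)


sample = "Mary\x012\n"  # hypothetical line: word, Ctrl-A, count
word, count = parse_hive_text_line(sample)
print(word, int(count))  # Mary 2
```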


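Run HDFS commands: to inspect data on HDFS from inside Hive, use the usual hdfs command without the leading hdfs:

hive> dfs -ls /;

Show all configuration settings: hive> set;

Run shell commands: prefix the command with !

hive> !ls -l /home/keys;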

Run SQL statements:

1: Inside the Hive shell: hive> select ...

2: Without entering the Hive shell, pass the statement with the -e option: $ hive -e "select * from wc2"

Print column headers in query results:

hive>set hive.cli.print.header=true;

hive>select * from wc2;


The -f option runs the SQL statements stored in a file.

Without entering the Hive shell, run the commands in a.sql: $ hive -f a.sql

Create a table with the following structure:

1. hive> create table person(
> id string,
> name string,
> age int
> )
> row format delimited // one record per line
> fields terminated by '\t' // fields are separated by tabs
> stored as textfile; // stored as a plain text file (Hive's default format)
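A data file for this table needs one record per line with the three fields separated by tabs. A minimal Python sketch that formats such lines, assuming made-up records (format_person_row is a hypothetical helper, not part of Hive):

```python
def format_person_row(person_id, name, age):
    """Build one line matching: fields terminated by '\\t', one record per line."""
    fields = [str(person_id), str(name), str(age)]
    for field in fields:
        # A tab or newline inside a value would shift or break the columns.
        if "\t" in field or "\n" in field:
            raise ValueError("field contains a delimiter: %r" % field)
    return "\t".join(fields)


rows = [("T001", "Jack", 20), ("T002", "Mary", 30)]  # hypothetical records
for row in rows:
    print(format_person_row(*row))
```

Writing these lines to a file and loading it with load data local inpath would then fill the id, name and age columns in order.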

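2. View the table structure, for example with hive> desc person;

3. View the table structure in more detail, for example with hive> desc formatted person;

In the detailed output, the output format HiveIgnoreKeyTextOutputFormat ignores the key (the byte offset) when writing rows.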

4. Load data

See the load data command above.

5. Add new columns

hive> alter table person add columns (sex string, tel string);

6. Drop columns

To drop sex and tel, redeclare the table with only the columns to keep:

hive> alter table person replace columns(id string, name string, age int);

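Delete a table

Delete only the data and keep the table: hive> truncate table person;

Drop the table together with its data: hive> drop table person;

When local data is loaded (load data local), the local file is copied to HDFS.

When data already on HDFS is loaded, the file is moved into the warehouse directory.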
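The copy-versus-move difference matters because, after a non-local load, the source file is no longer at its old path. A Python sketch that imitates this with ordinary temporary directories standing in for HDFS (load_data and the file names are assumptions for illustration, not Hive code):

```python
import shutil
import tempfile
from pathlib import Path


def load_data(src, table_dir, local):
    """Imitate 'load data [local] inpath': copy when local, move otherwise."""
    table_dir.mkdir(parents=True, exist_ok=True)
    dest = table_dir / src.name
    if local:
        shutil.copy(src, dest)            # source file stays where it was
    else:
        shutil.move(str(src), str(dest))  # source file disappears
    return dest


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        warehouse = root / "warehouse" / "person"  # stand-in for the table directory
        local_file = root / "a.txt"
        hdfs_file = root / "b.txt"
        local_file.write_text("T001\tJack\t20\n")
        hdfs_file.write_text("T002\tMary\t30\n")
        load_data(local_file, warehouse, local=True)
        load_data(hdfs_file, warehouse, local=False)
        files = sorted(p.name for p in warehouse.iterdir())
        return local_file.exists(), hdfs_file.exists(), files


print(demo())  # (True, False, ['a.txt', 'b.txt'])
```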

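Create an external table

hive> create external table person2(
> id string,
> name string,
> age int)
> row format delimited
> fields terminated by '\t'
> location '/db01/pp';

When an external table is dropped, only the table definition is removed; the data stays in place.

Using insert on an external table is not recommended, although it works.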

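4. Schema on read and schema on write

Traditional databases use schema on write:

a database that validates data at the moment it is written is a schema-on-write database.

create table person(id int, name varchar(30));

insert into person values('T001','Jack'); fails with an error, because T001 is not an int.

Hive is a schema-on-read database:

data is written without any checking, and the check happens only when the data is queried; a value that does not match the column type is returned as NULL.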
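Schema on read can be pictured as applying the column types only while a stored line is being parsed. A minimal Python sketch of that idea, assuming a tab-delimited line and using None in place of Hive's NULL (read_with_schema is a hypothetical helper):

```python
def read_with_schema(line, schema, field_sep="\t"):
    """Apply column types at read time; values that do not fit become None."""
    raw = line.rstrip("\n").split(field_sep)
    row = {}
    for index, (name, cast) in enumerate(schema):
        try:
            row[name] = cast(raw[index])
        except (IndexError, ValueError):
            # Missing field or wrong type: no error is raised, the value is null.
            row[name] = None
    return row


schema = [("id", int), ("name", str)]  # same shape as person(id int, name varchar(30))

print(read_with_schema("T001\tJack", schema))  # {'id': None, 'name': 'Jack'}
print(read_with_schema("7\tMary", schema))     # {'id': 7, 'name': 'Mary'}
```

The bad line was stored without complaint; the mismatch only shows up as a null when the row is read.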

Export data

hive> insert overwrite [local] directory '/path' select * ...

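5. Notes on collection types

array, map and struct are all collection types; the delimiter between their elements is declared with: collection items terminated by ...

For map, the delimiter between a key and its value is declared with: map keys terminated by ...

map and array do not need a declared size.

struct has a fixed number of fields, which must be declared.

1. array

Run a script from inside Hive:

hive> source ${env:HOME}/array.sql;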

2. map

Given a data file whose third column holds key:value pairs, the following table displays that column as a map once the data is loaded:

hive> create table t_map(id string,
> name string,
> contact map<string,string>)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ','
> map keys terminated by ':';
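The three delimiters nest: tabs separate columns, commas separate map entries, and a colon separates each key from its value. A Python sketch of how one line would be taken apart under that definition, assuming a made-up sample row (parse_map_row is a hypothetical helper):

```python
def parse_map_row(line, field_sep="\t", item_sep=",", key_sep=":"):
    """Split a t_map-style line: columns by tab, entries by ',', key/value by ':'."""
    row_id, name, contact_raw = line.rstrip("\n").split(field_sep)
    contact = {}
    for item in contact_raw.split(item_sep):
        key, value = item.split(key_sep, 1)
        contact[key] = value
    return {"id": row_id, "name": name, "contact": contact}


# Hypothetical sample row; the address and number are invented.
line = "T0001\tjack\tmail:jack@example.com,tel:123456"
print(parse_map_row(line))
# {'id': 'T0001', 'name': 'jack', 'contact': {'mail': 'jack@example.com', 'tel': '123456'}}
```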

3. struct

Sample row: T0002 jack [email protected]:1898782173

hive> create table t_struct(id string,
> name string,
> contact struct<mail:string,tel:string>)
> row format delimited
> fields terminated by '\t'
> collection items terminated by ':';
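Unlike a map, a struct row carries no keys: the values are positional and take their names from the declared fields. A Python sketch of that mapping, assuming an invented sample row (parse_struct_row is a hypothetical helper, and None stands in for NULL when a value is missing):

```python
def parse_struct_row(line, struct_fields=("mail", "tel"), field_sep="\t", item_sep=":"):
    """Split a t_struct-style line; struct values are matched to fields by position."""
    row_id, name, contact_raw = line.rstrip("\n").split(field_sep)
    values = contact_raw.split(item_sep)
    # Pad with None so that every declared field is present in the result.
    values += [None] * (len(struct_fields) - len(values))
    contact = dict(zip(struct_fields, values))
    return {"id": row_id, "name": name, "contact": contact}


# Hypothetical sample row; the address and number are invented.
line = "T0002\tjack\tjack@example.com:123456"
print(parse_struct_row(line))
# {'id': 'T0002', 'name': 'jack', 'contact': {'mail': 'jack@example.com', 'tel': '123456'}}
```

In Hive the fields would then be read as contact.mail and contact.tel.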