/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.4951328/etc/hadoop/conf.dist  (Hadoop config directory)
Pitfall diary:
1. Copying a table definition:
create table a_new like a;
2. Moving data between partitions (dynamic partitioning):
https://blog.csdn.net/lvtula/article/details/92377923
Enable dynamic partitioning
Enable non-strict mode
Use partition(dt) rather than partition(dt='xxxx'); note that a very large number of partitions can cause problems. See the sketch below.
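A minimal sketch of the dynamic-partition flow, assuming a source table a and a target table b that both carry a dt column (the other column names are placeholders):
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table b partition(dt)
select id, name, dt from a;  -- the partition column dt must come last in the select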
3. After re-designating the cluster's NameNode, Impala cannot see tables newly created in Hive, or tables created via create table ... like A?
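A likely fix (my assumption, not confirmed in these notes): Impala caches the Hive metastore, so force a metadata reload from impala-shell:
INVALIDATE METADATA;          -- reload metadata for all databases
INVALIDATE METADATA a_new;    -- or just for one table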
4. Disk space is not reclaimed after deleting HDFS files
Empty the HDFS trash:
hdfs dfs -expunge
5. Command to delete an HDFS file or directory:
hdfs dfs -rm -r /path
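To free the space immediately, -rm can skip the trash (assumption: the data does not need to be recoverable):
hdfs dfs -rm -r -skipTrash /path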
ODS -> DWD layer: zipper-table (SCD-style history table) design and data ingestion; a sketch follows.
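A minimal zipper-table sketch, under my assumption of the usual design (every row carries a validity range, with end_date '9999-99-99' marking the currently valid version; the table and column names are placeholders):
create table dwd.company_zip (
  id string,
  attr string,
  start_date string,  -- date this version took effect
  end_date string     -- '9999-99-99' while the version is current
)
stored as parquet;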
Configuring the LZO compression format in Hive:
The location problem after configuring lzo in STORED AS is still unresolved.
Current tests show a compression ratio of roughly 1:4 (compressed data is about a quarter of the original size).
Sqoop import script:
sqoop import \
--connect "jdbc:mysql://192.168.1.104:3306/company11?serverTimezone=GMT" \
--username root \
--password [email protected] \
--table COMPANY_MORTGAGE_CHANGE \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table company_mortgage_change \
--compress \
--compression-codec com.hadoop.compression.lzo.LzopCodec
LZO-compressed test table passed (with partitions):
drop table if exists aaa;
CREATE external TABLE aaa(
id int,
name string,
age int,
sex string
) COMMENT 'test table'
row format delimited fields terminated by '\t'
PARTITIONED BY (dt string)
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/data/warehouse/ods/aaa';
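A hedged usage example for the table above, registering one day's partition over the LZO data (the date and path are placeholders):
alter table aaa add if not exists partition (dt='2021-10-20')
location '/data/warehouse/ods/aaa/dt=2021-10-20';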
Hadoop LZO index problem:
find / -name "hadoop-lzo*"
hadoop jar /data/yarn/nm/usercache/root/filecache/43/libjars/hadoop-lzo-0.4.15-cdh6.2.1.jar com.hadoop.compression.lzo.DistributedLzoIndexer /data/warehouse/test/company_mortgage_change
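After the indexer runs, each .lzo file should gain a matching .index file, which is what lets MapReduce split the input (my understanding of DistributedLzoIndexer, not verified in these notes):
hdfs dfs -ls /data/warehouse/test/company_mortgage_change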
Issues from 10-20:
sqoop export of table leader_daping: the data syncs successfully after the MySQL primary key is removed??? (presumably duplicate rows were tripping the primary-key constraint)
Export script:
sqoop export \
--connect "jdbc:mysql://10.21.32.31:3306/el_jishi?serverTimezone=GMT" \
--username qwt \
--password [email protected] \
--table leader_daping \
--num-mappers 1 \
--export-dir /user/hive/warehouse/dm.db/leader_daping \
--input-fields-terminated-by "\t"
Map-side join:
set hive.auto.convert.join = true;
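Related knobs, assuming a large table f joined to a small dimension table d (names hypothetical); hive.mapjoin.smalltable.filesize is the size threshold under which the small table is loaded into memory:
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=25000000;  -- ~25 MB, the default
select f.id, d.name
from f join d on f.d_id = d.id;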
Pitfalls exporting to the MySQL table es_company with sqoop:
After setting every column except id to non-null and removing the primary-key conflict, the data could be imported; later ran into a character-set problem, fixed with ?useUnicode=true&characterEncoding=utf-8
--connect 'jdbc:mysql://10.21.32.31:3306/el_jishi?useUnicode=true&characterEncoding=utf-8' \  the URL has to be single-quoted to run (otherwise the shell interprets the & and backgrounds the command)
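Possibly relevant here too (my assumption: Hive stores NULL as \N, which sqoop export would otherwise ship to MySQL as a literal string), the standard null-handling flags:
--input-null-string '\\N' \
--input-null-non-string '\\N'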
Spark build from source:
build/mvn -Pyarn -Phadoop-3.0.0 -Dhadoop.version=3.0.0 -DskipTests clean package
Lingering SparkSession connection problem:
1. Possibly a Hive metastore process issue?
2. A dependency configuration issue?
3. A .config() issue / the config file under resources?
ALTER TABLE tbl_name DEFAULT CHARACTER SET charset_name [COLLATE collation_name];
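Note that DEFAULT CHARACTER SET only applies to columns added later; converting the existing columns (presumably what es_company needed) uses CONVERT TO, e.g.:
ALTER TABLE es_company CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;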
Switching the execution engine:
set hive.execution.engine=spark;
set hive.execution.engine=mr;
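Running set with just the property name prints its current value, a quick way to confirm which engine is active:
set hive.execution.engine;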
Screenshot of the problem:
https://blog.csdn.net/xueyao0201/article/details/79530130