拉链算法是数仓常用的数据存储技术
建SDBDDL、SDB、ODB库,分别用于存储表结构、存储增量数据、存储历史数据:
CREATE DATABASE "SDBDDL" AS PERM=104857600 FALLBACK NO BEFORE JOURNAL NO AFTER JOURNAL CREATE DATABASE "SDB" AS PERM=104857600 FALLBACK NO BEFORE JOURNAL NO AFTER JOURNAL CREATE DATABASE "ODB" AS PERM=104857600 FALLBACK NO BEFORE JOURNAL NO AFTER JOURNAL
SDBDDL库建表:
CREATE MULTISET TABLE SDBDDL.SW5_SALARY_INFO( ID VARCHAR(10) CHARACTER SET LATIN CASESPECIFIC TITLE \'账户ID\', NAME VARCHAR(10) CHARACTER SET LATIN CASESPECIFIC TITLE \'户名\', BALANCE DECIMAL(12,2) TITLE \'余额\') PRIMARY INDEX ( ID );
注意:
1.定义为SET表的表不存储重复记录,而MULTISET表可存储重复记录
2.CASESPECIFIC ——大小写敏感,NOT CASESPECIFIC ——大小写不敏感
ODB库建表:(状态类无删除表)
CREATE MULTISET TABLE ODB.SW5_SALARY_INFO( ID VARCHAR(10) CHARACTER SET LATIN CASESPECIFIC TITLE \'账户ID\' NOT NULL, NAME VARCHAR(10) CHARACTER SET LATIN CASESPECIFIC TITLE \'户名\' NOT NULL, BALANCE DECIMAL(12,2) TITLE \'余额\' NOT NULL, START_DT DATE FORMAT \'YYYYMMDD\' TITLE \'开始时间\' NOT NULL, END_DT DATE FORMAT \'YYYYMMDD\' TITLE \'结束时间\' NOT NULL ) PRIMARY INDEX ( ID );
给ODB铺个底:
INSERT INTO ODB.SW5_SALARY_INFO VALUES(\'001\',\'张三\',\'520\',DATE\'2020-01-02\',DATE\'2999-12-31\'); INSERT INTO ODB.SW5_SALARY_INFO VALUES(\'002\',\'李四\',\'2100\',DATE\'2020-10-10\',DATE\'2999-12-31\');
ODB有两条铺底数据了
20201016的下发增量数据:
张三的余额发生变化,变成1314
新增了王五的数据
入库 ——SDB.SW5_SALARY_INFO1016
INSERT INTO SDB.SW5_SALARY_INFO1016 VALUES(\'001\',\'张三\',\'1314\'); INSERT INTO SDB.SW5_SALARY_INFO1016 VALUES(\'003\',\'王五\',\'5000\');
将20201016的数据入导ODB:
建一张临时表VT_NEW(ODB表结构):
CREATE MULTISET VOLATILE TABLE VT_NEW, NO LOG AS ODB.SW5_SALARY_INFO WITH NO DATA ON COMMIT PRESERVE ROWS;
取出新增和修改的数据,导入临时表:
INSERT INTO VT_NEW( ID ,NAME ,BALANCE ,START_DT ,END_DT) SELECT N.ID ,N.NAME ,N.BALANCE ,DATE\'2020-10-16\' ,DATE\'2999-12-31\' FROM (SELECT COALESCE( TRIM( ID ) , \'\' ) AS ID ,COALESCE( TRIM( NAME ) , \'\' ) AS NAME ,COALESCE( TRIM( BALANCE ) , 0 ) AS BALANCE FROM SDB.SW5_SALARY_INFO1016 ) N LEFT JOIN (SELECT ID ,NAME ,BALANCE FROM ODB.SW5_SALARY_INFO WHERE END_DT = DATE\'2999-12-31\' ) T ON N.ID = T.ID WHERE T.ID IS NULL OR N.NAME<>T.NAME OR N.BALANCE<>T.BALANCE ;
查临时表:
SELECT * FROM VT_NEW
张三是修改的数据
王五是新增的数据
修改目标表
UPDATE T FROM ODB.SW5_SALARY_INFO T, VT_NEW N SET END_DT = DATE\'2020-10-16\' WHERE T.END_DT = DATE\'2999-12-31\' AND N.ID = T.ID ;

INSERT INTO ODB.SW5_SALARY_INFO SELECT * FROM VT_NEW;
到这一步,就完成了20201016的增量数据入库到近源模型层ODB