【Question title】: Designing a slowly changing dimension type 2 script with postgresql
【Posted】: 2019-06-17 00:13:30
【Question】:

Suppose I have the following target table:

CREATE TABLE DimCustomer (
    CustomerKey serial PRIMARY KEY,
    CustomerNum int NOT NULL,
    CustomerName varchar(25) NOT NULL,
    Planet varchar(25) NOT NULL,
    RowIsCurrent char(1) NOT NULL DEFAULT 'Y',
    RowStartDate date NOT NULL DEFAULT CURRENT_DATE,
    RowEndDate date NOT NULL DEFAULT '9999-12-31'
);

INSERT INTO DimCustomer
(CustomerNum, CustomerName, Planet,  RowStartDate) 
VALUES (101,'Anakin Skywalker', 'Tatooine',   CURRENT_TIMESTAMP - INTERVAL '101 days'),
       (102,'Yoda', 'Coruscant',  CURRENT_TIMESTAMP - INTERVAL '100 days'),
       (103,'Obi-Wan Kenobi', 'Coruscant',  CURRENT_TIMESTAMP - INTERVAL '100 days');

I have the following staging table:

CREATE TABLE Staging_DimCustomer
(
    CustomerNum int NOT NULL,
    CustomerName varchar(25) NOT NULL,
    Planet varchar(25) NOT NULL,
    ChangeDate date NOT NULL DEFAULT CURRENT_DATE,
    RankNo int NOT NULL DEFAULT 1
);
INSERT INTO Staging_DimCustomer(CustomerNum, CustomerName, Planet, ChangeDate)
VALUES
(103,'Ben Kenobi', 'Coruscant',   CURRENT_TIMESTAMP - INTERVAL '99 days');

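In the staging table, 'Obi-Wan Kenobi' (CustomerNum 103) appears to have changed his name to 'Ben Kenobi'. I want to create a script that implements SCD type 2 (slowly changing dimension type 2) and produces the following result: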

Here is my attempt:

INSERT INTO DimCustomer (
  CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate
  ) 
 select CustomerNum, CustomerName, Planet, 'Y', ChangeDate, '12/31/9999'
 from Staging_DimCustomer 

 ON CONFLICT (CustomerNum) and RowIsCurrent = 'Y'
  DO UPDATE SET
    CustomerName = EXCLUDED.CustomerName,
    Planet = EXCLUDED.Planet,
    RowIsCurrent = 'N',
    RowEndDate = EXCLUDED.ChangeDate

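I don't know how to find the changed values, update the existing rows to expire them, and then insert new rows flagged with RowIsCurrent = 'Y'. I am trying to model my solution on this SQL Server article: http://www.made2mentor.com/2013/08/how-to-load-slowly-changing-dimensions-using-t-sql-merge/.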

【Comments】:

    Tags: sql postgresql dimensional-modeling


    【Solution 1】:

    Assuming the changes all apply to the latest row, you can update the current row and then insert:

    with u as (
          update dimCustomer c
              set RowIsCurrent = 'N',
                  RowEndDate = sc.ChangeDate
          from Staging_DimCustomer sc
          where sc.CustomerNum = c.CustomerNum and
                c.RowIsCurrent = 'Y'
         )
    insert into dimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate
                             ) 
         select CustomerNum, CustomerName, Planet, 'Y', ChangeDate, '9999-12-31'::date
         from Staging_DimCustomer sc;
    

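    This assumes the changes occur on the most recent record. Implementing historical changes is rather tricky, and I'm guessing it isn't necessary.

    Note that you may want an additional check that the inserted row actually differs from the current row.

    EDIT:

    If you want to avoid touching customers whose staged row matches a row that already exists, you can do this: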

    with sc as (
          select *
          from Staging_DimCustomer sc
          where not exists (select 1
                            from DimCustomer c
                            where c.CustomerNum = sc.CustomerNum and
                                  c.CustomerName = sc.CustomerName and
                                  . . .  -- whatever other columns you want to check
                          )
         ),
         u as (
          update dimCustomer c
              set RowIsCurrent = 'N',
                  RowEndDate = sc.ChangeDate
          from sc
          where sc.CustomerNum = c.CustomerNum and
                c.RowIsCurrent = 'Y'
         )
    insert into dimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate
                             ) 
         select CustomerNum, CustomerName, Planet, 'Y', ChangeDate, '9999-12-31'::date
         from sc;
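
For illustration, here is a minimal sketch of the same pattern with the comparison spelled out. It rests on two assumptions that are not in the answer above: only CustomerName and Planet are tracked for changes, and the staged row is compared only against the customer's current row (RowIsCurrent = 'Y'). Adjust the column list to whatever attributes you actually version.

```sql
with sc as (
      -- Staged rows that differ from the customer's current row,
      -- plus customers that have no current row yet.
      -- Assumption: CustomerName and Planet are the only tracked columns.
      select s.CustomerNum, s.CustomerName, s.Planet, s.ChangeDate
      from Staging_DimCustomer s
      where not exists (select 1
                        from DimCustomer c
                        where c.CustomerNum = s.CustomerNum and
                              c.RowIsCurrent = 'Y' and
                              c.CustomerName = s.CustomerName and
                              c.Planet = s.Planet
                       )
     ),
     u as (
      -- Expire the current row of every changed customer.
      update DimCustomer c
          set RowIsCurrent = 'N',
              RowEndDate = sc.ChangeDate
      from sc
      where sc.CustomerNum = c.CustomerNum and
            c.RowIsCurrent = 'Y'
     )
-- Insert the staged values as the new current row.
insert into DimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
     select CustomerNum, CustomerName, Planet, 'Y', ChangeDate, '9999-12-31'::date
     from sc;
```

All parts of the statement see the same snapshot of DimCustomer, so the filter in sc is evaluated against the table as it was before the update in u takes effect.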
    

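    【Discussion】:

    • Thanks for the solution. I assume the where not exists goes before the insert into?
    • @gibbz00 . . . I realized it is a bit more complicated than I had specified, so I think the extra CTE helps filter out unwanted updates before any changes are made.
    • You mentioned that implementing historical changes is tricky. Is that a different SCD type? I'd love to learn more about that kind of implementation.
    • @gibbz00 . . . It is the same type 2 dimension, but the effective date of the change might not fall on the latest record. I should also add that further complications arise if the same record is updated multiple times in the staging table.
    • Fascinating. I can totally imagine such a use case. Could you point to any resources on SQL scenarios where the change's effective date applies to an earlier record rather than the current row?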
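
The complication mentioned above, where the same customer is staged more than once, can be partly sidestepped by collapsing the staging table to one row per customer before loading. The sketch below keeps only the most recent staged change per CustomerNum using PostgreSQL's DISTINCT ON. It assumes that intermediate changes within a single batch can be discarded; if each intermediate version must appear in the dimension's history, this is not sufficient.

```sql
-- Keep only the latest staged change for each customer.
-- Assumption: earlier changes in the same batch may be dropped.
select distinct on (CustomerNum)
       CustomerNum, CustomerName, Planet, ChangeDate
from Staging_DimCustomer
order by CustomerNum, ChangeDate desc;
```

A query like this could replace the direct reference to Staging_DimCustomer in the first CTE of the load statement.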
    【Solution 2】:

    I think this should work fine, without updating or inserting records that already exist:

    with us as (
      update dimCustomer c
          set RowIsCurrent = 'N',
              RowEndDate = sc.ChangeDate
      from Staging_DimCustomer sc
      where sc.CustomerNum = c.CustomerNum and
            c.RowIsCurrent = 'Y' and
            -- expire on any tracked change, matching the insert filter below
            (sc.customername <> c.customername
             or sc.planet <> c.planet)
     ),
     u as (
     select stg.customernum,stg.customername,stg.planet ,stg.changedate from Staging_DimCustomer  stg
     Inner join  DimCustomer dim on dim.customernum=stg.customernum and dim.rowiscurrent='Y'
     and (dim.customername <> stg.customername
          or dim.planet <> stg.planet
          )
     UNION
        select stg.customernum,stg.customername,stg.planet ,stg.changedate from Staging_DimCustomer  stg
     where  stg.customernum not IN(select dim.customernum  from DimCustomer dim where dim.rowiscurrent='Y')
     )
    insert into dimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate
                         ) 
    select CustomerNum, CustomerName, Planet, 'Y', ChangeDate, '9999-12-31'::date
     from  u ;
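
As a quick sanity check after a load like this, a query along the following lines lists any customer that ended up with more than one current row. It is only a sketch against the DimCustomer table from the question. An empty result means each customer has at most one row flagged 'Y'.

```sql
-- Customers with more than one row flagged as current.
select CustomerNum, count(*) as current_rows
from DimCustomer
where RowIsCurrent = 'Y'
group by CustomerNum
having count(*) > 1;
```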
    
