【Question title】: Dynamic SQL table validation for a data quality dimension
【Posted】: 2021-12-21 14:09:54
【Question】:

I have the following code that uses dynamic SQL to test every column of a table for NULLs:

/*Completitud (completeness)*/
--Housekeeping:
drop table if exists tmp_completitud;
--Declare variables for the loop (the column-by-column string build below):
declare @custom_sql   VARCHAR(max)
declare @tablename as VARCHAR(255) = 'maestrodatoscriticos' --Name of the table to use.
--For each new dimension, replace the '_[dimension]' suffix of the table label:
set @custom_sql = 'select ''' + @tablename + '_Completitud' + ''' as tabla'
select @custom_sql =
           --Replace the dimension query here:
       @custom_sql + ', ' + 'sum(cast(iif(' + c.name + ' is null,0,1) as decimal)) / count(*) as ' + c.name
from sys.columns c
         inner join sys.tables t on c.object_id = t.object_id
where t.name = @tablename
set @custom_sql = @custom_sql + ' into tmp_completitud from ' + @tablename
--print @custom_sql
exec (@custom_sql);
--Populate the dimensions table with the current dimension:
insert into dimensiones
select *
from tmp_completitud;
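To make the string-building concrete, here is roughly what `@custom_sql` ends up holding for a small table. The temp table `#demo` and its columns `identificacion` and `email` are hypothetical, and the expanded statement is written out by hand (a `#` temp table is not visible through the current database's `sys.tables`). Each column's result is the fraction of rows in which that column is not NULL.

```sql
-- Hypothetical stand-in for maestrodatoscriticos: 4 rows, some NULLs.
DROP TABLE IF EXISTS #demo;
CREATE TABLE #demo (identificacion int NULL, email varchar(100) NULL);
INSERT INTO #demo (identificacion, email)
VALUES (1, 'a@x.com'), (2, NULL), (3, 'c@x.com'), (NULL, NULL);

-- Shape of the statement the loop generates: one expression per column.
DROP TABLE IF EXISTS #completitud_demo;
SELECT 'demo_Completitud' AS tabla,
       SUM(CAST(IIF(identificacion IS NULL, 0, 1) AS decimal)) / COUNT(*) AS identificacion,
       SUM(CAST(IIF(email IS NULL, 0, 1) AS decimal)) / COUNT(*) AS email
INTO #completitud_demo
FROM #demo;

-- Expected: identificacion = 0.75 (3 of 4 non-NULL), email = 0.5 (2 of 4).
SELECT * FROM #completitud_demo;
```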

Now I want to test for unique values, but I'm having trouble using aggregate functions with subqueries. So far I have:

select sum(cast(iif(
            ( select sum(cnt) from ( select count(distinct identificacion) as cnt from maestrodatoscriticos ) as x ) =
            ( select sum(cnt2) from ( select count(identificacion) as cnt2 from maestrodatoscriticos ) as y ), 0,
            1) as decimal)) / count(*)
from maestrodatoscriticos;

I'd like to somehow integrate the select sum(cast(iif... logic into the select @custom_sql = ... statement above. Any ideas?
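The attempt above wraps `SUM(...)` around an expression that contains subqueries, which SQL Server rejects with error 130 ("Cannot perform an aggregate function on an expression containing an aggregate or a subquery"). A minimal sketch of one way around it, assuming the goal is a single 0/1 "fully unique" flag per column: put the aggregates inside `IIF` rather than the other way round, so there is only one level of aggregation. The temp table `#demo_uni` and the convention 1 = unique are assumptions for illustration (the original `iif` returns 0 when the two counts match).

```sql
-- Hypothetical sample data standing in for maestrodatoscriticos.
DROP TABLE IF EXISTS #demo_uni;
CREATE TABLE #demo_uni (identificacion int NULL);
INSERT INTO #demo_uni (identificacion) VALUES (10), (20), (20), (30);

-- The aggregates sit inside IIF, not the other way round, so no nested aggregation.
-- Assumed convention: is_unique = 1 when every non-NULL value is distinct.
DROP TABLE IF EXISTS #flag_demo;
SELECT IIF(COUNT(DISTINCT identificacion) = COUNT(identificacion), 1, 0) AS is_unique,
       COUNT(DISTINCT identificacion) AS distinct_values,
       COUNT(identificacion)          AS non_null_values
INTO #flag_demo
FROM #demo_uni;

SELECT * FROM #flag_demo;  -- is_unique = 0, because 20 appears twice
```

Because `COUNT(identificacion)` skips NULLs, this flag ignores NULL rows; comparing against `COUNT(*)` instead would make NULLs count against uniqueness.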

【Question comments】:

    Tags: tsql dynamic-sql data-quality


    【Answer 1】:

    I actually solved this with help from a colleague. The code is:

    /*Unicidad (uniqueness)*/
    --Housekeeping:
    drop table if exists tmp_unicidad;
    --Declare variables for the loop (the column-by-column string build below):
    declare @sqluni VARCHAR(max)
    declare @tableuni as VARCHAR(255) = 'maestrodatoscriticos' --Name of the table to use.
    --For each new dimension, replace the '_[dimension]' suffix of the table label:
    set @sqluni = 'select ''' + @tableuni + '_Unicidad' + ''' as tabla'
    select @sqluni =
               --Replace the dimension query here:
           @sqluni + ', ' + 'count(distinct ' + c.name + ') * 1.00 / count(*) * 1.00 as ' + c.name
    from sys.columns c
             inner join sys.tables t on c.object_id = t.object_id
    where t.name = @tableuni
    set @sqluni = @sqluni + ' into tmp_unicidad from ' + @tableuni
    --print @sqluni
    exec (@sqluni);
    --Populate the dimensions table with the current dimension:
    insert into dimensiones
    select *
    from tmp_unicidad;
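A possible hardened variant of the generator above, offered as a sketch rather than a drop-in replacement. It assumes SQL Server 2017 or later (for `STRING_AGG ... WITHIN GROUP`) and a table in the `dbo` schema. `demo_maestro` and its columns are hypothetical sample data so the script runs on its own. Compared with the answer's version, it quotes identifiers with `QUOTENAME`, finds the table through a schema-qualified `OBJECT_ID`, and skips data types that `COUNT(DISTINCT ...)` cannot handle. It also uses `NULLIF` so that an empty table does not raise a divide-by-zero error.

```sql
-- Hypothetical demo table so the sketch runs standalone;
-- set @table to 'maestrodatoscriticos' to run it against the real data.
DROP TABLE IF EXISTS dbo.demo_maestro;
CREATE TABLE dbo.demo_maestro (identificacion int NULL, [nombre completo] varchar(50) NULL);
INSERT INTO dbo.demo_maestro (identificacion, [nombre completo])
VALUES (1, 'Ana'), (2, 'Ana'), (3, NULL), (4, 'Luis');

DROP TABLE IF EXISTS tmp_unicidad;

DECLARE @schema sysname = N'dbo';           -- assumption: table lives in dbo
DECLARE @table  sysname = N'demo_maestro';
DECLARE @cols   nvarchar(max);
DECLARE @sql    nvarchar(max);

-- One uniqueness ratio per column, in column order. QUOTENAME handles names
-- with spaces; NULLIF avoids a divide-by-zero error on an empty table.
SELECT @cols = STRING_AGG(
           CAST(N'count(distinct ' + QUOTENAME(c.name) + N') * 1.00 / nullif(count(*), 0) as '
                + QUOTENAME(c.name) AS nvarchar(max)), N', ')
           WITHIN GROUP (ORDER BY c.column_id)
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(QUOTENAME(@schema) + N'.' + QUOTENAME(@table))
  -- These types cannot be used with COUNT(DISTINCT ...), so skip them.
  AND TYPE_NAME(c.user_type_id) NOT IN (N'text', N'ntext', N'image', N'xml', N'geography', N'geometry');

IF @cols IS NULL
    THROW 50000, N'Table not found, or it has no columns that support COUNT(DISTINCT).', 1;

-- Build the label literal safely (doubling any quotes), then the full statement.
SET @sql = N'select N''' + REPLACE(@table, N'''', N'''''') + N'_Unicidad'' as tabla, ' + @cols
         + N' into tmp_unicidad from ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@table) + N';';

-- PRINT @sql;  -- uncomment to inspect the generated statement
EXEC sys.sp_executesql @sql;

SELECT * FROM tmp_unicidad;
```

With the sample rows, `identificacion` scores 1 and `nombre completo` scores 0.5. Note that `COUNT(DISTINCT ...)` ignores NULLs while `COUNT(*)` counts every row, so a column whose non-NULL values are all distinct still scores below 1 if it contains NULLs.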
    

    【Comments】:

    • The key was to drop sum and cast and do the arithmetic directly in the select: select count(distinct column) * 1.00 / count(*) * 1.00 as alias
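The `* 1.00` matters because `COUNT` returns `int`, and in T-SQL `int / int` is integer division, which truncates the fractional part. A small illustration with hypothetical counts (3 distinct values in 4 rows):

```sql
-- Hypothetical counts: 3 distinct values out of 4 rows.
DECLARE @distinct_count int = 3, @row_count int = 4;

DROP TABLE IF EXISTS #division_demo;
SELECT @distinct_count / @row_count        AS integer_division,  -- int / int truncates: 0
       @distinct_count * 1.00 / @row_count AS decimal_division   -- promoted to decimal: 0.75
INTO #division_demo;

SELECT * FROM #division_demo;
```

Once the dividend is decimal, the quotient is decimal as well. The second `* 1.00`, applied after `/ count(*)` in the answer, is therefore harmless but redundant.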