【Question Title】: Running multiple SQL queries in Hive/Impala to test for pass or fail
【Posted】: 2020-02-26 23:11:47
【Question Description】:

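I'm running 100 queries (test cases) to check data quality in Hive/Impala. Most of the queries check for null values under certain conditions. I'm using conditional aggregation to evaluate the trivial test cases, as shown below. I want to add more complex query conditions to this kind of check, and when there are null values I also want to see how many.

How can I incorporate the more complex queries, and add a count whenever null values exist? The expected output is shown below.

What I have so far: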

SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS' ELSE 'FAIL' END) as car_type_test,
       (CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS' ELSE 'FAIL' END) as car_color_test,
       (CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS' ELSE 'FAIL' END) as car_sale_test       
FROM car_data;

The more complex kind of query I want to add:

SELECT Count(*), 
       car_job 
FROM   car_data 
WHERE  car_job NOT IN ( "car_type", "car_license", "car_cancellation", 
                        "car_color", "car_contract", "car_metal", "car_number" ) 
        OR car_job IS NULL 
GROUP  BY car_job

Example of the expected output:

car_type_test  car_color_test  car_sale_test  car_job_test
PASS           PASS            PASS           FAIL
                                              102

【Question Discussion】:

  • There is no question here.
  • I clarified it in an edit.

Tags: sql hive hiveql impala conditional-aggregation


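【Solution 1】:

I would suggest putting the result in one row rather than two, with the null count embedded in the FAIL value. The count is cast to a string explicitly because Impala does not implicitly convert a BIGINT to STRING for REPLACE():

SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_type) AS STRING))
        END) as car_type_test,
       (CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_color) AS STRING))
        END) as car_color_test,
       (CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', CAST(COUNT(*) - COUNT(car_sale) AS STRING))
        END) as car_sale_test
FROM car_data;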
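
The query above covers only the null checks. One possible way to fold the car_job check from the question into the same row is sketched below. It is an illustration under assumptions, not part of the original answer: the inner aliases (null_car_type, bad_car_job, and so on) are hypothetical names, car_job_test is taken from the expected output in the question, and CONCAT is used here in place of REPLACE purely for brevity. A row counts as failing the car_job check when car_job is NULL or is not in the allowed list, which mirrors the WHERE clause in the question.

```sql
-- Sketch: compute every failure count once in a subquery,
-- then turn each count into 'PASS' or 'FAIL (n)' in the outer query.
SELECT (CASE WHEN null_car_type = 0 THEN 'PASS'
             ELSE CONCAT('FAIL (', CAST(null_car_type AS STRING), ')')
        END) AS car_type_test,
       (CASE WHEN null_car_color = 0 THEN 'PASS'
             ELSE CONCAT('FAIL (', CAST(null_car_color AS STRING), ')')
        END) AS car_color_test,
       (CASE WHEN null_car_sale = 0 THEN 'PASS'
             ELSE CONCAT('FAIL (', CAST(null_car_sale AS STRING), ')')
        END) AS car_sale_test,
       (CASE WHEN bad_car_job = 0 THEN 'PASS'
             ELSE CONCAT('FAIL (', CAST(bad_car_job AS STRING), ')')
        END) AS car_job_test
FROM (
    SELECT COUNT(*) - COUNT(car_type)  AS null_car_type,   -- rows where car_type is NULL
           COUNT(*) - COUNT(car_color) AS null_car_color,  -- rows where car_color is NULL
           COUNT(*) - COUNT(car_sale)  AS null_car_sale,   -- rows where car_sale is NULL
           -- COUNT ignores NULLs, so only rows matching the condition are counted
           COUNT(CASE WHEN car_job IS NULL
                        OR car_job NOT IN ('car_type', 'car_license', 'car_cancellation',
                                           'car_color', 'car_contract', 'car_metal', 'car_number')
                      THEN 1 END)      AS bad_car_job
    FROM car_data
) t;
```

Using COUNT(CASE ... THEN 1 END) rather than SUM keeps the count at 0 instead of NULL when the table is empty, so an empty table reports PASS for every test. The explicit car_job IS NULL test is needed because NULL NOT IN (...) evaluates to NULL rather than true.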

【Discussion】:
