【Question title】: Extract array items as a view - AWS Athena
【Posted】: 2022-01-26 05:29:06
【Question description】:

I am trying to select all elements of an array stored in a field of an Athena table, for example:

{
    id: "1",
    name: "bla",
    array: [{
        val1: "2",
        val2: "2"
    }, {
        val1: "3",
        val2: "4"
    }]
}


{
    id: "3",
    name: "bla bla",
    array: [{
        val1: "5",
        val2: "6"
    }, {
        val1: "7",
        val2: "8"
    }]
}

I am trying to create a view that selects all elements of the inner array, so the result would be:

+----+------+------+
| id | val1 | val2 |
+----+------+------+
| 1  | 2    | 2    |
+----+------+------+
| 1  | 3    | 4    |
+----+------+------+
| 3  | 5    | 6    |
+----+------+------+
| 3  | 7    | 8    |
+----+------+------+

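What query would produce this output?

The actual file has one item per line, as follows:

{ id: "1", name: "bla", array: [{ val1: "2", val2: "2" }, { val1: "3", val2: "4" }] }

{ id: "3", name: "bla bla", array: [{ val1: "5", val2: "6" }, { val1: "7", val2: "8" }] }

The DDL that creates the table looks like this: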

-- `all` and `array` are reserved words in Athena DDL, so they are backtick-quoted
CREATE EXTERNAL TABLE `all` (
  id STRING,
  name STRING,
  `array` ARRAY<
    STRUCT<
      val1:STRING,
      val2:STRING
    >
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://abc/def'

【Question discussion】:

    Tags: amazon-athena


    【Solution 1】:

    I was able to get the expected result with the following steps:

    1. Corrected the JSON record from
    { id: "1", name: "bla", array: [{ val1: "2", val2: "2" }, { val1: "3", val2: "4" }] }
    to valid JSON with quoted keys:

    {
      "id": "1",
      "name": "bla",
      "array": [
        {
          "val1": "2",
          "val2": "2"
        },
        {
          "val1": "3",
          "val2": "4"
        }
      ]
    }
    
    2. Created a table in Athena with the following definition:
    CREATE EXTERNAL TABLE testt_json2 (
        `id` string COMMENT 'from deserializer',
        `name` string COMMENT 'from deserializer',
        `array` array<struct<val1:string,val2:string>> COMMENT 'from deserializer'
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES ('paths'='array,id,name')
    STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION 's3://test/'
    
    3. Then ran a query using UNNEST to flatten the array, which gave me the expected result:
    WITH dataset AS (
        SELECT *
        FROM testt_json2
    )
    SELECT id,
        t.names.val1,
        t.names.val2
    FROM dataset
        CROSS JOIN UNNEST(array) AS t(names)
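
    Since the question asks for a view rather than a one-off query, the same UNNEST query can probably be wrapped in a view definition. This is a minimal sketch, not something the answer above tested: the view name array_items is hypothetical, testt_json2 is the table created in step 2, and the column aliases are added so the view exposes plain id, val1 and val2 columns.

    ```sql
    -- Sketch only: "array_items" is a made-up view name.
    -- Each element of the array column becomes one row, aliased as "item".
    CREATE OR REPLACE VIEW array_items AS
    SELECT
        id,
        item.val1 AS val1,
        item.val2 AS val2
    FROM testt_json2
    CROSS JOIN UNNEST("array") AS t (item);
    ```

    Selecting from the view (SELECT * FROM array_items) should then return one row per array element. Note that CROSS JOIN UNNEST produces no rows for records whose array is empty or NULL.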
    

    【Discussion】:
