【Question title】: Create Athena table from nested JSON source
【Posted】: 2019-11-17 20:44:28
【Question】:

How can I create an Athena table from a nested JSON file? Here is my sample JSON file. I only need to select certain key-value pairs, such as roofCondition and garageStalls.

{   "reportId":"7bc7fa76-bf53-4c21-85d6-118f6a8f4244",
"reportOrderedTS":"1529996028730",
"createdTS":"1530304910154",
"report":"{'summaryElements': [{'value': 'GOOD', 'key': 'roofCondition'}, 
{'value': '98', 'key': 'storiesConfidence'}{'value': '0', 'key': 
'garageStalls'}], 'elements': [{'source': 'xyz', 'imageId': '0xxx_png', 
'modelVersion': '1.21.0', 'key': 'pool'},  {'source': 'xyz', 'imageId': '0111_png', 'value': 'GOOD', 'modelVersion': '1.36.0', 'key': 'roofCondition','confidence': '49'}],    }", "status":"Success", "reportReceivedTS":"1529996033830" }

【Question comments】:

    Tags: amazon-athena


    【Solution 1】:

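    First, the JSON document you posted is not valid JSON: the `report` value is a string with single-quoted keys, a comma is missing between two objects in `summaryElements`, and there is a trailing comma before the closing brace. A corrected version looks like this: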

    {"reportId":"7bc7fa76-bf53-4c21-85d6-118f6a8f4244", "reportOrderedTS":"1529996028730", "createdTS":"1530304910154", "report":{"summaryElements": [{"value": "GOOD", "key": "roofCondition"},{"value": "98", "key": "storiesConfidence"},{"value": "0", "key": "garageStalls"}], "elements": [{"source": "xyz", "imageId": "0xxx_png", "modelVersion": "1.21.0", "key": "pool"},{"source": "xyz", "imageId": "0111_png", "value": "GOOD", "modelVersion": "1.36.0", "key": "roofCondition", "confidence": "49"}] }, "status":"Success", "reportReceivedTS":"1529996033830" }
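To check a record like the corrected one above before uploading it, here is a minimal Python sketch using only the standard `json` module. It parses the record and pulls out the `roofCondition` and `garageStalls` summary values the question asks about, and it shows that single-quoted keys or a trailing comma (as in the question's `report` string) are rejected by a strict JSON parser. The helper names `extract_summary` and `is_valid_json` are hypothetical, chosen for illustration.

```python
import json

# The corrected record from the answer, kept as one line of text.
RECORD = (
    '{"reportId":"7bc7fa76-bf53-4c21-85d6-118f6a8f4244", '
    '"reportOrderedTS":"1529996028730", "createdTS":"1530304910154", '
    '"report":{"summaryElements": [{"value": "GOOD", "key": "roofCondition"},'
    '{"value": "98", "key": "storiesConfidence"},{"value": "0", "key": "garageStalls"}], '
    '"elements": [{"source": "xyz", "imageId": "0xxx_png", "modelVersion": "1.21.0", "key": "pool"},'
    '{"source": "xyz", "imageId": "0111_png", "value": "GOOD", "modelVersion": "1.36.0", '
    '"key": "roofCondition", "confidence": "49"}] }, '
    '"status":"Success", "reportReceivedTS":"1529996033830" }'
)

def extract_summary(line, wanted=("roofCondition", "garageStalls")):
    """Hypothetical helper: return {key: value} for the wanted summary keys."""
    record = json.loads(line)  # raises json.JSONDecodeError on invalid JSON
    return {
        item["key"]: item["value"]
        for item in record["report"]["summaryElements"]
        if item["key"] in wanted
    }

def is_valid_json(text):
    """Hypothetical helper: True if text parses as strict JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(extract_summary(RECORD))  # {'roofCondition': 'GOOD', 'garageStalls': '0'}
# Single-quoted keys and trailing commas are not JSON.
print(is_valid_json("{'value': 'GOOD', 'key': 'roofCondition'}"))  # False
print(is_valid_json('{"a": 1,}'))  # False
```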
    

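    Yes, you can query a table over nested JSON in Athena. Nested objects map to `struct` types and JSON arrays map to `array` types, so for example you can create the following table: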

    CREATE EXTERNAL TABLE example(
    `reportId` string,
    `reportOrderedTS` bigint,
    `createdTS` bigint,
    `report` struct<
    `summaryElements`: array<struct<`value`:string, `key`: string>>,
    `elements`: array<struct<`source`: string, `imageId`:string, `modelVersion`:string, `key`:string, `value`:string,  `confidence`:int>>>, 
    `status` string, 
    `reportReceivedTS` bigint
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION 's3://example'  
    

    Here is an example query:

    select reportid, reportorderedts, createdts,
    summaryelements.value, summaryelements.key, elements.source, elements.key
    from example, UNNEST(report.summaryelements) s(summaryelements), UNNEST(report.elements) e(elements)
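Because both `UNNEST` calls sit in the same `FROM` clause, each source row is combined with every summary element and every element, so the sample record yields 3 × 2 = 6 rows. The following is a minimal Python model of that row expansion, using plain dicts and lists as stand-ins for Athena's structs and arrays. It only illustrates the semantics and is not how Athena executes the query. The function names are hypothetical, and the record is trimmed to a few fields with lowercase names matching the query.

```python
from itertools import product

# Hand-written stand-in for the sample record: structs -> dicts, arrays -> lists.
row = {
    "reportid": "7bc7fa76-bf53-4c21-85d6-118f6a8f4244",
    "report": {
        "summaryelements": [
            {"value": "GOOD", "key": "roofCondition"},
            {"value": "98", "key": "storiesConfidence"},
            {"value": "0", "key": "garageStalls"},
        ],
        "elements": [
            {"source": "xyz", "key": "pool"},
            {"source": "xyz", "key": "roofCondition"},
        ],
    },
}

def unnest_both(row):
    """Model of: FROM example, UNNEST(report.summaryelements), UNNEST(report.elements)."""
    report = row["report"]
    # Two UNNESTs in one FROM clause act like a cross join within each source row.
    for se, el in product(report["summaryelements"], report["elements"]):
        yield (row["reportid"], se["value"], se["key"], el["source"], el["key"])

def summary_values(row, wanted=("roofCondition", "garageStalls")):
    """Model of unnesting summaryelements alone and filtering on its key."""
    return [(se["key"], se["value"])
            for se in row["report"]["summaryelements"] if se["key"] in wanted]

rows = list(unnest_both(row))
print(len(rows))            # 6
print(summary_values(row))  # [('roofCondition', 'GOOD'), ('garageStalls', '0')]
```

In SQL terms, the second function roughly corresponds to unnesting only `report.summaryelements` and adding `WHERE summaryelements.key IN ('roofCondition', 'garageStalls')`. This is a sketch that has not been tested against Athena.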
    

    Useful links:

    https://docs.aws.amazon.com/athena/latest/ug/flattening-arrays.html

    https://docs.aws.amazon.com/athena/latest/ug/rows-and-structs.html

    【Comments】:

      【Solution 2】:

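      So this also seems to work, even though the file as a whole is not valid JSON!

      Each row of the table is one line in the JSON file.

      There are no trailing spaces or commas at the end of the lines, just a newline between table rows.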

       {"is_active":"True","title":"mr","first_name":"admindoc","last_name":"admindoc","birthdate":"2003-09-01","home_phone":"+654654","mobile_phone":"+654654","gender":"m","language":"fr","email":"xxx+admine@sinnovation.com"}
       {"is_active":"True","title":"mr","first_name":"dok","last_name":"dok","birthdate":"1998-02-03","home_phone":"None","mobile_phone":"+654654","gender":"m","language":"fr","email":"xxx+docteur@sinnovation.com"}
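This layout is commonly called JSON Lines (newline-delimited JSON): one complete object per line, no enclosing array, and no separating commas. Here is a minimal Python sketch that writes such a file using only the standard library and reads it back line by line, mirroring how each line becomes one table row. It assumes the records start out as a list of dicts. The records are trimmed copies of the rows above, and the helper names are hypothetical.

```python
import io
import json

def to_json_lines(records):
    """Serialize each record as one compact JSON object per line."""
    # separators=(",", ":") drops spaces; ensure_ascii=False keeps non-ASCII text as-is.
    return "".join(json.dumps(r, separators=(",", ":"), ensure_ascii=False) + "\n"
                   for r in records)

def read_json_lines(stream):
    """Parse a newline-delimited JSON stream, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]

# Hypothetical records, trimmed from the rows shown above.
records = [
    {"is_active": "True", "first_name": "admindoc", "language": "fr"},
    {"is_active": "True", "first_name": "dok", "language": "fr"},
]

text = to_json_lines(records)
print(text, end="")
# {"is_active":"True","first_name":"admindoc","language":"fr"}
# {"is_active":"True","first_name":"dok","language":"fr"}
print(read_json_lines(io.StringIO(text)) == records)  # True
```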
      

      【Comments】:
