【Question title】: Creating hive table over complex parquet file
【Posted】: 2015-05-18 23:44:56
【Question】:

I'm trying to put a Hive table on top of a Parquet file that I created from the following JSON content:
{"user_id":"4513","providers":[{"id":"4220","name":"dbmvl","behaviors":{"b1":"gxybq","b2":"ntfmx "}},{"id":"4173","name":"dvjke","behaviors":{"b1":"sizow","b2":"knuuc"}}]}

{"user_id":"3960","providers":[{"id":"1859","name":"ponsv","behaviors":{"b1":"ahfgc","b2":"txpea"}},{"id":"103","name":"uhqqo","behaviors":{"b1":"lktyo","b2":"ituxy"}}]}

{"user_id":"567","providers":[{"id":"9622","name":"crjju","behaviors":{"b1":"rhaqc","b2":"npnot"}},{"id":"6965","name":"fnheh","behaviors":{"b1":"eipse","b2":"nvxqk"}}]}

I'm basically using Spark SQL to read the JSON and write out the Parquet file.

I'm having trouble putting Hive on top of the generated Parquet file. Here is my Hive HQL:

create table test (mycol STRUCT<user_id:String, providers:ARRAY<STRUCT<id:String, name:String, behaviors:MAP<String, String>>>>) stored as parquet;
Alter table test set location 'hdfs:///tmp/test.parquet';

The statements above execute fine, but when I try to run select * on the table I get this error:

Failed with exception java.io.IOException:java.lang.IllegalStateException: Column mycol at index 0 does not exist in {providers=providers, user_id=user_id}

【Question discussion】:

    Tags: json hive apache-spark apache-spark-sql parquet


    【Solution 1】:

    Try changing your query to:

    create table test (user_id String, providers ARRAY<STRUCT<id:String, name:String, behaviors:MAP<String, String>>>) stored as parquet;
    

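    When the Parquet file is written, the root JSON object is flattened: its top-level keys (user_id and providers) become the file's columns, so there is no enclosing mycol column for Hive to find.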
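
The flattening described above can be illustrated with a small standard-library Python sketch. It is not Spark's or Hive's actual schema inference; it only shows the idea that each top-level JSON key becomes one table column, and the helper names (`hive_type`, `top_level_columns`, `create_table_ddl`) are hypothetical. Note that this sketch maps every nested JSON object to a STRUCT, whereas the answer declares `behaviors` as a MAP; which of the two matches the file depends on the schema Spark actually wrote, so that is worth checking (for example with `printSchema()` on the DataFrame).

```python
import json

# A trimmed version of the first sample record from the question.
RECORD = (
    '{"user_id":"4513","providers":[{"id":"4173","name":"dvjke",'
    '"behaviors":{"b1":"sizow","b2":"knuuc"}}]}'
)


def hive_type(value):
    """Map a parsed JSON value to a Hive type string (illustrative assumption)."""
    if isinstance(value, bool):  # bool must be tested before int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, list):
        # Assumes a non-empty, homogeneous array; the first element decides the type.
        return "ARRAY<" + hive_type(value[0]) + ">"
    if isinstance(value, dict):
        # Fields are sorted by name so the output is deterministic.
        fields = ",".join(f"{k}:{hive_type(v)}" for k, v in sorted(value.items()))
        return "STRUCT<" + fields + ">"
    raise TypeError(f"unsupported JSON value: {value!r}")


def top_level_columns(record_json):
    """Return (column name, Hive type) pairs, one per top-level JSON key."""
    record = json.loads(record_json)
    return [(name, hive_type(value)) for name, value in sorted(record.items())]


def create_table_ddl(table, columns):
    """Build a CREATE TABLE statement with no wrapping struct column."""
    cols = ", ".join(f"{name} {type_}" for name, type_ in columns)
    return f"CREATE TABLE {table} ({cols}) STORED AS PARQUET;"


if __name__ == "__main__":
    print(create_table_ddl("test", top_level_columns(RECORD)))
```

The generated statement lists `providers` and `user_id` directly as columns, the same two names that appear in the error message, rather than nesting them under a single `mycol` struct.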
