【Question title】: presto failed to search data from hive [closed]
【Posted】: 2021-03-02 14:39:18
【Question】:

I cannot query data from a Hive table through Presto. The data in the Hive table was written by Spark.

io.prestosql.spi.PrestoException: Cannot get bucket number from path: hdfs://xxx:8020/warehouse/tablespace/managed/hive/ods_mflex_bpm_szgx.db/workflow_requestbase/year=2018/part-00000-74647672-c3b8-4b36-98d3-95734e8bd376.c000.snappy.orc
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:257)
    at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
    at io.prestosql.$gen.Presto_344____20201118_122905_2.run(Unknown Source)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalStateException: Cannot get bucket number from path: hdfs://xxxx:8020/warehouse/tablespace/managed/hive/ods_mflex_bpm_szgx.db/workflow_requestbase/year=2018/part-00000-74647672-c3b8-4b36-98d3-95734e8bd376.c000.snappy.orc
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.lambda$getRequiredBucketNumber$9(BackgroundHiveSplitLoader.java:733)
    at java.base/java.util.OptionalInt.orElseThrow(OptionalInt.java:271)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.getRequiredBucketNumber(BackgroundHiveSplitLoader.java:733)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:511)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:321)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:250)
    ... 6 more

Does anyone know the cause?

【Comments】:

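  • I think your table definition has a bucketing (CLUSTERED BY) property, but the data files are not bucketed. You can change the table definition.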
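
A minimal sketch of why the error appears, assuming Hive-style bucket files are named like `000000_0` with the leading digits being the bucket number. The regular expression, the function name `bucket_number_from_path`, and the second path are illustrative assumptions; the patterns Presto actually uses may cover more naming schemes.

```python
import re
from typing import Optional

# Assumed, simplified pattern for Hive-style bucket file names ("000000_0").
# This is NOT copied from Presto; it only illustrates the idea.
HIVE_BUCKET_FILE = re.compile(r"^(\d+)_\d+")


def bucket_number_from_path(path: str) -> Optional[int]:
    """Return the bucket number encoded in the file name, or None if absent."""
    file_name = path.rsplit("/", 1)[-1]
    match = HIVE_BUCKET_FILE.match(file_name)
    return int(match.group(1)) if match else None


# File name taken from the stack trace in the question (written by Spark).
spark_file = (
    "hdfs://xxx:8020/warehouse/tablespace/managed/hive/"
    "ods_mflex_bpm_szgx.db/workflow_requestbase/year=2018/"
    "part-00000-74647672-c3b8-4b36-98d3-95734e8bd376.c000.snappy.orc"
)
# Hypothetical path with a Hive-style bucket file name, for comparison.
hive_file = "hdfs://xxx:8020/warehouse/example.db/example_table/000003_0"

print(bucket_number_from_path(spark_file))  # no bucket number can be derived
print(bucket_number_from_path(hive_file))   # bucket number parsed from the name
```

When the table is declared as bucketed, a bucket number is required for every file, so a file name like the Spark one above leads to "Cannot get bucket number from path".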

Tags: apache-spark hive presto trino


【Answer 1】:

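The table is declared as bucketed in the Hive metastore, but the actual files are not bucketed. You need to fix the table declaration so that the table is no longer bucketed. I think you need to use the Hive CLI for that.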
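
A rough sketch of the statements this could involve, assuming the database and table names implied by the path in the stack trace, and assuming your Hive version accepts `ALTER TABLE ... NOT CLUSTERED` (to my understanding Hive's grammar includes it, but please verify on your version first). The helper `unbucket_statements` is hypothetical; it only builds the HiveQL text to run in the Hive CLI or Beeline.

```python
from typing import List


def unbucket_statements(database: str, table: str) -> List[str]:
    """Build HiveQL to inspect a table and then drop its bucketing declaration."""
    qualified = f"`{database}`.`{table}`"
    return [
        # Inspect the current declaration (look at "Num Buckets" / "Bucket Columns").
        f"DESCRIBE FORMATTED {qualified};",
        # Assumed syntax: removes the CLUSTERED BY declaration from the metastore.
        f"ALTER TABLE {qualified} NOT CLUSTERED;",
    ]


# Names inferred from the HDFS path in the question; adjust to your environment.
statements = unbucket_statements("ods_mflex_bpm_szgx", "workflow_requestbase")
for statement in statements:
    print(statement)
```

This only changes metadata; the ORC files written by Spark stay where they are.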

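Note that even if Spark had populated the table with bucketed files, queries would return incorrect results because of https://issues.apache.org/jira/browse/SPARK-19256. We are going to detect this and prevent incorrect query results in https://github.com/trinodb/trino/pull/6012.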

