【Question title】: Spark job using HBase fails
【Posted】: 2015-08-17 20:19:04
【Question】:

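Every Spark job I run that accesses HBase fails with the error below. My own job is written in Scala, but the bundled Python example fails in the same way. The cluster is Cloudera, running CDH 5.4.4. The same jobs run fine on a different cluster that uses CDH 5.3.1.

Any help is much appreciated!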

...
15/08/15 21:46:30 WARN TableInputFormatBase: initializeTable called multiple times. Overwriting connection and table reference; TableInputFormatBase will not close these old references when done.
...
15/08/15 21:46:32 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, some.server.name): java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:163)
...
Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:389)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:158)
... 14 more

【Question comments】:

    Tags: scala hadoop apache-spark hbase cloudera


    【Solution 1】:

    Run spark-shell with the following parameters, which put the htrace-core jar on both the driver and the executor classpath: --driver-class-path .../cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar --driver-java-options "-Dspark.executor.extraClassPath=.../cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar"

    Why this works is described here
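
    The parameters in this answer can be kept in one place with a small helper, sketched below. This is only an illustration under stated assumptions: the function name `htrace_flags` and the `HTRACE_JAR` variable are hypothetical, and the leading `...` in the path is elided in the original answer, so it must be replaced with the real parcel root on your cluster. The sketch uses `--conf spark.executor.extraClassPath=...`, on the assumption that it has the same effect as passing the property through `--driver-java-options`.

```shell
#!/bin/sh
# Hypothetical variable: path to the htrace-core jar. The "..." prefix is
# elided in the original answer; substitute the parcel root of your cluster.
HTRACE_JAR="${HTRACE_JAR:-.../cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar}"

# Hypothetical helper: print the flags that add one jar to both the driver
# classpath and the executor classpath.
htrace_flags() {
  printf '%s %s %s %s\n' \
    --driver-class-path "$1" \
    --conf "spark.executor.extraClassPath=$1"
}

# Show the flags that would be passed to spark-shell or spark-submit.
htrace_flags "$HTRACE_JAR"
```

    With a jar path that contains no spaces, the flags can then be passed as `spark-shell $(htrace_flags "$HTRACE_JAR")`, and the same flags should work with `spark-submit` for a packaged Scala job.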

    【Comments】:
