【Question title】: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'
【Posted】: 2017-05-25 18:03:47
【Question】:

I am trying to read data from HBase and save it as a SequenceFile, but I get the following error:

java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're usingcustom serialization.

I found two similar posts:

hadoop writables NotSerializableException with Apache Spark API

Spark HBase Join Error: object not serializable class: org.apache.hadoop.hbase.client.Result

Following those two posts, I registered the three classes with Kryo, but still had no luck.

Here is my program:

        String tableName = "validatorTableSample";
        System.out.println("Start indexing hbase: " + tableName);
        SparkConf sparkConf = new SparkConf().setAppName("HBaseRead");
        Class[] classes = {org.apache.hadoop.io.LongWritable.class, org.apache.hadoop.io.Text.class, org.apache.hadoop.hbase.client.Result.class};
        sparkConf.registerKryoClasses(classes);
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, tableName);
//      conf.setStrings("io.serializations",
//          conf.get("io.serializations"),
//          MutationSerialization.class.getName(),
//          ResultSerialization.class.getName());
        conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");

        JavaPairRDD<ImmutableBytesWritable, Result> hBasePairRDD = sc.newAPIHadoopRDD(
            conf,
            TableInputFormat.class,
            ImmutableBytesWritable.class,
            Result.class);

        hBasePairRDD.saveAsNewAPIHadoopFile("/tmp/tempOutputPath", ImmutableBytesWritable.class, Result.class, SequenceFileOutputFormat.class);
        System.out.println("Finished readFromHbaseAndSaveAsSequenceFile() .........");

Here is the error stack trace:

java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're usingcustom serialization.
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1254)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1156)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1112)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
17/05/25 10:58:38 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're usingcustom serialization.
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1254)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1156)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1112)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

17/05/25 10:58:38 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job

【Comments】:

  • Did you resolve your error?
  • No, I haven't. I am still facing this issue. Any clues?
  • I have posted an answer. Please give it a try; it worked for me.

Tags: java hadoop apache-spark serialization hbase


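【Solution 1】:

Here is what is needed to make it work.

Because we use HBase to store our data and this reducer writes its result to an HBase table, Hadoop tells us that it does not know how to serialize our data, so we need to help it. Set the io.serializations variable inside setUp: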

conf.setStrings("io.serializations", new String[]{hbaseConf.get("io.serializations"), MutationSerialization.class.getName(), ResultSerialization.class.getName()});
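
As a rough illustration of the line above, here is a minimal sketch in plain Java of how the `io.serializations` value could be assembled before it is set on the configuration. The class `SerializationsValue` and its `append` method are hypothetical names introduced only for this example; the two HBase class names are the ones used in this thread, and the "existing" value in `main` is just an assumed sample. The sketch only builds the comma-separated string, skipping a missing existing value and duplicate entries; it does not call Hadoop itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop or HBase): builds the comma-separated
// value that would be stored under the 'io.serializations' key.
final class SerializationsValue {
    // Class names taken from the answers in this thread.
    static final String MUTATION = "org.apache.hadoop.hbase.mapreduce.MutationSerialization";
    static final String RESULT = "org.apache.hadoop.hbase.mapreduce.ResultSerialization";

    // Appends the extra serialization class names to the existing value,
    // tolerating a null existing value and dropping duplicates.
    static String append(String existing, String... extra) {
        List<String> parts = new ArrayList<>();
        if (existing != null) {
            for (String s : existing.split(",")) {
                String t = s.trim();
                if (!t.isEmpty() && !parts.contains(t)) {
                    parts.add(t);
                }
            }
        }
        for (String e : extra) {
            if (!parts.contains(e)) {
                parts.add(e);
            }
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        // Assumed sample of what conf.get("io.serializations") might return.
        String existing = "org.apache.hadoop.io.serializer.WritableSerialization";
        System.out.println(append(existing, MUTATION, RESULT));
    }
}
```

The resulting string would then be stored with something like `conf.set("io.serializations", value)`. Note that the program in the question calls `saveAsNewAPIHadoopFile` without passing `conf`, so the setting probably only reaches the SequenceFile writer if the same `Configuration` is handed to the overload that accepts it as a fifth argument.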


【Solution 2】:

This code has been tested:

    object HbaseDataExport extends LoggingTime{
      def main(args: Array[String]): Unit = {
        val con = SparkConfig.getProperties()
        val sparkConf = SparkConfig.getSparkConf()
        val sc = SparkContext.getOrCreate(sparkConf)
        val config = HBaseConfiguration.create()
        config.setStrings("io.serializations",
          config.get("io.serializations"),
          "org.apache.hadoop.hbase.mapreduce.MutationSerialization",
          "org.apache.hadoop.hbase.mapreduce.ResultSerialization")
        val path = "/Users/jhTian/Desktop/hbaseTimeData/part-m-00030"
        val path1 = "hdfs://localhost:9000/hbaseTimeData/part-m-00030"
    
        sc.newAPIHadoopFile(path1, classOf[SequenceFileInputFormat[Text, Result]], classOf[Text], classOf[Result], config).foreach(x => {
          import collection.JavaConversions._
          for (i <- x._2.listCells) {
            logger.info(s"family:${Bytes.toString(CellUtil.cloneFamily(i))},qualifier:${Bytes.toString(CellUtil.cloneQualifier(i))},value:${Bytes.toString(CellUtil.cloneValue(i))}")
          }
        })
        sc.stop()
      }
    }
    
