【发布时间】:2018-08-16 08:54:12
【问题描述】:
我想在 s3 上保存和加载机器学习模型。
我做到了:
val credentials = new ProfileCredentialsProvider()
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", credentials.getCredentials.getAWSAccessKeyId)
hadoopConf.set("fs.s3.awsSecretAccessKey", credentials.getCredentials.getAWSSecretKey)
TrainValidationSplitModel.load(s"s3://model_path")
当我在本地运行它时它正在工作。
但是,当我在集群中运行它时,出现以下错误:
Serialization trace:
fields (org.apache.spark.sql.types.StructType)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:101)
at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:366)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:307)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:312)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:324)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.sql.types.StructField[]
Note: To register this class use: kryo.register(org.apache.spark.sql.types.StructField[].class);
at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488)
at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97)
at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:76)
... 10 more
您可能会说:“简单,您只需使用 kryo.register(SomeClass.class); 注册类 org.apache.spark.sql.types.StructField;”
但是,在注册了将近 15 节课之后。 Kryo 要求我注册一个私有类(访问权限仅限于 spark 包)。
我该如何解决这个问题?
【问题讨论】:
标签: scala apache-spark amazon-s3 kryo