【Posted】: 2021-01-29 11:56:05
【Problem description】:
I am upgrading from Spark 2.4.7 to Spark 3.1 on GCP Dataproc. I run a Sqoop import and load the data into Parquet files. The code works on Spark 2.4.7 but fails with the following error on Spark 3.1.
2021-01-29 10:57:25,383 ERROR sqoop.Sqoop: Got exception running Sqoop: org.apache.avro.AvroRuntimeException: Unknown datum class: class org.codehaus.jackson.node.NullNode
org.apache.avro.AvroRuntimeException: Unknown datum class: class org.codehaus.jackson.node.NullNode
at org.apache.avro.util.internal.JacksonUtils.toJson(JacksonUtils.java:87)
at org.apache.avro.util.internal.JacksonUtils.toJsonNode(JacksonUtils.java:48)
at org.apache.avro.Schema$Field.<init>(Schema.java:558)
at org.apache.sqoop.orm.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:100)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.generateAvroSchema(DataDrivenImportJob.java:131)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:116)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:747)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:536)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:633)
at org.apache.sqoop.Sqoop.run(Sqoop.java:146)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:182)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:233)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:242)
at org.apache.sqoop.Sqoop.main(Sqoop.java:251)
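I tried replacing the Sqoop dependency jars with newer versions, but the problem persists, and I have not been able to find a solution.
Is there a dependency problem on GCP Dataproc when installing Sqoop 1.5.0-SNAPSHOT?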
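The trace shows Avro's `JacksonUtils` rejecting an `org.codehaus.jackson.node.NullNode` handed to it by Sqoop's `AvroSchemaGenerator`, so a first diagnostic step is to see which Avro and Jackson jars are actually being loaded. The following is only a minimal sketch for that check, not a fix. It assumes it is run with the same classpath that Sqoop uses on the cluster, and the class name `ClasspathProbe` is hypothetical.

```java
import java.security.CodeSource;

public class ClasspathProbe {
    // Returns the jar or directory a class was loaded from,
    // or a short note when the class is missing or has no code source.
    static String locate(String className) {
        try {
            // 'false' avoids running static initializers of the probed class.
            Class<?> cls = Class.forName(className, false,
                    ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap/unknown source"
                               : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.avro.Schema",                       // Avro itself
            "org.codehaus.jackson.node.NullNode",           // Jackson 1.x, seen in the trace
            "com.fasterxml.jackson.databind.node.NullNode"  // Jackson 2.x
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the Avro jar reported here differs between the Spark 2.4.7 and Spark 3.1 clusters, that difference would be the place to look next.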
【Discussion】:
Tags: java avro sqoop google-cloud-dataproc