【发布时间】:2020-07-27 16:35:11
【问题描述】:
我在使用 spark-submit 提交我的 spark 作业时遇到 java.lang.NoSuchMethodError: org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredentialProviders 异常
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.ProviderUtils.excludeIncompatibleCredentialProviders(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/Class;)Lorg/apache/hadoop/conf/Configuration;
at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:740)
at org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider.<init>(SimpleAWSCredentialsProvider.java:58)
at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:600)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
已安装 Spark 版本:2.4.5
我的配置:build.gradle:
buildscript {
repositories {
maven {
url "https://*********/****/content/repositories/thirdparty"
credentials {
username ****User
password ****Pwd
}
}
}
}
plugins {
id 'java'
id 'com.github.johnrengelman.shadow' version '1.2.3'
}
group 'com.felix'
version '1.0-SNAPSHOT'
sourceCompatibility = 1.8
repositories {
mavenCentral()
mavenLocal()
}
dependencies {
compile group: 'org.apache.spark', name: 'spark-hadoop-cloud_2.11', version: '2.4.2.3.1.3.0-79'
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
compileOnly group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.4.5'
// https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind
compileOnly group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.6.7.3'
// https://mvnrepository.com/artifact/org.apache.parquet/parquet-column
compileOnly group: 'org.apache.parquet', name: 'parquet-column', version: '1.10.1'
// https://mvnrepository.com/artifact/org.apache.parquet/parquet-hadoop
compileOnly group: 'org.apache.parquet', name: 'parquet-hadoop', version: '1.10.1'
// https://mvnrepository.com/artifact/org.apache.spark/spark-sketch
compileOnly group: 'org.apache.spark', name: 'spark-sketch_2.11', version: '2.4.5'
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
compileOnly group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.4.5'
// https://mvnrepository.com/artifact/org.apache.spark/spark-catalyst
compileOnly group: 'org.apache.spark', name: 'spark-catalyst_2.11', version: '2.4.5'
// https://mvnrepository.com/artifact/org.apache.spark/spark-tags
compileOnly group: 'org.apache.spark', name: 'spark-tags_2.11', version: '2.4.5'
compileOnly group: 'org.apache.spark', name: 'spark-avro_2.11', version: '2.4.5'
// https://mvnrepository.com/artifact/org.apache.spark/spark-hive
compileOnly group: 'org.apache.spark', name: 'spark-hive_2.11', version: '2.4.5'
// https://mvnrepository.com/artifact/org.apache.xbean/xbean-asm6-shaded
compile group: 'org.apache.xbean', name: 'xbean-asm7-shaded', version: '4.15'
// https://mvnrepository.com/artifact/org.codehaus.janino/commons-compiler
compileOnly group: 'org.codehaus.janino', name: 'commons-compiler', version: '3.0.9'
// https://mvnrepository.com/artifact/org.codehaus.janino/janino
compileOnly group: 'org.codehaus.janino', name: 'janino', version: '3.0.9'
//HIVE Metastore
compile group: 'org.postgresql', name: 'postgresql', version: '42.2.9'
compile 'com.google.guava:guava:22.0'
compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.2.1'
compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '3.2.1'
compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.2.1'
compile group: 'com.amazonaws', name: 'aws-java-sdk-bundle', version: '1.11.271'
compile group: 'io.delta', name: 'delta-core_2.11', version: '0.5.0'
compile group: 'joda-time', name: 'joda-time', version: '2.10.5'
compile group: 'org.projectlombok', name: 'lombok', version: '1.16.6'
}
shadowJar {
zip64 true
}
Spark 作业:
df.distinct()
.withColumn("date", date_format(col(EFFECTIVE_PERIOD_START), "yyyy-MM-dd"))
.repartition(col("date"))
.write()
.format(fileFormat)
.partitionBy("date")
.mode(SaveMode.Append)
.option("fs.s3a.committer.name", "partitioned")
.option("fs.s3a.committer.staging.conflict-mode", "append")
.option("spark.sql.sources.commitProtocolClass", "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
.option("spark.sql.parquet.output.committer.class", "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
.option("compression", compressionCodecName.name().toLowerCase())
.save(DOWNLOADS_NON_COMPACT_PATH);
执行的脚本: spark-submit --class com.felix.DataIngestionApplication --master local DataIngestion-1.0-SNAPSHOT-all.jar
据我了解,hadoop 版本正在创建问题All the hadoop-* JARs need to be 100% matching on versions。所以我确保所有 org.apache.hadoop 依赖项都是相同的版本(3.2.1)。但它仍然给出了这个错误。
我想使用 hadoop 版本 3 或更新版本,因为它提供了更新的 S3A 提交者,例如“PartitionedStagingCommitter”。 Spark 2.4.5 每个人都如何使用它?
如何强制/覆盖 hadoop 版本以用作 3.2.1 而不是 Spark/jars 中的 hadoop 版本? 当我查看 /usr/local/Cellar/apache-spark /2.4.5/libexec/jars/ 我可以看到 hadoop-common-2.7.3.jar、hadoop-client-2.7.3.jar 等 那么我们如何强制使用 hadoop较新的版本,因此我可以利用新的 S3A 提交者。?
注意:如果我不使用 spark-submit 而是从 IntelliJ 运行应用程序,并且所有依赖项都作为 compile,那么应用程序将启动并毫无例外地执行。我可以看到数据被插入到 S3 中。
【问题讨论】:
标签: java apache-spark hadoop amazon-s3