【发布时间】:2021-12-03 03:35:38
【问题描述】:
我正在尝试使用 s3a 从 S3 加载数据(据我所知,这是目前唯一的选择)。我收到一个错误(java.lang.NoClassDefFoundError: org/apache/hadoop/fs/statistics/IOStatisticsSource),我在网上找不到任何信息。在配置使用 s3 方面,我已经做了所有我能想到的事情,但这个错误似乎很少见。
如果有人能指出正确的方向,我将不胜感激。
这是堆栈跟踪:
Traceback (most recent call last):
File "/home/hdoop/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 737, in csv
File "/home/hdoop/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/home/hdoop/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/home/hdoop/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o107.csv.
: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/statistics/IOStatisticsSource
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:576)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2532)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2497)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2593)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3269)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:377)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:795)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.statistics.IOStatisticsSource
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
【问题讨论】:
-
您使用的是什么 Spark 版本?
-
这可能已经被回答了。 Check it out.
-
@mazaneicha Spark 版本 3.1.2
-
你的hadoop-common也是3.3.1? issues.apache.org/jira/browse/HADOOP-17450
-
为什么你有3.3.1和3.2.0?请参考@MattAndruff 的评论。
标签: apache-spark hadoop amazon-s3 pyspark