Question: Why does Spark fail with "No FileSystem for scheme: local"?
Posted: 2020-12-25 12:11:19
Question body:

I am trying to submit a Spark job to a Spark cluster set up on AWS EKS as follows:

NAME                            READY   STATUS              RESTARTS   AGE
spark-master-5f98d5-5kdfd       1/1     Running             0          22h
spark-worker-878598b54-jmdcv    1/1     Running             2          3d11h
spark-worker-878598b54-sz6z6    1/1     Running             2          3d11h 

I am using the manifest below:

apiVersion: batch/v1
kind: Job
metadata:
  name: spark-on-eks
spec:
  template:
    spec:
      containers:
        - name: spark
          image: repo:spark-appv6
          command: [
            "/bin/sh",
            "-c",
            "/opt/spark/bin/spark-submit \
            --master spark://192.XXX.XXX.XXX:7077 \
            --deploy-mode cluster \
            --name spark-app \
            --class com.xx.migration.convert.TestCase \
            --conf spark.kubernetes.container.image=repo:spark-appv6 \
            --conf spark.kubernetes.namespace=spark-pi \
            --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-pi \
            --conf spark.executor.instances=2 \
            local:///opt/spark/examples/jars/testing-jar-with-dependencies.jar"
          ]
      serviceAccountName: spark-pi
      restartPolicy: Never
  backoffLimit: 4

and get the error log below:

20/12/25 10:06:41 INFO Utils: Successfully started service 'driverClient' on port 34511.
20/12/25 10:06:41 INFO TransportClientFactory: Successfully created connection to /192.XXX.XXX.XXX:7077 after 37 ms (0 ms spent in bootstraps)
20/12/25 10:06:41 INFO ClientEndpoint: Driver successfully submitted as driver-20201225100641-0011
20/12/25 10:06:41 INFO ClientEndpoint: ... waiting before polling master for driver state
20/12/25 10:06:46 INFO ClientEndpoint: ... polling master for driver state
20/12/25 10:06:46 INFO ClientEndpoint: State of driver-2020134340641-0011 is ERROR
20/12/25 10:06:46 ERROR ClientEndpoint: Exception from cluster was: java.io.IOException: No FileSystem for scheme: local
java.io.IOException: No FileSystem for scheme: local
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:535)
        at org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:166)
        at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:177)
        at org.apache.spark.deploy.worker.DriverRunner$$anon$2.run(DriverRunner.scala:96)
20/12/25 10:06:46 INFO ShutdownHookManager: Shutdown hook called
20/12/25 10:06:46 INFO ShutdownHookManager: Deleting directory /tmp/spark-d568b819-fe8e-486f-9b6f-741rerf87cf1

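Also, when I submit the job in client mode without the container arguments, it is submitted successfully, but the job keeps running and spins up multiple executors on the worker nodes.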

Spark version: 3.0.0

When using k8s://http://Spark-Master-ip:7077, I get the following error:

20/12/28 06:59:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/12/28 06:59:12 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
20/12/28 06:59:12 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
20/12/28 06:59:13 WARN WatchConnectionManager: Exec Failure
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:209)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at okio.Okio$2.read(Okio.java:140)
        at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
        at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354)
        at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226)
        at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
        at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
        at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:134)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:109)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
        at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Please help me resolve the above. Thanks.

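Comments on the question:

  • What is that spark://192.XXX.XXX.XXX:7077? Why not k8s://https://...?
  • @ItayB I tried and tested it that way and found some other errors in the logs, so I have updated my question. I used http instead of https; with https I get the same result, but with javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake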

Tags: apache-spark kubernetes amazon-eks


Answer 1:

Assuming you use the Spark on k8s operator, the master should be:

k8s://https://kubernetes.default.svc.cluster.local

If not, you can get your master address by running:

$ kubectl cluster-info
Kubernetes master is running at https://kubernetes.docker.internal:6443

Edit: in spark-on-k8s cluster mode you should provide k8s://<api_server_host>:<k8s-apiserver-port> (note that the port is mandatory!)
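To make the port rule concrete, here is a minimal Python sketch that turns the API-server URL printed by `kubectl cluster-info` into a `--master` value. The helper `to_k8s_master` is hypothetical. It follows the Spark on Kubernetes documentation: the port must always be given, even for HTTPS port 443, and a URL without a protocol defaults to https.

```python
from urllib.parse import urlsplit

# Default ports for the two protocols Spark accepts in a k8s:// master URL.
_DEFAULT_PORTS = {"https": 443, "http": 80}


def to_k8s_master(api_server_url: str) -> str:
    """Return a spark-submit --master value for a Kubernetes API server.

    Hypothetical helper: the input is assumed to look like the URL printed by
    `kubectl cluster-info`, e.g. "https://kubernetes.docker.internal:6443".
    IPv6 literals are not handled.
    """
    url = api_server_url.strip()
    if url.startswith("k8s://"):
        url = url[len("k8s://"):]
    # Spark assumes https when the URL has no protocol; mirror that here.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    if parts.scheme not in _DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"not an http(s) API-server URL: {api_server_url!r}")
    # The port must always be spelled out, even when it is the default 443.
    port = parts.port if parts.port is not None else _DEFAULT_PORTS[parts.scheme]
    return f"k8s://{parts.scheme}://{parts.hostname}:{port}"
```

For the output shown above, `to_k8s_master("https://kubernetes.docker.internal:6443")` returns `k8s://https://kubernetes.docker.internal:6443`. A URL without a port, such as `https://kubernetes.default.svc.cluster.local`, gets `:443` appended. A Standalone URL like `spark://...:7077` is rejected.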

In spark-on-k8s, the role of the "master" (in Spark terms) is played by Kubernetes itself: it is responsible for allocating the resources that run your driver and executors.
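As a sketch of what the corrected submission could look like, the hypothetical helper below rebuilds the `spark-submit` arguments from the question's Job manifest with only the master swapped to a `k8s://` URL. The address `https://kubernetes.default.svc:443` in the usage part is an assumption for a Job running inside the cluster. Substitute whatever `kubectl cluster-info` reports for your cluster.

```python
import shlex
from typing import List

SPARK_SUBMIT = "/opt/spark/bin/spark-submit"


def build_submit_args(master: str, image: str, namespace: str,
                      service_account: str, main_class: str, app_jar: str,
                      name: str = "spark-app", executors: int = 2) -> List[str]:
    """Assemble a spark-submit argv for Spark on Kubernetes in cluster mode.

    Hypothetical helper; the flags mirror the ones in the question's manifest.
    """
    if not master.startswith("k8s://"):
        # spark://host:7077 targets a Spark Standalone master, not Kubernetes.
        raise ValueError(f"expected a k8s:// master URL, got {master!r}")
    conf = {
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
        "spark.executor.instances": str(executors),
    }
    args = [SPARK_SUBMIT, "--master", master, "--deploy-mode", "cluster",
            "--name", name, "--class", main_class]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    # local:// tells Spark the jar already exists inside the container image.
    args.append(app_jar)
    return args


if __name__ == "__main__":
    args = build_submit_args(
        master="k8s://https://kubernetes.default.svc:443",  # assumed address
        image="repo:spark-appv6",
        namespace="spark-pi",
        service_account="spark-pi",
        main_class="com.xx.migration.convert.TestCase",
        app_jar="local:///opt/spark/examples/jars/testing-jar-with-dependencies.jar",
    )
    # Print a shell-ready command line; subprocess.run(args, check=True) would run it.
    print(shlex.join(args))
```

With a `k8s://` master, Kubernetes schedules the driver and executor pods itself. The Standalone `spark-master` and `spark-worker` pods from the question play no part in this submission.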

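Comments:

  • Using the k8s://kubernetes.default.svc.cluster.local above (with default replaced by my namespace: kubernetes.spark-pi.svc.cluster.local), I get java.net.UnknownHostException: kubernetes.spark-pi.svc.cluster.local: Name or service not known. Please let me know if I'm missing some configuration.
  • Keep default, not your namespace: this is something Kubernetes-related, not part of your application.
  • OK, but technically how is it linked to my spark-master pod? I'm missing a key point here: when I use k8s://kubernetes.default.svc.cluster.local in my spark-submit command, how will my spark-master and workers be used?
  • @Zester07 I have updated my answer; if the first option doesn't work, try the second one. Again, the master address has nothing to do with your Spark namespace/application: in spark-on-k8s, k8s manages the resources for you, and you don't need a Spark master host/container.
  • OK, got it. So as I understand it, I don't need any Spark master/worker infrastructure (pods) to run Spark jobs; under the hood, Kubernetes manages the execution itself in cluster mode. 1) Does it affect any running applications, and how resource-efficient is it compared with the spark-operator approach? 2) Do I still need to set up the spark-operator environment explicitly (is Helm the only way?)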
Answer 2:

The real cause of the exception:

java.io.IOException: No FileSystem for scheme: local

is that the Worker of the Spark Standalone cluster tries to downloadUserJar but does not recognize the local URI scheme at all.
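The failing lookup is the `FileSystem.getFileSystemClass` frame at the top of the stack trace. Below is a toy Python model of it, not Hadoop's code. As far as I know, Hadoop 2.x first checks a `fs.<scheme>.impl` configuration entry, then the implementations registered through Java's ServiceLoader, and throws `IOException("No FileSystem for scheme: " + scheme)` when neither has one. The registry contents here are an assumed subset.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Toy stand-in for FileSystem implementations discovered via ServiceLoader
# (an assumed subset; nothing is registered for "local").
SERVICE_FILE_SYSTEMS: Dict[str, str] = {
    "file": "org.apache.hadoop.fs.LocalFileSystem",
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
}


def get_file_system_class(uri: str, conf: Optional[Dict[str, str]] = None) -> str:
    """Toy model of Hadoop's scheme -> FileSystem class resolution."""
    scheme = urlsplit(uri).scheme
    # 1) An explicit fs.<scheme>.impl setting takes precedence.
    impl = (conf or {}).get(f"fs.{scheme}.impl")
    # 2) Otherwise fall back to the registered implementations.
    if impl is None:
        impl = SERVICE_FILE_SYSTEMS.get(scheme)
    if impl is None:
        # Same message as the IOException in the question's log.
        raise OSError(f"No FileSystem for scheme: {scheme}")
    return impl
```

Calling `get_file_system_class("local:///opt/spark/examples/jars/testing-jar-with-dependencies.jar")` raises `OSError: No FileSystem for scheme: local`. This mirrors what the Standalone Worker hits when it hands the `local://` jar URI to Hadoop.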

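This is because Spark Standalone does not understand it. Unless I'm mistaken, the only cluster environments that support the local URI scheme are Spark on YARN and Spark on Kubernetes.

That is why you can fix this exception by changing the master URL. The OP wants to deploy the Spark application to Kubernetes (and follow the Spark on Kubernetes rules), yet the master URL is spark://192.XXX.XXX.XXX:7077, which is for Spark Standalone.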
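The diagnosis above can be turned into a pre-submit check. Below is a hypothetical Python sketch of that check. It only flags cluster mode for two reasons: the stack trace comes from the Worker's `DriverRunner.downloadUserJar`, and the question reports that client-mode submission works. The suggested alternatives follow the Spark docs' requirement that the application jar be globally visible in the cluster, e.g. an `hdfs://` path or a `file://` path present on all nodes.

```python
from urllib.parse import urlsplit


def check_primary_resource(master: str, deploy_mode: str, app_resource: str) -> None:
    """Fail fast on the combination behind "No FileSystem for scheme: local".

    Hypothetical check: in Standalone cluster mode a Worker downloads the
    application jar, and local: is not a scheme it can fetch.
    """
    is_standalone = master.startswith("spark://")
    scheme = urlsplit(app_resource).scheme
    if is_standalone and deploy_mode == "cluster" and scheme == "local":
        raise ValueError(
            f"{app_resource} uses the local: scheme, which a Spark Standalone "
            "worker cannot download; submit with a k8s:// master, or point to "
            "an hdfs:// path or a file:// path present on every node")
```

Run against the question's original flags (`spark://...:7077`, `cluster`, `local:///...jar`), it raises. With a `k8s://` master, or in client mode, it returns `None`.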
