【Question title】: Pycharm: Java gateway process exited before sending its port number
【Posted】: 2019-01-23 16:01:12
【Question】:

I am trying to run (from Pycharm) some of the Spark examples for self-contained applications in Python.

I installed pyspark with:

pip install pyspark 

According to the page with the examples, this should be enough to run one:

python nameofthefile.py

But I get this error:

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
    at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
    at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:359)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:367)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:367)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:366)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
    at java.base/java.lang.String.substring(String.java:1874)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52)
    ... 23 more
Traceback (most recent call last):
  File "C:/Users/.../PycharmProjects/PoC/Databricks.py", line 4, in <module>
    spark = SparkSession.builder.appName("Databricks").getOrCreate()
  File "C:\Users\...\Desktop\env\lib\site-packages\pyspark\sql\session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Users\...\Desktop\env\lib\site-packages\pyspark\context.py", line 349, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Users\...\Desktop\env\lib\site-packages\pyspark\context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "C:\Users\...\Desktop\env\lib\site-packages\pyspark\context.py", line 298, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "C:\Users\...\Desktop\env\lib\site-packages\pyspark\java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

What could be going wrong?


Update

Following the post where the solution can be found, in my case I had to switch from jdk-11 to jdk1.8.
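
The `StringIndexOutOfBoundsException: begin 0, end 3, length 2` raised from `Shell.<clinit>` in the first trace appears consistent with the bundled Hadoop code reading the first three characters of a Java version string that is only two characters long on JDK 11 ("11"), whereas JDK 8 reports something like "1.8.0_121". As a rough sketch, the major version can be checked before starting Spark. The helper names `parse_java_major` and `installed_java_major` are hypothetical and not part of pyspark:

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from the text printed by `java -version`.

    Handles the legacy scheme ("1.8.0_121" -> 8) and the newer one
    ("11.0.1" or "11" -> 11).
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found in: %r" % version_output)
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        # Legacy numbering: the second component is the major version.
        return int(parts[1])
    digits = re.match(r"\d+", parts[0])
    if digits is None:
        raise ValueError("unrecognised version: %r" % match.group(1))
    return int(digits.group())


def installed_java_major(java_cmd="java"):
    """Run `java -version` and parse it (the JVM prints this to stderr)."""
    result = subprocess.run([java_cmd, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)
```

Calling `installed_java_major()` before creating the `SparkSession` and comparing the result with 8 would flag the JDK 11 situation described above. Which Java versions a given Spark release accepts is not checked here.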

Now I can run the sample code, but I get an error (which does not prevent it from running):

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
    at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
    at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:359)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:359)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:367)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:367)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:366)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2019-01-24 08:46:16 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

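Here is the solution for this "Could not locate executable null\bin\winutils.exe" error.

In summary, to fix the second problem you only need to define the HADOOP_HOME and PATH environment variables in the Control Panel, so that any Windows program can use them.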
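
If changing the system-wide settings is not an option, the same two variables can be set for the current process only, before Spark is started. The following is a minimal sketch: the function name `with_hadoop_home` and the directory `C:\hadoop` mentioned afterwards are assumptions for illustration, and the directory is expected to contain `bin\winutils.exe`.

```python
import os


def with_hadoop_home(env, hadoop_home):
    """Return a copy of `env` with HADOOP_HOME set and its bin directory on PATH.

    `hadoop_home` is assumed to be a folder that contains bin\\winutils.exe.
    """
    updated = dict(env)
    updated["HADOOP_HOME"] = hadoop_home
    bin_dir = os.path.join(hadoop_home, "bin")
    # Split the existing PATH, dropping empty entries.
    path_entries = [p for p in updated.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in path_entries:
        # Put the Hadoop bin directory first so it is found before other entries.
        path_entries.insert(0, bin_dir)
    updated["PATH"] = os.pathsep.join(path_entries)
    return updated
```

For example, `os.environ.update(with_hadoop_home(os.environ, r"C:\hadoop"))` at the top of the script, before the `SparkSession` is created, affects only that Python process and the JVM it launches.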

【Comments】:

    Tags: python pyspark pycharm


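    【Answer 1】:

    Short answer:

    I had a similar problem and solved it by changing my JAVA_HOME environment variable. You can manually add a new user environment variable JAVA_HOME that points to the path of your Java Development Kit ("C:/Progra~1/Java/jdk1.8.0_121", or "C:/Progra~2/Java/jdk1.8.0_121" if it is installed in "Program Files (x86)" on Windows).

    You can also try something like this at the beginning of your Python code: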

    import os
    os.environ["JAVA_HOME"] = "C:/Progra~1/Java/jdk1.8.0_121"
    

    (or "C:/Progra~2/Java/jdk1.8.0_121" if your JDK is installed under "Program Files (x86)")


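    Longer answer: independently of pyspark, have you installed the Spark binaries (which include Hadoop)? You also need a compatible Java Development Kit (JDK) installed (Java 8+ as of Spark 2.3.0). You also need to configure user environment variables such as:

    • JAVA_HOME with the path to the Java Development Kit
    • SPARK_HOME with the path to the Spark binaries
    • HADOOP_HOME with the path to the Hadoop binaries

    You can do something like this from Python: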

    import os
    os.environ["JAVA_HOME"] = "C:/Progra~2/Java/jdk1.8.0_121"
    os.environ["SPARK_HOME"] = "/path/to/spark-2.3.1-bin-hadoop2.7"
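
Before starting Spark it can help to confirm that these variables are actually set and point to existing folders. This is only a sketch: `find_env_problems` and `REQUIRED_VARS` are hypothetical names, and the list of variables simply mirrors the three mentioned above.

```python
import os

# The three variables named in this answer; adjust to your own setup.
REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def find_env_problems(env, names=REQUIRED_VARS):
    """Return {variable name: problem} for variables that are unset or not directories."""
    problems = {}
    for name in names:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.isdir(value):
            problems[name] = "not a directory: " + value
    return problems
```

Calling `find_env_problems(os.environ)` returns an empty dict when all three variables look usable. It does not verify that the folders contain a working JDK, Spark, or Hadoop installation.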
    

    Then I suggest using findspark (you can install it with pip install findspark): https://github.com/minrk/findspark

    You can then use it like this:

    import findspark
    findspark.init()
    from pyspark.sql import SparkSession
    
    spark = SparkSession.builder.master("local[*]").getOrCreate()
    

    In particular, if you are on Windows, JAVA_HOME should look like this:

    C:\Progra~1\Java\jdk1.8.0_121
    

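    And, "if the JDK is installed under \Program Files (x86), replace the Progra~1 part with Progra~2."

    Installation details for Windows can be found here (it is written for Jupyter, but the Spark and pyspark installation is the same): https://changhsinlee.com/install-pyspark-windows-jupyter/

    Hope this helps. Good luck, and have a nice day/evening!

    【Comments】:

    • Thanks! I followed the short answer; the problem was that I was using jdk-11 instead of jdk 1.8. Now I can run it and I get the example's results. I also get an exception (which does not stop it from running), but I am not sure whether any step of your long answer addresses it. I have added it to the main post.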