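【Posted】: 2019-08-20 20:41:23
【Question】:
I installed PySpark on Windows and it worked without problems until yesterday. I am using Windows 10, PySpark version 2.3.3 (pre-built version), and java version "1.8.0_201". Yesterday, when I tried to create a Spark session, I got the following error.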
Exception Traceback (most recent call last)
<ipython-input-2-a9ef4ac1a07d> in <module>
----> 1 spark = SparkSession.builder.appName("Hello").master("local").getOrCreate()
C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\sql\session.py in getOrCreate(self)
171 for key, value in self._options.items():
172 sparkConf.set(key, value)
--> 173 sc = SparkContext.getOrCreate(sparkConf)
174 # This SparkContext may be an existing one.
175 for key, value in self._options.items():
C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\context.py in getOrCreate(cls, conf)
361 with SparkContext._lock:
362 if SparkContext._active_spark_context is None:
--> 363 SparkContext(conf=conf or SparkConf())
364 return SparkContext._active_spark_context
365
C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
127 " note this option will be removed in Spark 3.0")
128
--> 129 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
130 try:
131 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
310 with SparkContext._lock:
311 if not SparkContext._gateway:
--> 312 SparkContext._gateway = gateway or launch_gateway(conf)
313 SparkContext._jvm = SparkContext._gateway.jvm
314
C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\java_gateway.py in launch_gateway(conf)
44 :return: a JVM gateway
45 """
---> 46 return _launch_gateway(conf)
47
48
C:\spark-2.3.3-bin-hadoop2.7\python\pyspark\java_gateway.py in _launch_gateway(conf, insecure)
106
107 if not os.path.isfile(conn_info_file):
--> 108 raise Exception("Java gateway process exited before sending its port number")
109
110 with open(conn_info_file, "rb") as info:
Exception: Java gateway process exited before sending its port number
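I checked the PySpark issues on GitHub and the Stack Overflow answers for the same error, but the problem is still not resolved.
I tried the following:
1.) Uninstalling and reinstalling Java and changing its installation directory. My Java installation directory is currently C:/Java/. (See: Pyspark: Exception: Java gateway process exited before sending the driver its port number)
2.) Setting PYSPARK_SUBMIT_ARGS, but it did not help.
Please suggest possible solutions.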
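For reference, a minimal sketch of how the two settings from the list above are commonly applied from Python before the session is created. The helper name `configure_pyspark_env` and the default argument string are hypothetical, and `C:/Java` only stands in for the install directory mentioned in the question (the actual JDK subfolder is not given there). It relies on the assumption that PySpark passes `PYSPARK_SUBMIT_ARGS` to spark-submit and expects the value to end with `pyspark-shell`.

```python
import os


def configure_pyspark_env(java_home, submit_args="--master local pyspark-shell"):
    """Set the variables PySpark reads when it launches the Java gateway.

    Both argument values are illustrative assumptions, not values
    confirmed by the question.
    """
    submit_args = submit_args.strip()
    # The submit arguments are expected to end with "pyspark-shell";
    # append it if the caller left it out.
    if not submit_args.endswith("pyspark-shell"):
        submit_args = (submit_args + " pyspark-shell").strip()

    os.environ["JAVA_HOME"] = java_home
    os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
    return {
        "JAVA_HOME": os.environ["JAVA_HOME"],
        "PYSPARK_SUBMIT_ARGS": os.environ["PYSPARK_SUBMIT_ARGS"],
    }


if __name__ == "__main__":
    # Placeholder path: point this at the real JDK/JRE folder.
    print(configure_pyspark_env("C:/Java"))
```

These assignments only take effect if they run before the first SparkSession or SparkContext is created in the process, because the gateway is launched once and then reused.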
【Comments】:
-
Did you add winutils.exe? wiki.apache.org/hadoop/WindowsProblems
-
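Yes, I have winutils.exe in the folder C:\Hadoop\bin, and my HADOOP_HOME = C:\Hadoop
-
Yes, I did check the link. The problem I am facing is not related to installation: I was able to install and use PySpark. The problem is that after using PySpark for several days, I suddenly started getting the error above, and I cannot figure out how to resolve it. I am currently unable to create a new SparkSession or SparkContext.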
-
Is JAVA_HOME set?
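The checks raised in these comments (JAVA_HOME, HADOOP_HOME, and winutils.exe under its bin folder) can be run in one pass with a small script. This is only a sketch: `check_spark_env` is a hypothetical helper, not part of PySpark, and a clean result does not rule out other causes of the gateway error.

```python
import os


def check_spark_env(env=None):
    """Return a list of problems found in the given environment mapping."""
    env = os.environ if env is None else env
    problems = []

    # JAVA_HOME should be set and point to an existing directory.
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append("JAVA_HOME is not an existing directory: " + java_home)

    # On Windows, winutils.exe is expected under %HADOOP_HOME%\bin.
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    else:
        winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
        if not os.path.isfile(winutils):
            problems.append("winutils.exe not found at " + winutils)

    return problems


if __name__ == "__main__":
    found = check_spark_env()
    if found:
        for problem in found:
            print(problem)
    else:
        print("No problems found in JAVA_HOME / HADOOP_HOME")
```

Running it in the same Jupyter kernel that raises the exception matters, since the kernel may see different environment variables than a freshly opened command prompt.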
Tags: java apache-spark hadoop pyspark apache-spark-standalone