【Posted】: 2020-06-01 16:22:21
【Question】:
- Created a conda environment:
conda create -y -n py38 python=3.8
conda activate py38
- Installed Spark from pip:
pip install pyspark
# Successfully installed py4j-0.10.7 pyspark-2.4.5
- Tried to import pyspark:
python -c "import pyspark"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/context.py", line 31, in <module>
    from pyspark import accumulators
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/serializers.py", line 72, in <module>
    from pyspark import cloudpickle
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/Users/dmitrii_deriabin/anaconda3/envs/py38/lib/python3.8/site-packages/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)
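It seems that PySpark ships with its own pre-packaged copy of the cloudpickle package, and that copy has problems on Python 3.8. The problem is already fixed in the cloudpickle release on pip (at least as of version 1.3.0), but the copy bundled with PySpark is still broken. Has anyone run into the same issue and managed to resolve it?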
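For context, the TypeError above is consistent with a change in Python 3.8: code objects gained a positional-only argument count, which shifted the positional arguments of `types.CodeType`. Code that still passes arguments in the pre-3.8 order ends up handing the bytecode (a `bytes` object) to a slot that expects an integer. The snippet below is only a small sketch to inspect this on the current interpreter; `sample` and `renamed_sample` are made-up names, and it does not touch PySpark itself.

```python
import sys
import types


def sample(a, b=0):
    """Throwaway function whose code object we inspect."""
    return a + b


code = sample.__code__
assert isinstance(code, types.CodeType)

# Python 3.8 added co_posonlyargcount to code objects; the matching
# constructor argument sits right after argcount, so every later
# positional argument moved one slot to the right.
has_posonly = hasattr(code, "co_posonlyargcount")

# On 3.8+ code.replace() copies a code object without spelling out the
# full positional constructor call, so it is not affected by the shift.
if hasattr(code, "replace"):
    renamed = code.replace(co_name="renamed_sample")
else:
    renamed = code  # older interpreter: left unchanged in this sketch

print("Python version:", sys.version_info[:2])
print("co_posonlyargcount present:", has_posonly)
print("type of co_code:", type(code.co_code).__name__)
```

If `co_posonlyargcount` is reported as present, any library that builds `types.CodeType` with the old positional layout can be expected to fail in the same way as the traceback above.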
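【Comments】:
-
Spark does not support Python 3.8 before version 3.0.0
-
@blackbishop, no, unfortunately it doesn't, because downgrading is not an option for my use case.
-
@cricket_007 see this issue
-
@Dmitry Why not? It looks like you are creating your own environment, so you will have to do that if you want to use pyspark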
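The constraint raised in the comments (Spark before 3.0.0 does not support Python 3.8) can be checked up front, without importing pyspark and hitting the traceback. This is a rough sketch only: `is_known_broken` and `installed_pyspark_version` are hypothetical helper names, and the rule encodes nothing beyond the combination reported in this thread.

```python
import sys


def is_known_broken(python_version, pyspark_version):
    """True for the combination reported here: Python >= 3.8 with PySpark < 3.0.0.

    False only means "not this known-bad pair", not a guarantee of support.
    """
    pyspark_major = int(pyspark_version.split(".")[0])
    return tuple(python_version[:2]) >= (3, 8) and pyspark_major < 3


def installed_pyspark_version():
    """Read the installed PySpark version from package metadata, if any."""
    try:
        # importlib.metadata is in the standard library from Python 3.8 on.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None
    try:
        return version("pyspark")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pyspark_version()
    if found is None:
        print("pyspark not found (or metadata unavailable on this Python)")
    elif is_known_broken(sys.version_info, found):
        print("pyspark", found, "is not expected to import on this Python")
    else:
        print("pyspark", found, "is not in the known-broken combination")
```

With the versions from the question, `is_known_broken((3, 8), "2.4.5")` flags the environment, which leaves the two options discussed above: an older Python in the conda environment, or PySpark 3.0.0 or later.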
Tags: apache-spark pyspark python-3.8