【Posted】: 2019-03-20 05:27:48
【Question】:
Following the AWS documentation, I created an EMR cluster with the configuration below:
https://aws.amazon.com/premiumsupport/knowledge-center/emr-pyspark-python-3x/
{
  "Classification": "livy-conf",
  "Properties": {
    "livy.spark.deploy-mode": "cluster",
    "livy.impersonation.enabled": "true",
    "livy.spark.yarn.appMasterEnv.PYSPARK_PYTHON": "/usr/bin/python3"
  }
},
When I submit a PySpark job through Livy with the following POST request:
```
payload = {
    'file': self.py_file,
    'pyFiles': self.py_files,
    'name': self.job_name,
    'archives': ['s3://test.test.bucket/venv.zip#venv', 's3://test.test.bucket/requirements.pip'],
    'proxyUser': 'hadoop',
    "conf": {
        "PYSPARK_PYTHON": "./venv/bin/python",
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "./venv/bin/python",
        "spark.yarn.executorEnv.PYSPARK_PYTHON": "./venv/bin/python",
        "spark.yarn.appMasterEnv.VIRTUAL_ENV": "./venv/bin/python",
        "spark.yarn.executorEnv.VIRTUAL_ENV": "./venv/bin/python",
        "livy.spark.yarn.appMasterEnv.PYSPARK_PYTHON": "./venv/bin/python",
        "livy.spark.yarn.appMasterEnv.PYSPARK_PYTHON": "./venv/bin/python",
        "spark.pyspark.virtualenv.enabled": "true",
        "spark.pyspark.virtualenv.type": "native",
        "spark.pyspark.virtualenv.requirements": "s3://test.test.bucket/requirements.pip",
        "spark.pyspark.virtualenv.path": "./venv/bin/python"
    }
}
```
I get the following error message:
```
Log Type: stdout
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Current thread 0x00007efc72b57740 (most recent call first)
```
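For readers who want to reproduce the submission, here is a minimal sketch of how a payload like the one above could be turned into a request against Livy's `/batches` endpoint. The host name, bucket, and job values are hypothetical placeholders, and `build_batch_request` is an illustrative helper, not part of Livy. The sketch only builds the request and does not send it. The executor key uses `spark.executorEnv.<NAME>`, which is how the Spark documentation spells the executor-side variant.

```python
import json
import urllib.request


def build_batch_request(livy_url, payload):
    """Build (but do not send) a POST request for Livy's /batches endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
example_payload = {
    "file": "s3://example-bucket/job.py",
    "name": "example-job",
    "proxyUser": "hadoop",
    "archives": ["s3://example-bucket/venv.zip#venv"],
    "conf": {
        # Environment variable for the YARN application master.
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "./venv/bin/python",
        # Spark documents the executor variant as spark.executorEnv.<NAME>.
        "spark.executorEnv.PYSPARK_PYTHON": "./venv/bin/python",
    },
}

request = build_batch_request("http://livy.example.com:8998", example_payload)
print(request.get_method(), request.full_url)
# To actually submit the batch: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to print and compare the exact JSON body before it reaches the cluster.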
I also tried pointing PYTHONHOME and PYTHONPATH at the parent folder of the python binary inside the virtual environment, but that did not help.
```
"spark.yarn.appMasterEnv.PYTHONPATH": "./venv/bin/",
"spark.yarn.executorEnv.PYTHONPATH": "./venv/bin/",
"livy.spark.yarn.appMasterEnv.PYTHONPATH": "./venv/bin/",
"livy.spark.yarn.executorEnv.PYTHONPATH": "./venv/bin/",
#
"spark.yarn.appMasterEnv.PYTHONHOME": "./venv/bin/",
"spark.yarn.executorEnv.PYTHONHOME": "./venv/bin/",
"livy.spark.yarn.appMasterEnv.PYTHONHOME": "./venv/bin/",
"livy.spark.yarn.executorEnv.PYTHONHOME": "./venv/bin/",
```
Error:
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Current thread 0x00007f7351d53740 (most recent call first):
This is how I created the virtual environment:
```
python3 -m venv venv/
source venv/bin/activate
python3 -m pip install -r requirements.pip
deactivate
pushd venv/
zip -rq ../venv.zip *
popd
```
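Whether `bin/python` ends up in the archive as a symbolic link or as a copy of the interpreter depends on how the archive was built (Info-ZIP's `zip` follows links unless `-y` is given). As a rough check, the sketch below lists archive members that are stored as symlinks. The helper name `list_symlinks` and the in-memory demo archive are made up for illustration; point the function at the real `venv.zip` to inspect it.

```python
import io
import stat
import zipfile


def list_symlinks(zip_source):
    """Return the names of archive members stored as symbolic links."""
    links = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            # For archives created on Unix, the upper 16 bits of
            # external_attr hold the file mode.
            mode = info.external_attr >> 16
            if stat.S_ISLNK(mode):
                links.append(info.filename)
    return links


# Demo on a small in-memory archive (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("pyvenv.cfg", "home = /usr/bin\n")
    link = zipfile.ZipInfo("bin/python")
    link.external_attr = (stat.S_IFLNK | 0o777) << 16
    archive.writestr(link, "python3")  # the link target is stored as the data
buffer.seek(0)

print(list_symlinks(buffer))
```

For the real archive, `list_symlinks("venv.zip")` would be the equivalent call.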
Virtual environment structure:
drwxrwxr-x 2 4096 Oct 15 12:37 bin/
drwxrwxr-x 2 4096 Oct 15 12:37 include/
drwxrwxr-x 3 4096 Oct 15 12:37 lib/
lrwxrwxrwx 1 3 Oct 15 12:37 lib64 -> lib/
-rw-rw-r-- 1 59 Oct 15 12:37 pip-selfcheck.json
-rw-rw-r-- 1 69 Oct 15 12:37 pyvenv.cfg
drwxrwxr-x 3 4096 Oct 15 12:37 share/
bin directory:
activate activate.csh activate.fish chardetect easy_install easy_install-3.5 pip pip3 pip3.5 python python3
lib directory:
python3.5/site-packages/
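The listing above shows only `site-packages` under `lib/python3.5/`. A venv created with `python3 -m venv` does not copy the standard library (which includes the `encodings` package); it records the base interpreter's directory in `pyvenv.cfg`. One possible reading of the error is therefore that the shipped environment cannot find a matching base Python on the YARN nodes, though this is an assumption, not a confirmed diagnosis. The sketch below gathers the relevant facts from a venv directory; `inspect_venv` is an illustrative helper name.

```python
import os
import re


def inspect_venv(venv_dir):
    """Collect facts that hint at whether a venv can run on another machine."""
    report = {"home": None, "python_is_symlink": False, "bundles_stdlib": False}

    # pyvenv.cfg records the base interpreter's directory as `home = ...`.
    cfg_path = os.path.join(venv_dir, "pyvenv.cfg")
    if os.path.isfile(cfg_path):
        with open(cfg_path, encoding="utf-8") as cfg:
            for line in cfg:
                key, sep, value = line.partition("=")
                if sep and key.strip() == "home":
                    report["home"] = value.strip()

    # bin/python is often a symlink to the base interpreter.
    report["python_is_symlink"] = os.path.islink(
        os.path.join(venv_dir, "bin", "python")
    )

    # The standard library normally lives outside the venv, so this
    # check is expected to stay False for a plain `python3 -m venv`.
    lib_dir = os.path.join(venv_dir, "lib")
    if os.path.isdir(lib_dir):
        for name in os.listdir(lib_dir):
            if re.fullmatch(r"python\d+\.\d+", name):
                if os.path.isdir(os.path.join(lib_dir, name, "encodings")):
                    report["bundles_stdlib"] = True
    return report


if __name__ == "__main__":
    print(inspect_venv("venv"))
```

If `home` points to a directory that does not exist on the cluster nodes, or holds a different Python version there, the interpreter in the unpacked archive has no standard library to load.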
AWS Support says this is an ongoing bug:
https://issues.apache.org/jira/browse/SPARK-13587
https://issues.apache.org/jira/browse/ZEPPELIN-2233
Any suggestions?
Thanks!
【Comments】:
- Which Spark version are you using?
Tags: python-3.x amazon-web-services apache-spark virtualenv amazon-emr