【Posted】: 2019-12-15 02:36:13
【Question】:
I'm trying to set up an EMR cluster with JupyterHub and S3 persistence. I have the following classification:
{
  "Classification": "jupyter-s3-conf",
  "Properties": {
    "s3.persistence.enabled": "true",
    "s3.persistence.bucket": "my-persistence-bucket"
  }
}
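If the cluster is created from a script rather than the console, the same classification can be built as a Python structure and serialized. This is only a minimal sketch: the helper name `jupyter_s3_classification` is hypothetical and not part of any AWS library. The property values are kept as strings, as in the JSON above, and the classification is wrapped in a list because EMR configurations are supplied as an array.

```python
import json


def jupyter_s3_classification(bucket, enabled=True):
    """Build the jupyter-s3-conf classification as a plain dict.

    Hypothetical helper for illustration. Property values are written
    as strings, matching the JSON in the question.
    """
    return {
        "Classification": "jupyter-s3-conf",
        "Properties": {
            "s3.persistence.enabled": "true" if enabled else "false",
            "s3.persistence.bucket": bucket,
        },
    }


# Bucket name taken from the question.
configurations = [jupyter_s3_classification("my-persistence-bucket")]
print(json.dumps(configurations, indent=2))
```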
I'm installing dask with the following step (otherwise, opening a notebook results in a 500 error):
command-runner.jar, arguments:
/usr/bin/sudo /usr/bin/docker exec jupyterhub conda install dask
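For reference, a step like this one can also be written as a step definition in Python, in the shape that boto3's `add_job_flow_steps` accepts. This is a sketch under assumptions: the helper name `docker_exec_step`, the step name `install-dask`, and the choice of `CONTINUE` for `ActionOnFailure` are illustrative and not taken from the question. The code only builds the dict and does not contact AWS.

```python
def docker_exec_step(name, command):
    """Return an EMR step dict that runs `command` inside the jupyterhub container.

    Hypothetical helper: the step name and ActionOnFailure value are
    assumptions chosen for illustration.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Same argument list as the step in the question, one token per item.
            "Args": ["/usr/bin/sudo", "/usr/bin/docker", "exec", "jupyterhub"]
            + list(command),
        },
    }


step = docker_exec_step("install-dask", ["conda", "install", "dask"])
print(step["HadoopJarStep"]["Args"])
```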
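However, when I open a new notebook, it isn't persisted. The bucket stays empty. The cluster does have access to S3: a Spark job run with the same configuration can read from and write to the same bucket.
However, when I look at the Jupyter logs on my master node, I see this: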
[E 2019-08-07 12:27:14.609 SingleUserNotebookApp application:574] Exception while loading config file /etc/jupyter/jupyter_notebook_config.py
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/traitlets/config/application.py", line 562, in _load_config_files
    config = loader.load_config()
  File "/opt/conda/lib/python3.6/site-packages/traitlets/config/loader.py", line 457, in load_config
    self._read_file_as_dict()
  File "/opt/conda/lib/python3.6/site-packages/traitlets/config/loader.py", line 489, in _read_file_as_dict
    py3compat.execfile(conf_filename, namespace)
  File "/opt/conda/lib/python3.6/site-packages/ipython_genutils/py3compat.py", line 198, in execfile
    exec(compiler(f.read(), fname, 'exec'), glob, loc)
  File "/etc/jupyter/jupyter_notebook_config.py", line 5, in <module>
    from s3contents import S3ContentsManager
  File "/opt/conda/lib/python3.6/site-packages/s3contents/__init__.py", line 15, in <module>
    from .gcsmanager import GCSContentsManager
  File "/opt/conda/lib/python3.6/site-packages/s3contents/gcsmanager.py", line 8, in <module>
    from s3contents.gcs_fs import GCSFS
  File "/opt/conda/lib/python3.6/site-packages/s3contents/gcs_fs.py", line 3, in <module>
    import gcsfs
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/__init__.py", line 4, in <module>
    from .dask_link import register as register_dask
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/dask_link.py", line 56, in <module>
    register()
  File "/opt/conda/lib/python3.6/site-packages/gcsfs/dask_link.py", line 51, in register
    dask.bytes.core._filesystems['gcs'] = DaskGCSFileSystem
AttributeError: module 'dask.bytes.core' has no attribute '_filesystems'
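The last frame of the traceback fails when `gcsfs/dask_link.py` assigns to `dask.bytes.core._filesystems`, an attribute the installed dask does not have. One way to check this in isolation is a small probe like the sketch below, run with the same Python interpreter as the notebook server (for example, inside the `jupyterhub` container). The helper name `module_has_attr` is hypothetical. It uses only the standard library and reports `None` when the module cannot be imported at all.

```python
import importlib


def module_has_attr(module_name, attr):
    """Return True/False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


# The attribute that gcsfs/dask_link.py assigns to in the traceback above.
result = module_has_attr("dask.bytes.core", "_filesystems")
print("dask.bytes.core._filesystems present:", result)
```

A `False` result would match the `AttributeError` in the log and point to a version mismatch between dask and gcsfs, rather than to the S3 configuration itself.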
What am I missing, and what is going wrong?
【Comments】:
-
Which EMR version? After 5.24 it works fine without dask. I'm using it right now.
-
5.26, and I get a 500 error when dask isn't included.
-
Update: I launched a "blank" cluster and it works there. So it is probably an incompatibility with my other libraries.
Tags: amazon-s3 jupyter-notebook amazon-emr