【Question title】: Attach AWS EMR cluster to remote Jupyter notebook using sparkmagic
【Posted】: 2021-04-06 21:38:35
【Question】:

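I am trying to connect and attach an AWS EMR cluster (emr-5.29.0) to a Jupyter notebook that I am working on from my local Windows machine. I have started a cluster with Hive 2.3.6, Pig 0.17.0, Hue 4.4.0, Livy 0.6.0, and Spark 2.4.4, and the subnets are public. I found that this can be done with Azure HDInsight, so I was hoping something similar could be done with EMR. The issue I am having is with passing the correct values in the config.json file. How should I attach an EMR cluster?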

I could work in the EMR notebooks native to AWS, but I thought I could go the local development route and have hit a roadblock.

{
    "kernel_python_credentials" : {
      "username": "{IAM ACCESS KEY ID}", # not sure about the username for the cluster
      "password": "{IAM SECRET ACCESS KEY}", # I use putty to ssh into the cluster with the pem key, so again not sure about the password for the cluster
      "url": "ec2-xx-xxx-x-xxx.us-west-2.compute.amazonaws.com", # as per the AWS blog When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy
      "auth": "None"
    },
  
    "kernel_scala_credentials" : {
      "username": "{IAM ACCESS KEY ID}",
      "password": "{IAM SECRET ACCESS KEY}",
      "url": "{Master public DNS}",
      "auth": "None"
    },
    "kernel_r_credentials": {
      "username": "{}",
      "password": "{}",
      "url": "{}"
    },

Update 1/4/2021

On April 1 (1/4), I got sparkmagic working in my local Jupyter notebook. I used these documents as references (ref-1, ref-2, ref-3) to set up local port forwarding (avoid using sudo if possible).

 sudo ssh -i ~/aws-key/my-pem-file.pem -N -L 8998:ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8998 hadoop@ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com

Configuration details
Release label: emr-5.32.0
Hadoop distribution: Amazon 2.10.1
Applications: Hive 2.3.7, Livy 0.7.0, JupyterHub 1.1.0, Spark 2.4.7, Zeppelin 0.8.2

Updated config file

{
    "kernel_python_credentials" : {
      "username": "",
      "password": "",
      "url": "http://localhost:8998"
    },
  
    "kernel_scala_credentials" : {
      "username": "",
      "password": "",
      "url": "http://localhost:8998",
      "auth": "None"
    },
    "kernel_r_credentials": {
      "username": "",
      "password": "",
      "url": "http://localhost:8998"
    },
  
    "logging_config": {
      "version": 1,
      "formatters": {
        "magicsFormatter": { 
          "format": "%(asctime)s\t%(levelname)s\t%(message)s",
          "datefmt": ""
        }
      },
      "handlers": {
        "magicsHandler": { 
          "class": "hdijupyterutils.filehandler.MagicsFileHandler",
          "formatter": "magicsFormatter",
          "home_path": "~/.sparkmagic"
        }
      },
      "loggers": {
        "magicsLogger": { 
          "handlers": ["magicsHandler"],
          "level": "DEBUG",
          "propagate": 0
        }
      }
    },
    "authenticators": {
      "Kerberos": "sparkmagic.auth.kerberos.Kerberos",
      "None": "sparkmagic.auth.customauth.Authenticator", 
      "Basic_Access": "sparkmagic.auth.basic.Basic"
    },
  
    "wait_for_idle_timeout_seconds": 15,
    "livy_session_startup_timeout_seconds": 60,
  
    "fatal_error_suggestion": "The code failed because of a fatal error:\n\t{}.\n\nSome things to try:\na) Make sure Spark has enough available resources for Jupyter to create a Spark context.\nb) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.\nc) Restart the kernel.",
  
    "ignore_ssl_errors": false,
  
    "session_configs": {
      "driverMemory": "1000M",
      "executorCores": 2
    },
  
    "use_auto_viz": true,
    "coerce_dataframe": true,
    "max_results_sql": 2500,
    "pyspark_dataframe_encoding": "utf-8",
    
    "heartbeat_refresh_seconds": 5,
    "livy_server_heartbeat_timeout_seconds": 60,
    "heartbeat_retry_seconds": 1,
  
    "server_extension_default_kernel_name": "pysparkkernel",
    "custom_headers": {},
    
    "retry_policy": "configurable",
    "retry_seconds_to_sleep_list": [0.2, 0.5, 1, 3, 5],
    "configurable_retry_policy_max_retries": 8
  }
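A quick way to sanity-check the `url` values in a config like the one above is to parse them and confirm each is a full URL with a scheme and a port. The following is a minimal sketch, not part of sparkmagic: the helper name `check_livy_urls` is hypothetical, and the file location `~/.sparkmagic/config.json` is assumed to be where the config is saved.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# The three credential sections that appear in the config above.
CREDENTIAL_KEYS = (
    "kernel_python_credentials",
    "kernel_scala_credentials",
    "kernel_r_credentials",
)


def check_livy_urls(config):
    """Return a list of human-readable problems found in the kernel URLs."""
    problems = []
    for key in CREDENTIAL_KEYS:
        url = config.get(key, {}).get("url", "")
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            problems.append(f"{key}: url {url!r} has no http:// or https:// scheme")
        elif parsed.port is None:
            problems.append(f"{key}: url {url!r} has no explicit port")
    return problems


if __name__ == "__main__":
    # Assumed location of the config file; adjust if yours lives elsewhere.
    path = Path.home() / ".sparkmagic" / "config.json"
    if path.exists():
        found = check_livy_urls(json.loads(path.read_text()))
        for line in found or ["no URL problems found"]:
            print(line)
```

With the updated config, every section points at `http://localhost:8998`, so the check reports nothing. A bare host name such as the `ec2-...compute.amazonaws.com` value from the first attempt would be flagged for lacking a scheme.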

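Second update 1/9

Back to square one. I keep getting this error and have spent days trying to debug it. I am not sure what I did previously to get things working. I also checked my security group configuration and it looks fine, with SSH on port 22.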

An error was encountered:
Error sending http request and maximum retry encountered.
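This error message is generic, so it can help to separate "nothing is listening on the local port" from "the port is open but Livy is not answering". The sketch below assumes the notebook talks to Livy through `http://localhost:8998`, as in the config above, and uses Livy's `GET /sessions` REST endpoint. The function names are hypothetical helpers, not sparkmagic APIs.

```python
import http.client
import json
import socket
import urllib.error
import urllib.request


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def livy_session_count(base_url, timeout=5.0):
    """Return the number of sessions Livy reports, or None if the request fails."""
    # Bypass any system-wide HTTP proxy, since the endpoint is a local tunnel.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url.rstrip("/") + "/sessions", timeout=timeout) as resp:
            return len(json.load(resp).get("sessions", []))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None


if __name__ == "__main__":
    print("local port 8998 open:", port_is_open("localhost", 8998))
    print("Livy sessions:", livy_session_count("http://localhost:8998"))
```

If the port is closed, the SSH tunnel is probably not running. If the port is open but the session count is `None`, the tunnel is up and the problem is more likely on the cluster side.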

【Comments】:

Tags: python-3.x apache-spark pyspark amazon-emr jupyter-lab


【Solution 1】:

I created a local port forward (SSH tunnel) to the Livy server on port 8998, and it worked like magic.

sudo ssh -i ~/aws-key/my-pem-file.pem -N -L 8998:ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8998 hadoop@ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com
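The same tunnel can be started from Python without `sudo`. Port 8998 is above 1023, so binding it locally does not normally require root. This is a minimal sketch that assumes an OpenSSH `ssh` client is on the PATH. The helper names are hypothetical, and the key path and host name are placeholders taken from the command above.

```python
import os
import subprocess


def build_tunnel_command(key_path, master_dns, port=8998, user="hadoop"):
    """Build the ssh argument list for a local port forward to Livy."""
    return [
        "ssh",
        "-i", os.path.expanduser(key_path),   # expand ~ ourselves; no shell is involved
        "-N",                                 # no remote command, forwarding only
        "-L", f"{port}:{master_dns}:{port}",  # local port -> master_dns:port through SSH
        f"{user}@{master_dns}",
    ]


def open_tunnel(key_path, master_dns, port=8998):
    """Start the tunnel in the background; call .terminate() on the result to close it."""
    return subprocess.Popen(build_tunnel_command(key_path, master_dns, port))


if __name__ == "__main__":
    # Placeholder values from the thread; replace with your own key and master DNS.
    print(" ".join(build_tunnel_command(
        "~/aws-key/my-pem-file.pem",
        "ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com",
    )))
```

Keeping the returned `Popen` object around makes it easy to check whether the tunnel is still alive with `.poll()`, which is useful when the notebook suddenly starts failing.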

I did not change my config.json file from the 1/4 update.

【Comments】:
