Running a JAR in Hadoop on Google Cloud using yarn-client

[Question title]: Running JAR in Hadoop on Google Cloud using Yarn-client
[Posted]: 2015-06-17 10:20:58
[Question description]:

I want to run a JAR in Hadoop on Google Cloud using yarn-client.

I run this command on the Hadoop master node:

spark-submit --class find --master yarn-client find.jar

But it returns this error:

15/06/17 10:11:06 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m-on8g/10.240.180.15:8032
15/06/17 10:11:07 INFO ipc.Client: Retrying connect to server: hadoop-m-on8g/10.240.180.15:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

What is the problem? In case it is useful, this is my yarn-site.xml:

<?xml version="1.0" ?>
<!--
     <configuration>
      <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/yarn-logs/</value>
        <description>
          The remote path, on the default FS, to store logs.
        </description>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-m-on8g</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>5999</value>
        <description>

[Question comments]:

Tags: hadoop apache-spark google-compute-engine hadoop-yarn

[Solution 1]:

In your case, the YARN ResourceManager may be unhealthy for some unknown reason; you can try to fix YARN by restarting it:

sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/stop-yarn.sh
sudo sudo -u hadoop /home/hadoop/hadoop-install/sbin/start-yarn.sh
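
Before and after the restart, it can help to confirm whether anything is listening on the ResourceManager port that the client keeps retrying. The following is a minimal sketch, assuming `nc` (netcat) is installed on the master and that the client output was saved to a file; `rm_endpoint_from_log` and `spark-submit.log` are hypothetical names, not part of Hadoop or Spark.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): read client log lines on stdin and
# print "IP PORT" from the first "Connecting to ResourceManager at HOST/IP:PORT".
rm_endpoint_from_log() {
  sed -n 's|.*Connecting to ResourceManager at [^/]*/\([0-9.]*\):\([0-9]*\).*|\1 \2|p' |
    head -n 1
}

# Example use, left commented out so that sourcing this file has no side effects.
# spark-submit.log is an assumed file holding the client output.
#   set -- $(rm_endpoint_from_log < spark-submit.log)
#   nc -z -w 3 "$1" "$2" && echo "port open" || echo "nothing listening on $1:$2"
```

If the port is still closed after start-yarn.sh has run, the ResourceManager process probably did not come up, and its own log on the master is the next place to look.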

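However, it looks like you are using the Click-to-Deploy solution. Click-to-Deploy's Spark + Hadoop 2 deployment does not actually support Spark on YARN at the moment, because of some bugs and missing memory configuration. If you try to run it with --master yarn-client out of the box, you will typically see something like this: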

15/06/17 17:21:08 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
   appMasterRpcPort: -1
   appStartTime: 1434561664937
   yarnAppState: ACCEPTED

15/06/17 17:21:09 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
   appMasterRpcPort: -1
   appStartTime: 1434561664937
   yarnAppState: ACCEPTED

15/06/17 17:21:10 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
   appMasterRpcPort: 0
   appStartTime: 1434561664937
   yarnAppState: RUNNING

15/06/17 17:21:15 ERROR cluster.YarnClientSchedulerBackend: Yarn application already ended: FAILED
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
15/06/17 17:21:15 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
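
The client output above only shows the state moving from ACCEPTED to RUNNING to FAILED; the actual cause is usually recorded on the YARN side. The following is a sketch of pulling the application ID out of saved client output and asking YARN for details. `yarn_app_ids` and `spark-submit.log` are hypothetical names, and `yarn logs` only returns container logs if log aggregation is enabled on the cluster, which is an assumption here.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct YARN application ID found on stdin.
yarn_app_ids() {
  grep -oE 'application_[0-9]+_[0-9]+' | sort -u
}

# Example use, commented out so that sourcing this file has no side effects.
# spark-submit.log is an assumed file holding the client output.
#   for id in $(yarn_app_ids < spark-submit.log); do
#     yarn application -status "$id"   # final state and diagnostics
#     yarn logs -applicationId "$id"   # container logs (needs log aggregation)
#   done
```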

The well-supported way to deploy a cluster on Google Compute Engine with Hadoop 2 and Spark configured to run on YARN is to use bdutil. You would run something like:

./bdutil -P <instance prefix> -p <project id> -b <bucket> -z <zone> -d  \
    -e extensions/spark/spark_on_yarn_env.sh generate_config my_custom_env.sh
./bdutil -e my_custom_env.sh deploy

# Shorthand for logging in to the master
./bdutil -e my_custom_env.sh shell

# Handy way to run a socks proxy to make it easy to access the web UIs
./bdutil -e my_custom_env.sh socksproxy

# When done, delete your cluster
./bdutil -e my_custom_env.sh delete

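With spark_on_yarn_env.sh, Spark should default to yarn-client, but you can always re-specify --master yarn-client. You can see a more detailed explanation of the available flags with ./bdutil --help. Here are the help entries for the flags I included above: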

-b, --bucket
  Google Cloud Storage bucket used in deployment and by the cluster.

-d, --use_attached_pds
  If true, uses additional non-boot volumes, optionally creating them on
  deploy if they don't exist already and deleting them on cluster delete.

-e, --env_var_files
  Comma-separated list of bash files that are sourced to configure the cluster
  and installed software. Files are sourced in order with later files being
  sourced last. bdutil_env.sh is always sourced first. Flag arguments are
  set after all sourced files, but before the evaluate_late_variable_bindings
  method of bdutil_env.sh. see bdutil_env.sh for more information.

-P, --prefix
  Common prefix for cluster nodes.

-p, --project
  The Google Cloud Platform project to use to create the cluster.

-z, --zone
  Specify the Google Compute Engine zone to use.

[Comments]:

Hi, thanks for your help. I tried your command, but when I launch spark-submit it reports this: INFO yarn.Client: Application report from ResourceManager: application identifier: application_1434614478260_0003 appId: 3 clientToAMToken: null appDiagnostics: appMasterHost: N/A appQueue: default appMasterRpcPort: -1 appStartTime: 1434617006538 yarnAppState: ACCEPTED distributedFinalState: UNDEFINED appTrackingUrl: hadoop-m-565h:8088/proxy/application_1434614478260_0003 appUser
If I try to use bdutil, at the second step, when I deploy the custom env, it returns: Thu Jun 18 13:00:11 UTC 2015: Command failed: wait ${SUBPROC} on line 326. Thu Jun 18 13:00:11 UTC 2015: Exit code of failed command: 1 Thu Jun 18 13:00:11 UTC 2015: Detailed debug info available in file: /tmp/bdutil-20150618-130008-iVA/debuginfo.txt
Do you have the contents of /tmp/bdutil-20150618-130008-iVA/debuginfo.txt? If you would rather not post them here, you can send them to gcp-hadoop-contact@google.com.
Each error is repeated 3 times: NAME ZONE MACHINE_TYPE INTERNAL_IP EXTERNAL_IP STATUS ERROR: (gcloud.compute.instances.create) Some requests did not succeed: - Insufficient Permission ERROR: (gcloud.compute.instances.create) Some requests did not succeed: Thu Jun 18 23:13:09 UTC 2015: Exited 1: gcloud --project=provadatamining-979 --quiet --verbosity=info compute instances create hadoop-w-1 --machine-type=n1-standard-4 --image=debian-7-backports --network=default --scopes storage-full --boot-disk-type=pd-standard --zone=us-central1-f
Ah, are you running bdutil from inside a VM? If so, you need to make sure the VM has a "service account" with the "compute" and "storage-full" scopes enabled. Under the hood, bdutil runs "gcloud compute instances create" and also uses "gsutil" to stage files.
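
To see which scopes a VM's service account actually has, the scope list can be read from the Compute Engine metadata server from inside the VM. The following is a sketch under stated assumptions: `missing_scopes` is a hypothetical helper name, the two URLs are the full scope names that the gcloud aliases "compute-rw" and "storage-full" stand for, and a VM granted the broader cloud-platform scope would be reported as missing both even though it has sufficient access.

```shell
#!/bin/sh
# Hypothetical helper: read granted scopes on stdin (one URL per line) and
# print each scope that bdutil is assumed to need but that is not present.
missing_scopes() {
  granted=$(cat)
  for scope in \
    https://www.googleapis.com/auth/compute \
    https://www.googleapis.com/auth/devstorage.full_control
  do
    printf '%s\n' "$granted" | grep -qxF "$scope" || printf '%s\n' "$scope"
  done
}

# Example use from inside the VM, commented out so that sourcing this file
# makes no network request:
#   curl -s -H 'Metadata-Flavor: Google' \
#     'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes' |
#     missing_scopes
```

Empty output means both scopes are present. Scopes are set when a VM is created, so a VM that lacks them generally has to be recreated with the right --scopes value, or bdutil can be run from a machine with full gcloud credentials instead.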