Can I run PySpark as a shell job in Oozie?

[Question title]: Can I run PySpark as a shell job in Oozie?
[Posted]: 2017-07-26 11:13:54
[Question description]:

I have a Python script that runs fine via spark-submit. I need to use it in Oozie.

<!-- move files from local disk to hdfs -->
<action name="forceLoadFromLocal2hdfs">
  <shell xmlns="uri:oozie:shell-action:0.3">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <property>
        <name>mapred.job.queue.name</name>
        <value>${queueName}</value>
      </property>
    </configuration>
    <exec>driver-script.sh</exec>
    <!-- single -->
    <argument>s</argument>
    <!-- py script -->
    <argument>load_local_2_hdfs.py</argument>
    <!-- local file to be moved -->
    <argument>localPathFile</argument>
    <!-- hdfs destination folder; be aware that the script deletes the existing folder! -->
    <argument>hdfFolder</argument>
    <file>${workflowRoot}driver-script.sh#driver-script.sh</file>
    <file>${workflowRoot}load_local_2_hdfs.py#load_local_2_hdfs.py</file>
  </shell>
  <ok to="end"/>
  <error to="killAction"/>
</action>

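The script itself runs fine via driver-script.sh. Through Oozie, even though the workflow status is SUCCEEDED, the file is not copied to HDFS. I can't find any error logs or any logs related to the PySpark job.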

I also have another question, about Spark logs being suppressed when run from Oozie, here.

[Comments]:

Tags: hadoop apache-spark pyspark hdfs oozie

[Solution 1]:

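Put set -x at the beginning of your script. It prints each command as it is executed, so you can see which line the script has reached. The trace is written to stderr.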
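As a rough sketch of the set -x suggestion: the real driver-script.sh is not shown in the question, so the script below is a hypothetical stand-in. The variable names and the meaning of each positional argument are assumptions taken from the `<argument>` elements in the workflow, and `echo` replaces the actual spark-submit call so the sketch can run without a cluster.

```shell
#!/bin/bash
# Hypothetical stand-in for driver-script.sh; the real script is not shown in the question.

# Trace mode: from here on, the shell writes each command to stderr before running it.
set -x

# Positional arguments as passed by the workflow's <argument> elements.
# The variable names and defaults are assumptions made for this sketch.
mode="${1:-s}"                            # "s" is commented as "single" in the workflow
py_script="${2:-load_local_2_hdfs.py}"    # PySpark script shipped via <file>
local_path="${3:-localPathFile}"          # local file to be moved
hdfs_folder="${4:-hdfFolder}"             # HDFS destination folder

# Placeholder for the real spark-submit invocation: echo only prints the command line.
echo "mode=$mode"
echo spark-submit "$py_script" "$local_path" "$hdfs_folder"
```

With the default PS4 prompt, each traced command appears on stderr prefixed with `+`, so the stderr log of the shell action shows the last command that ran before the script stopped.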

Can you elaborate on what you mean by the file not being copied? That would make it easier to help you.

[Discussion]:

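Hi, I found the logs under YARN. The file is not copied from the local disk to HDFS, and copying it is exactly what the script is supposed to do :)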