【Posted】: 2020-11-25 08:04:03
【Question】:
I have the following log in the Apache Airflow GUI:
*** Reading local file: /home/ubuntu/airflow/logs/risk_position/delta_risk/2020-11-25T06:38:38.444673+00:00/1.log
[2020-11-25 06:52:40,950] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: risk_position.delta_risk 2020-11-25T06:38:38.444673+00:00 [queued]>
[2020-11-25 06:52:40,964] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: risk_position.delta_risk 2020-11-25T06:38:38.444673+00:00 [queued]>
[2020-11-25 06:52:40,965] {taskinstance.py:880} INFO -
--------------------------------------------------------------------------------
[2020-11-25 06:52:40,965] {taskinstance.py:881} INFO - Starting attempt 1 of 6
[2020-11-25 06:52:40,965] {taskinstance.py:882} INFO -
--------------------------------------------------------------------------------
[2020-11-25 06:52:40,974] {taskinstance.py:901} INFO - Executing <Task(BashOperator): delta_risk> on 2020-11-25T06:38:38.444673+00:00
[2020-11-25 06:52:40,977] {standard_task_runner.py:54} INFO - Started process 18650 to run task
[2020-11-25 06:52:41,002] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'risk_position', 'delta_risk', '2020-11-25T06:38:38.444673+00:00', '--job_id', '64', '--pool', 'default_pool', '--raw', '-sd', '/home/ubuntu/.local/lib/python3.6/site-packages/airflow/example_dags/risk_position.py', '--cfg_path', '/tmp/tmp1kqxd_yj']
[2020-11-25 06:52:41,003] {standard_task_runner.py:78} INFO - Job 64: Subtask delta_risk
[2020-11-25 06:52:41,024] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: risk_position.delta_risk 2020-11-25T06:38:38.444673+00:00 [running]> ip-************.ap-northeast-1.compute.internal
[2020-11-25 06:52:41,035] {bash_operator.py:113} INFO - Tmp dir root location:
/tmp
[2020-11-25 06:52:41,036] {bash_operator.py:136} INFO - Temporary script location: /tmp/airflowtmpowss08ak/delta_riskhmnyrm0e
[2020-11-25 06:52:41,036] {bash_operator.py:146} INFO - Running command: /home/ubuntu/extra/cronjobs/unify_report.sh delta_risk || exit 1
[2020-11-25 06:52:41,042] {bash_operator.py:153} INFO - Output:
[2020-11-25 06:52:41,044] {bash_operator.py:157} INFO - delta_risk report
[2020-11-25 06:52:41,046] {bash_operator.py:157} INFO - output /home/ubuntu/market_risk/delta_risk/output 2020-11-24delta_risk.xlsx
[2020-11-25 06:52:41,046] {bash_operator.py:157} INFO - exists /home/ubuntu/extra/cronjobs/unify_reports
[2020-11-25 06:52:41,971] {bash_operator.py:161} INFO - Command exited with return code 0
[2020-11-25 06:52:41,976] {taskinstance.py:1070} INFO - Marking task as SUCCESS.dag_id=risk_position, task_id=delta_risk, execution_date=20201125T063838, start_date=20201125T065240, end_date=20201125T065241
[2020-11-25 06:52:45,928] {local_task_job.py:102} INFO - Task exited with return code 0
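As the log shows, the bash command in `Running command: /home/ubuntu/extra/cronjobs/unify_report.sh delta_risk || exit 1` runs a delta_risk.py file. That .py file raises a traceback error and does not complete properly. I added `|| exit 1` hoping that the bash script would then also return an error, which would make the task fail in the Airflow GUI. Instead, Airflow marks the task as SUCCESS. When the Python file ends with a traceback, I want the Airflow task to fail with it.
How can I achieve this?
Edit:
Below is the part of the .sh script that runs the Python file and creates the log file: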
(
    # run each code to generate files
    cd "${SOURCE_PATH_base}/${SUB_PATH}"
    python3 $REPORT_TYPE.py || exit 1
    # output file link
    dir=$top_dir/output_$REPORT_TYPE
    echo $dir 'output link'
    if [ -e $dir ]; then
        echo 'exists' $dir
    else
        # create link to real path
        echo 'no output link' $dir
        echo 'real output path' ${OUTPATH}
        cd ${TOPDIR}/${JOBDIR}
        trap `ln -s "${OUTPATH}" "output_${REPORT_TYPE}"` 1 2 3 15
        echo 'created output link' $dir
    fi
    # upload to gdrive
    cd "${SOURCE_PATH_base}/${INTERAPI_PATH}"
    python3 teamdrive_control.py Market_Risk $REPORT_TYPE "${OUTPATH}/${OUT}" "${TO}" "${FILENO_FLAG}" "${SUBJECT}"
) 2>&1 | xz -9ec > logs/${JOBDIR}-$(date +%s)_$REPORT_TYPE.log.xz
【Comments】:
-
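The script is probably masking the error. We can't see its source, so we can't tell you how to fix it. A properly written shell script propagates failure to its caller (a poorly written one might simply `exit 0` at the end, regardless of what happened before).
-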
@tripleee Thanks for your comment. Do you think that creating the log the way I do in the .sh file could "mask" the error as you describe?
-
Yes, the exit status of a pipeline is the exit status of the last command in the pipeline.
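To illustrate the point about pipelines, here is a minimal bash sketch, not the original script. It assumes the script runs under bash; `false` stands in for the failing `python3 $REPORT_TYPE.py`, `cat` stands in for `xz -9ec`, and the function names `masked`, `propagated`, and `report` are hypothetical helpers invented for this example.

```shell
#!/usr/bin/env bash
# Sketch only: `false` plays the failing Python call, `cat` plays the compressor.

# Default behavior: the pipeline reports the status of its last command (cat),
# so the `exit 1` inside the subshell never reaches the caller.
masked() {
    ( false || exit 1 ) 2>&1 | cat > /dev/null
}

# With pipefail enabled, the pipeline returns the status of the last stage
# that failed. The outer subshell keeps the option from leaking out.
propagated() {
    (
        set -o pipefail
        ( false || exit 1 ) 2>&1 | cat > /dev/null
    )
}

# Hypothetical helper: run a function and print the exit status it returned.
report() {
    local status=0
    "$@" || status=$?
    echo "$1 exited with status $status"
}

report masked       # prints: masked exited with status 0
report propagated   # prints: propagated exited with status 1
```

Applied to the script in the question, this suggests putting `set -o pipefail` before the `( ... ) 2>&1 | xz ...` pipeline and making sure nothing after it resets the status (for example a trailing `exit 0`). Reading `${PIPESTATUS[0]}` immediately after the pipeline would be an alternative. Airflow's BashOperator fails the task when the command exits with a non-zero status, so the outer `|| exit 1` in the operator command should then take effect.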