【Posted】: 2020-07-28 12:50:54
【Question】:
I am running a Python script in Azkaban.
Environment:
CentOS 8.1
azkaban 3.90.0
Python 3.6.8
ChromeDriver 84.0.4147.30
In the test.flow file:
nodes:
  - name: job_test
    type: command
    config:
      command: python3 /home/azkaban/python_codes/pyib/activity/pickgoods.py
About twenty minutes after I run this flow, the system becomes very slow and the execution fails. The job log shows:
28-07-2020 18:30:40 CST job_test INFO - Process with id 1403 completed unsuccessfully in 1727 seconds.
28-07-2020 18:30:40 CST job_test ERROR - Job run failed!
java.lang.RuntimeException: azkaban.jobExecutor.utils.process.ProcessFailureException: Process exited with code 1
    at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:312)
    at azkaban.execapp.JobRunner.runJob(JobRunner.java:830)
    at azkaban.execapp.JobRunner.doRun(JobRunner.java:607)
    at azkaban.execapp.JobRunner.run(JobRunner.java:568)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: azkaban.jobExecutor.utils.process.ProcessFailureException: Process exited with code 1
    at azkaban.jobExecutor.utils.process.AzkabanProcess.run(AzkabanProcess.java:125)
    at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:304)
    ... 8 more
28-07-2020 18:30:40 CST job_test ERROR - azkaban.jobExecutor.utils.process.ProcessFailureException: Process exited with code 1 cause: azkaban.jobExecutor.utils.process.ProcessFailureException: Process exited with code 1
28-07-2020 18:30:40 CST job_test INFO - Finishing job job_test at 1595932240480 with status FAILED
And azkaban-webserver.log under azkaban-web-server shows:
2020/07/28 21:00:34.127 +0800 INFO [ExecutorManager] [AzkabanWebServer-QueueProcessor-Thread] [Azkaban] Successfully refreshed executor: iZbp1hb3esnbp3levrcg05Z:36037 (id: 16), active=true with executor info : ExecutorInfo{remainingMemoryPercent=45.705342424456234, remainingMemoryInMB=835, remainingFlowCapacity=30, numberOfAssignedFlows=0, lastDispatchedTime=1595936723440, cpuUsage=0.01}
2020/07/28 21:00:34.128 +0800 ERROR [ExecutorManager] [AzkabanWebServer-QueueProcessor-Thread] [Azkaban] Failed to update ExecutorInfo for executor : iZbp1hb3esnbp3levrcg05Z:44085 (id: 17), active=true
java.util.concurrent.ExecutionException: org.apache.http.conn.HttpHostConnectException: Connect to iZbp1hb3esnbp3levrcg05Z:44085 [iZbp1hb3esnbp3levrcg05Z/172.16.184.105] failed: Connection refused (Connection refused)
Can anyone help me solve this?
【Comments】:
-
Your job process is crashing. There is probably some log output somewhere that shows a Python traceback or similar.
-
@AKX, I will add the error log to the question description.
-
No, I mean look at the job's own error log: azkaban.readthedocs.io/en/latest/useAzkaban.html#job-logs
-
@AKX, thanks. Following your pointer, I found the error log for my Python code.
-
Glad I could help. I'll post this as an answer so you can accept it.
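
As the comments note, "Process exited with code 1" only says that the Python process failed; the actual cause is in the traceback the script wrote to the job log. Python already prints an uncaught traceback to stderr and exits with code 1, which matches the log above. The sketch below just makes that behaviour explicit, so the traceback is flushed before the process exits. It assumes that Azkaban's command job copies the process's stderr into the job log; `main` and `run` are hypothetical names, not taken from pickgoods.py.

```python
import sys
import traceback


def main() -> None:
    # Hypothetical stand-in for the real work done in pickgoods.py.
    raise RuntimeError("example failure")


def run(entry_point) -> int:
    """Call entry_point and return an exit code: 0 on success, 1 on failure."""
    try:
        entry_point()
    except Exception:
        # Write the full traceback to stderr and flush it, so that it is
        # not lost if the process is torn down right afterwards.
        traceback.print_exc(file=sys.stderr)
        sys.stderr.flush()
        return 1
    return 0


# In the real script, the last line would hand the code back to Azkaban:
#     sys.exit(run(main))
```

With this in place, a failed run should still end with exit code 1, and the traceback should appear among the ERROR lines of the job log for job_test.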