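【Posted】: 2020-02-03 18:07:23
【Problem description】:
How often is the DAG definition file read during a single DAG run?
I have a large DAG that takes a long time to build (about 1-3 minutes). Looking at each task's log while the DAG is running, it appears that the DAG definition file is executed again before every task runs...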
*** Reading local file: /home/airflow/airflow/logs/mydag/mytask/2020-01-30T04:51:34.621883+00:00/1.log
[2020-01-29 19:02:10,844] {taskinstance.py:655} INFO - Dependencies all met for <TaskInstance: mydag.mytask 2020-01-30T04:51:34.621883+00:00 [queued]>
[2020-01-29 19:02:10,866] {taskinstance.py:655} INFO - Dependencies all met for <TaskInstance: mydag.mytask 2020-01-30T04:51:34.621883+00:00 [queued]>
[2020-01-29 19:02:10,866] {taskinstance.py:866} INFO -
--------------------------------------------------------------------------------
[2020-01-29 19:02:10,866] {taskinstance.py:867} INFO - Starting attempt 1 of 1
[2020-01-29 19:02:10,866] {taskinstance.py:868} INFO -
--------------------------------------------------------------------------------
[2020-01-29 19:02:10,883] {taskinstance.py:887} INFO - Executing <Task(BashOperator): precheck_db_perms> on 2020-01-30T04:51:34.621883+00:00
[2020-01-29 19:02:10,887] {standard_task_runner.py:52} INFO - Started process 140570 to run task
[2020-01-29 19:02:11,048] {logging_mixin.py:112} INFO - [2020-01-29 19:02:11,047] {dagbag.py:403} INFO - Filling up the DagBag from /home/airflow/airflow/dags/mydag.py
[2020-01-29 19:02:11,052] {logging_mixin.py:112} INFO - <output from my dag definition file>
[2020-01-29 19:02:11,101] {logging_mixin.py:112} INFO - <more output from my dag definition file>
....
....
....
[2020-01-29 19:02:58,651] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: mydag.mytask 2020-01-30T04:51:34.621883+00:00 [running]> airflowetl.co.local
[2020-01-29 19:02:58,674] {bash_operator.py:81} INFO - Tmp dir root location:
/tmp
[2020-01-29 19:02:58,674] {bash_operator.py:91} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_EMAIL=me@co.org
AIRFLOW_CTX_DAG_OWNER=me
AIRFLOW_CTX_DAG_ID=mydag
AIRFLOW_CTX_TASK_ID=mytask
AIRFLOW_CTX_EXECUTION_DATE=2020-01-30T04:51:34.621883+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2020-01-30T04:51:34.621883+00:00
[2020-01-29 19:02:58,675] {bash_operator.py:105} INFO - Temporary script location: /tmp/airflowtmphwu1ckty/mytaskbmnsizw5
<only now does the actual task logic output seem to start>
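The first part of the log seems to imply that the DAG file is executed every time a new task starts (I see the same thing in every task's log).
Is that really what is happening here? Is this normal/expected behavior? Because my DAG takes a while to build, that build time would be paid again for every task in the DAG (and there are many in this case), which makes me think that either this is not normal or there is some best practice I am not following. Can someone with more Airflow experience help explain what I'm seeing?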
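To put a rough number on the concern, here is a minimal sketch that measures the gap between the "Filling up the DagBag" line and the "Running ... on host" line in the log above, which comes to about 47.6 seconds for this one task. The two timestamps are copied from that log. The task count is a hypothetical placeholder (the real number is not stated here), and the total assumes the overhead is paid once per task, one task after another.

```python
from datetime import datetime

# Timestamp layout used by the Airflow log lines shown above.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds elapsed between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


# Copied from the log: when DagBag filling started and when the task
# reported that it was running.
DAGBAG_FILL_STARTED = "2020-01-29 19:02:11,047"
TASK_STARTED_RUNNING = "2020-01-29 19:02:58,651"

per_task_overhead = seconds_between(DAGBAG_FILL_STARTED, TASK_STARTED_RUNNING)

# Hypothetical value for illustration only; the real task count is not
# given in the question.
ASSUMED_TASK_COUNT = 100

# Assumes every task pays the same overhead and tasks run sequentially.
total_overhead = per_task_overhead * ASSUMED_TASK_COUNT

print(f"per-task overhead: {per_task_overhead:.3f} s")
print(f"overhead across {ASSUMED_TASK_COUNT} tasks: {total_overhead / 60:.1f} min")
```

The gap is only an upper bound on the time spent in the DAG definition file, since other startup work may also fall between those two log lines.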
【Discussion】:
Tags: airflow