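[Posted]: 2020-09-14 15:28:35
[Question]:
We recently started seeing the Airflow webserver stop responding. When we run `systemctl status airflow-webserver`, the service still appears to be running, but the logs are full of errors and the web service does not respond. The errors in the log look like this: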
Sep 14 06:56:45 semaf1-dk1.mid.dom airflow[1833]: [2020-09-14 06:56:45,662] {{cli.py:990}} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
Sep 14 06:56:56 semaf1-dk1.mid.dom airflow[1833]: [2020-09-14 06:56:56,701] {{cli.py:990}} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
Sep 14 06:57:07 semaf1-dk1.mid.dom airflow[1833]: [2020-09-14 06:57:07,738] {{cli.py:990}} ERROR - [0 / 0] Some workers seem to have died and gunicorn did not restart them as expected
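The `[0 / 0]` prefix on these lines is worth reading closely. As far as I can tell from the webserver monitor in Airflow 1.10's `cli.py`, the pair is (ready workers / running workers), so `[0 / 0]` would mean no gunicorn workers are alive at all, which matches a webserver that accepts no requests. A minimal parser, assuming that interpretation (the regex and function names are hypothetical, not part of Airflow), makes the condition easy to alert on:

```python
import re
from typing import Optional, Tuple

# Assumed format of the monitor's error line, e.g.
# "... ERROR - [0 / 0] Some workers seem to have died and gunicorn ..."
WORKER_STATE = re.compile(r"ERROR - \[(\d+) / (\d+)\] Some workers seem to have died")


def worker_state(line: str) -> Optional[Tuple[int, int]]:
    """Return (ready, running) worker counts from a monitor error line, or None."""
    m = WORKER_STATE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def no_workers_left(line: str) -> bool:
    """True when the line reports zero running gunicorn workers."""
    state = worker_state(line)
    return state is not None and state[1] == 0
```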
I went back through the logs to the last lines before these errors started (journalctl -u airflow-webserver -a | grep -v "Some workers seem to have died and gunicorn did not restart them as expected" | tail -n 100), and the last error I found looks like this:
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,503] {{base.py:601}} ERROR - Add Permission on Menu Error: (pyodbc.OperationalError) ('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]Communication link failure (0) (SQLExecDirectW)')
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [SQL: SELECT TOP 1 ab_view_menu.id AS ab_view_menu_id, ab_view_menu.name AS ab_view_menu_name
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: FROM ab_view_menu
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: WHERE ab_view_menu.name = ?]
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [parameters: ('Logs',)]
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: (Background on this error at: http://sqlalche.me/e/13/e3q8)
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,504] {{base.py:600}} ERROR - (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [SQL: SELECT TOP 1 ab_view_menu.id AS ab_view_menu_id, ab_view_menu.name AS ab_view_menu_name
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: FROM ab_view_menu
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: WHERE ab_view_menu.name = ?]
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [parameters: [immutabledict({})]]
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: Traceback (most recent call last):
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1203, in _execute_context
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: conn = self._revalidate_connection()
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 473, in _revalidate_connection
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: "Can't reconnect until invalid "
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: sqlalchemy.exc.InvalidRequestError: Can't reconnect until invalid transaction is rolled back
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: The above exception was the direct cause of the following exception:
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: Traceback (most recent call last):
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/flask_appbuilder/base.py", line 598, in _add_permissions_menu
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: self.sm.add_permissions_menu(name)
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/flask_appbuilder/security/manager.py", line 1211, in add_permissions_menu
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: self.add_view_menu(view_menu_name)
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/flask_appbuilder/security/sqla/manager.py", line 431, in add_view_menu
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: view_menu = self.find_view_menu(name)
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/flask_appbuilder/security/sqla/manager.py", line 420, in find_view_menu
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: return self.get_session.query(self.viewmenu_model).filter_by(name=name).first()
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3397, in first
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: ret = list(self[0:1])
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3171, in __getitem__
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: return list(res)
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3503, in __iter__
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: return self._execute_and_instances(context)
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3528, in _execute_and_instances
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: result = conn.execute(querycontext.statement, self._params)
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: return meth(self, multiparams, params)
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: return connection._execute_clauseelement(self, multiparams, params)
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: distilled_params,
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1208, in _execute_context
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: e, util.text_type(statement), parameters, None, None
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1512, in _handle_dbapi_exception
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: sqlalchemy_exception, with_traceback=exc_info[2], from_=e
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: raise exception
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1203, in _execute_context
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: conn = self._revalidate_connection()
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: File "/usr/local/airflow/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 473, in _revalidate_connection
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: "Can't reconnect until invalid "
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [SQL: SELECT TOP 1 ab_view_menu.id AS ab_view_menu_id, ab_view_menu.name AS ab_view_menu_name
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: FROM ab_view_menu
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: WHERE ab_view_menu.name = ?]
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [parameters: [immutabledict({})]]
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,505] {{base.py:601}} ERROR - Add Permission on Menu Error: (sqlalchemy.exc.InvalidRequestError) Can't reconnect until invalid transaction is rolled back
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [SQL: SELECT TOP 1 ab_view_menu.id AS ab_view_menu_id, ab_view_menu.name AS ab_view_menu_name
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: FROM ab_view_menu
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: WHERE ab_view_menu.name = ?]
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [parameters: [immutabledict({})]]
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,508] {{base.py:414}} INFO - Registering class SlaMissModelView on menu SLA Misses
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,508] {{baseviews.py:266}} INFO - Registering route /slamiss/action/<string:name>/<pk> ['GET', 'POST']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,508] {{baseviews.py:266}} INFO - Registering route /slamiss/action_post ['POST']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,510] {{baseviews.py:266}} INFO - Registering route /slamiss/add ['GET', 'POST']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,510] {{baseviews.py:266}} INFO - Registering route /slamiss/api ['GET']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,510] {{baseviews.py:266}} INFO - Registering route /slamiss/api/column/add/<col_name> ['GET']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,510] {{baseviews.py:266}} INFO - Registering route /slamiss/api/column/edit/<col_name> ['GET']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,511] {{baseviews.py:266}} INFO - Registering route /slamiss/api/create ['POST']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,511] {{baseviews.py:266}} INFO - Registering route /slamiss/api/delete/<pk> ['DELETE']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,511] {{baseviews.py:266}} INFO - Registering route /slamiss/api/get/<pk> ['GET']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,511] {{baseviews.py:266}} INFO - Registering route /slamiss/api/read ['GET']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,511] {{baseviews.py:266}} INFO - Registering route /slamiss/api/readvalues ['GET']
Sep 13 05:12:17 semaf1-dk1.mid.dom airflow[1833]: [2020-09-13 05:12:17,512] {{baseviews.py:266}} INFO - Registering route /slamiss/api/update/<pk> ['PUT']
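The `grep -v ... | tail -n 100` pipeline used to find this returns the last 100 non-matching lines overall, which can include lines logged between flood entries. If the goal is the context immediately before the first flood line, a small filter can stop at the first match instead. This is a sketch; the function name is hypothetical:

```python
from collections import deque
from typing import Iterable, List

FLOOD = "Some workers seem to have died and gunicorn did not restart them as expected"


def lines_before_first_flood(lines: Iterable[str], n: int = 100, pattern: str = FLOOD) -> List[str]:
    """Return the last n lines logged before the first line containing pattern.

    If the pattern never appears, this returns the last n lines of the input.
    """
    window: deque = deque(maxlen=n)  # keeps only the most recent n lines
    for line in lines:
        if pattern in line:
            break  # stop at the onset of the flood
        window.append(line.rstrip("\n"))
    return list(window)
```

Fed from `journalctl -u airflow-webserver -a` via `sys.stdin`, it would print the lines right before the flood began.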
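Only the webserver seems to stop responding. The scheduler keeps running and jobs execute as usual.
This may be related to database maintenance (it usually happens on weekends, when database maintenance is allowed), but I would expect Airflow to recover once the database is back up. We are running Airflow 1.10.11, SQL Server for the Airflow database, and Red Hat Enterprise Server.
Running systemctl restart airflow-webserver always fixes the problem.
Has anyone else seen a similar issue, or does anyone have ideas on how to fix it?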
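The traceback suggests why recovery does not happen on its own: after the `08S01` communication link failure, SQLAlchemy refuses to reuse the invalidated connection ("Can't reconnect until invalid transaction is rolled back") until the session's transaction is rolled back, and the log shows the next Flask-AppBuilder query failing the same way. The general pattern is to call `rollback()` on the session before retrying. A minimal sketch, assuming only that `session` has a `rollback()` method (a SQLAlchemy `Session` does); the helper name and defaults are hypothetical:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_rollback(
    session,
    work: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 3,
    delay: float = 0.0,
) -> T:
    """Run work(); on a retry_on error, roll back the session, then retry.

    Errors not listed in retry_on propagate immediately without a rollback.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return work()
        except retry_on:
            # Clear the failed transaction; without this, SQLAlchemy keeps
            # raising "Can't reconnect until invalid transaction is rolled back".
            session.rollback()
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

With SQLAlchemy you would pass something like `retry_on=(sqlalchemy.exc.OperationalError,)`. The engine option `pool_pre_ping=True` in `create_engine` is complementary: it tests pooled connections on checkout, but it does not clear a transaction that is already broken.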
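Because a restart reliably clears the state, while systemd still reports the unit as running (so a `Restart=` policy in the unit file never fires), one stopgap is an external watchdog that probes the webserver and restarts it after several consecutive failures. This is a sketch under stated assumptions: the URL `http://localhost:8080/health` is hypothetical (8080 is Airflow's default web server port; whether `/health` is served depends on your Airflow version, and any cheap page would do), the threshold and interval are arbitrary, and the script needs permission to run `systemctl`:

```python
import subprocess
import time
import urllib.request
from typing import Callable, Optional

HEALTH_URL = "http://localhost:8080/health"  # hypothetical; adjust to your deployment
SERVICE = "airflow-webserver"


def probe(url: str = HEALTH_URL, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, timeouts and refused connections
        return False


def restart(service: str = SERVICE) -> None:
    subprocess.run(["systemctl", "restart", service], check=True)


def watch(
    probe_fn: Callable[[], bool],
    restart_fn: Callable[[], None],
    threshold: int = 3,
    interval: float = 30.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Restart after `threshold` consecutive failed probes; return the restart count.

    Loops forever when max_checks is None.
    """
    failures = restarts = checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe_fn():
            failures = 0
        else:
            failures += 1
            if failures >= threshold:
                restart_fn()
                restarts += 1
                failures = 0  # give the restarted service a fresh window
        sleep(interval)
    return restarts
```

Run as `watch(probe, restart)`. This only papers over the symptom; the underlying session that is never rolled back remains.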
[Comments]:
- Also seen in the Airflow 1.10.14 Docker image.
Tags: airflow