【发布时间】:2019-08-07 00:18:10
【问题描述】:
我无法在带有 Python 3 的 Airflow 中使用 PubSubHook 发布。在 Python 2 中一切正常,但在 Python 3 中我收到此错误 {models.py:1760} ERROR - Object of type 'bytes' is not JSON serializable。似乎在 Python 3 中对消息进行编码会导致 JSON 序列化程序无法处理的字节。
以下在 Python 2 中可以正常工作:
def send_message_to_pubsub(message):
pubsub_message = {'data': b64encode(message)}
hook = PubSubHook(gcp_conn_id='google_cloud_default')
hook.publish('project-name', 'topic-name', [pubsub_message])
示例here 不适用于 Python 3。
更新 1:
尝试了以下但得到错误:
def send_message_to_pubsub():
message = 'Test message'
pubsub_message = {'data': b64encode(message).decode()}
hook = PubSubHook(gcp_conn_id='google_cloud_default')
hook.publish('project-name', 'topic-name', [pubsub_message])
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test [2019-03-18 17:10:28,903] {models.py:1760} ERROR - a bytes-like object is required, not 'str'
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test Traceback (most recent call last):
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test File "/usr/local/lib/airflow/airflow/models.py", line 1659, in _run_raw_task
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test result = task_copy.execute(context=context)
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 95, in execute
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test return_value = self.execute_callable()
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 100, in execute_callable
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test return self.python_callable(*self.op_args, **self.op_kwargs)
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test File "/home/airflow/gcs/dags/pubsub-test-dag.py", line 31, in send_message_to_pubsub
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test pubsub_message = {'data': b64encode(message).decode()}
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test File "/opt/python3.6/lib/python3.6/base64.py", line 58, in b64encode
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test encoded = binascii.b2a_base64(s, newline=False)
{base_task_runner.py:101} INFO - Job 1962: Subtask pub_sub_test TypeError: a bytes-like object is required, not 'str'
更新 2:
尝试使用以下方法,导致不同的错误。这次来自 JSON 序列化器:
def send_message_to_pubsub():
message = 'Test message'
pubsub_message = {'data': b64encode(message.encode())}
hook = PubSubHook(gcp_conn_id='google_cloud_default')
hook.publish('project', 'topic', [pubsub_message])
[2019-03-19 10:44:29,845] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test [2019-03-19 10:44:29,841] {models.py:1760} ERROR - Object of type 'bytes' is not JSON serializable
[2019-03-19 10:44:29,846] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test Traceback (most recent call last):
[2019-03-19 10:44:29,846] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/usr/local/lib/airflow/airflow/models.py", line 1659, in _run_raw_task
[2019-03-19 10:44:29,847] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test result = task_copy.execute(context=context)
[2019-03-19 10:44:29,847] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 95, in execute
[2019-03-19 10:44:29,847] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test return_value = self.execute_callable()
[2019-03-19 10:44:29,847] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 100, in execute_callable
[2019-03-19 10:44:29,848] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test return self.python_callable(*self.op_args, **self.op_kwargs)
[2019-03-19 10:44:29,848] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/home/airflow/gcs/dags/pubsub-test-dag.py", line 33, in send_message_to_pubsub
[2019-03-19 10:44:29,848] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test hook.publish('project', 'topic', [pubsub_message])
[2019-03-19 10:44:29,848] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/usr/local/lib/airflow/airflow/contrib/hooks/gcp_pubsub_hook.py", line 75, in publish
[2019-03-19 10:44:29,849] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test topic=full_topic, body=body)
[2019-03-19 10:44:29,849] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/discovery.py", line 795, in method
[2019-03-19 10:44:29,849] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test actual_path_params, actual_query_params, body_value)
[2019-03-19 10:44:29,850] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/model.py", line 151, in request
[2019-03-19 10:44:29,850] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test body_value = self.serialize(body_value)
[2019-03-19 10:44:29,850] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/opt/python3.6/lib/python3.6/site-packages/googleapiclient/model.py", line 260, in serialize
[2019-03-19 10:44:29,850] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test return json.dumps(body_value)
[2019-03-19 10:44:29,851] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/opt/python3.6/lib/python3.6/json/__init__.py", line 231, in dumps
[2019-03-19 10:44:29,851] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test return _default_encoder.encode(obj)
[2019-03-19 10:44:29,853] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/opt/python3.6/lib/python3.6/json/encoder.py", line 199, in encode
[2019-03-19 10:44:29,853] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test chunks = self.iterencode(o, _one_shot=True)
[2019-03-19 10:44:29,853] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test File "/opt/python3.6/lib/python3.6/json/encoder.py", line 257, in iterencode
[2019-03-19 10:44:29,854] {base_task_runner.py:101} INFO - Job 2172: Subtask pub_sub_test return _iterencode(o, 0)
[2019-03-19 10:44:29,852] {models.py:1791} INFO - Marking task as FAILED.
【问题讨论】:
-
您的代码 sn-p 中的哪一行代码抛出了该错误消息?
-
如上所述,它在 models.py 中出错。似乎错误的原因是字符串在 Python 2 和 Python 3 中的存储方式。气流似乎正在使用 models.py 中的 JSON 序列化程序序列化请求(到 PubSub 的 REST 端点)。但是,JSON 序列化程序仅适用于字符串数据,并且在 Python 3 中对数据进行编码,如上所述,创建消息的字节版本。
-
您使用的是哪个版本的 Airflow?
-
Airflow 版本 1.10.1 通过 GCP Cloud Composer,Python 版本 3.6
-
b64encode(message)的结果是 Python 3 的字节数。将其更改为b64encode(message).decode()。
标签: python-3.x airflow google-cloud-pubsub google-cloud-composer