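【Posted】:2021-11-24 19:06:47
【Question】:
I'm trying to run a simple Dataflow pipeline. After finally getting past some service-account permission errors, my pipeline now fails at the next stage. This time, however, I'm much less clear on how I should read and debug the output logs:
Running the script locally, this is my output: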
ERROR: Could not find a version that satisfies the requirement apache-beam==2.34.0 (from versions: none)
ERROR: No matching distribution found for apache-beam==2.34.0
WARNING:apache_beam.runners.portability.stager:Failed to download requested binary distribution of the SDK: RuntimeError('Full traceback: Traceback (most recent call last):\n File "/usr/local/Caskroom/miniconda/base/envs/myenv4/lib/python3.9/site-packages/apache_beam/utils/processes.py", line 89, in check_output\n out = subprocess.check_output(*args, **kwargs)\n File "/usr/local/Caskroom/miniconda/base/envs/myenv4/lib/python3.9/subprocess.py", line 424, in check_output\n return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,\n File "/usr/local/Caskroom/miniconda/base/envs/myenv4/lib/python3.9/subprocess.py", line 528, in run\n raise CalledProcessError(retcode, process.args,\nsubprocess.CalledProcessError: Command \'[\'/usr/local/Caskroom/miniconda/base/envs/myenv4/bin/python\', \'-m\', \'pip\', \'download\', \'--dest\', \'/var/folders/_3/tk69j41x2t9cvh0dbvzdmm2m0000gn/T/tmpr51_u_l1\', \'apache-beam==2.34.0\', \'--no-deps\', \'--only-binary\', \':all:\', \'--python-version\', \'39\', \'--implementation\', \'cp\', \'--abi\', \'cp39\', \'--platform\', \'manylinux1_x86_64\']\' returned non-zero exit status 1.\n \n Pip install failed for package: apache-beam==2.34.0 \n Output from execution of subprocess: b\'\'')
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.9 interpreter.
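The warning above swallows pip's own output (`b''`), so one way to see more detail is to re-run the same `pip download` command by hand. The following is a minimal sketch under that assumption: the helper names `build_pip_download_cmd` and `run_and_show` are hypothetical, and the flags are copied from the command quoted in the warning rather than taken from any Beam API.

```python
import subprocess
import sys


def build_pip_download_cmd(package, dest, python_version="39",
                           abi="cp39", platform="manylinux1_x86_64"):
    """Rebuild the pip command quoted in the stager warning.

    The defaults mirror the values that appear in the log above;
    this helper is illustrative and not part of Apache Beam.
    """
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", dest,
        package,
        "--no-deps",
        "--only-binary", ":all:",
        "--python-version", python_version,
        "--implementation", "cp",
        "--abi", abi,
        "--platform", platform,
    ]


def run_and_show(cmd):
    """Run a command and return (exit status, combined stdout and stderr)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so nothing is lost
        text=True,
    )
    return proc.returncode, proc.stdout
```

Calling `run_and_show(build_pip_download_cmd("apache-beam==2.34.0", some_directory))`, where `some_directory` is any writable path, should print pip's full message for the exact tag combination (Python version, ABI, platform) that the stager requested.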
On GKE, this is my output:
[server]Traceback (most recent call last):
[server] File "/app/shared/to_db.py", line 101, in <module>
[server] beamer()
[server] File "/app/shared/to_db.py", line 91, in beamer
[server] quotes | beam.io.WriteToBigQuery(
[server] File "/usr/local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 597, in __exit__
[server] self.result.wait_until_finish()
[server] File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1640, in wait_until_finish
[server] raise DataflowRuntimeException(
[server]apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
[server]Workflow failed.
Streaming logs from pod: python-property-tax-84ff6c46f6-qxh5h container: server
[server]/usr/local/lib/python3.9/site-packages/apache_beam/__init__.py:79: UserWarning: This version of Apache Beam has not been sufficiently tested on Python 3.9. You may encounter bugs or missing features.
[server] warnings.warn(
[server]/usr/local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py:2103: BeamDeprecationWarning: options is deprecated since First stable release. References to <pipeline>.options will not be supported
[server] is_streaming_pipeline = p.options.view_as(StandardOptions).streaming
[server]/usr/local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery_file_loads.py:1112: BeamDeprecationWarning: options is deprecated since First stable release. References to <pipeline>.options will not be supported
[server] temp_location = p.options.view_as(GoogleCloudOptions).temp_location
[server]ERROR: Could not find a version that satisfies the requirement apache-beam==2.34.0 (from versions: none)
[server]ERROR: No matching distribution found for apache-beam==2.34.0
[server]WARNING:apache_beam.runners.portability.stager:Failed to download requested binary distribution of the SDK: RuntimeError('Full traceback: Traceback (most recent call last):\n File "/usr/local/lib/python3.9/site-packages/apache_beam/utils/processes.py", line 89, in check_output\n out = subprocess.check_output(*args, **kwargs)\n File "/usr/local/lib/python3.9/subprocess.py", line 424, in check_output\n return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,\n File "/usr/local/lib/python3.9/subprocess.py", line 528, in run\n raise CalledProcessError(retcode, process.args,\nsubprocess.CalledProcessError: Command \'[\'/usr/local/bin/python\', \'-m\', \'pip\', \'download\', \'--dest\', \'/tmp/tmpyx3iprn_\', \'apache-beam==2.34.0\', \'--no-deps\', \'--only-binary\', \':all:\', \'--python-version\', \'39\', \'--implementation\', \'cp\', \'--abi\', \'cp39\', \'--platform\', \'manylinux1_x86_64\']\' returned non-zero exit status 1.\n \n Pip install failed for package: apache-beam==2.34.0 \n Output from execution of subprocess: b\'\'')
[server]WARNING:root:Make sure that locally built Python SDK docker image has Python 3.9 interpreter.
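Part of what makes these warnings hard to read is that the inner traceback is printed as one escaped string, with literal `\n` and `\'` sequences. A small sketch for turning such a line back into readable text might look like this; the function name `unescape_log_line` is hypothetical, and it assumes only those two escape sequences need undoing.

```python
def unescape_log_line(line):
    """Turn literal backslash-n and backslash-quote sequences into real characters.

    Only handles the two escapes seen in the stager warning above.
    """
    return line.replace("\\n", "\n").replace("\\'", "'")


# A shortened piece of the warning, written as a raw string so the
# backslashes stay literal, the same way they appear in the log.
sample = r"RuntimeError('Full traceback: Traceback (most recent call last):\n File \'x.py\', line 1'"

print(unescape_log_line(sample))
```

Pasting a full warning line in place of `sample` should lay the embedded traceback out one frame per line, which makes the failing `pip download` command easier to spot.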
As you can see, I tried running both locally and on GKE, with similar errors. I don't see anything wrong in my Dockerfile:
# Python image to use.
FROM python:3.9
# Set the working directory to /app
WORKDIR /app
# copy the requirements file used for dependencies
# COPY requirements.txt .
RUN apt-get update
RUN apt-get install -y gdal-bin
# Install any needed packages specified in requirements.txt
RUN pip install --upgrade pip
# If I don't redundantly install here, python gives me a "apache-beam: import not found" error
RUN pip install apache-beam
RUN pip install "apache-beam[gcp]"
RUN pip install poetry
# Copy the rest of the working directory contents into the container at /app
COPY . .
RUN poetry install
# Run shared/to_db.py when the container launches
ENTRYPOINT ["python", "shared/to_db.py"]
In pyproject.toml:
[tool.poetry.dependencies]
# ...
python = ">=3.9,<3.11"
google-cloud-bigquery = "^2.30.1"
BigQuery-Python = "^1.15.0"
apache-beam = {extras = ["gcp"], version = "^2.34.0"}
wheel = "^0.37.0"
Google's own documentation says that Dataflow supports Beam v2.34.0. So why am I getting:
ERROR: Could not find a version that satisfies the requirement apache-beam==2.34.0 (from versions: none)
ERROR: No matching distribution found for apache-beam==2.34.0
【Discussion】:
Tags: google-cloud-dataflow apache-beam