【问题标题】:Can't connect dbt to Databricks无法将 dbt 连接到 Databricks
【发布时间】:2022-03-23 22:59:54
【问题描述】:

我正在尝试连接到 Databricks 上的 Spark 集群,我正在学习本教程:https://docs.databricks.com/dev-tools/dbt.html。我安装了dbt-databricks 连接器(https://github.com/databricks/dbt-databricks)。但是,无论我如何配置它,当我运行 dbt test / dbt debug 时,我总是收到“数据库错误,无法连接”。

这是我的profiles.yaml

databricks_cluster:
  outputs:
    dev:
      connect_retries: 5
      connect_timeout: 60
      host: <my_server_hostname>
      http_path: <my_http_path>
      schema: default
      token: <my_token>
      type: databricks
  target: dev

这是我的dbt_project.yml

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_dem'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'databricks_cluster'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/ directory
# as tables. These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
  dbt_dem:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view

我也尝试过使用spark 连接器,但使用它仍然会出现同样的错误。关于为什么我无法连接到 Databricks 集群的任何想法?

这些是错误对应的日志:

============================== 2022-02-18 08:43:22.123066 | 4b91f9d3-28ad-4f5a-93db-f431b6d9af14 ==============================
08:43:22.123066 [info ] [MainThread]: Running with dbt=1.0.1
08:43:22.123841 [debug] [MainThread]: running dbt with arguments Namespace(cls=<class 'dbt.task.debug.DebugTask'>, config_dir=False, debug=None, defer=None, event_buffer_size=None, fail_fast=None, log_cache_events=False, log_format=None, partial_parse=None, printer_width=None, profile=None, profiles_dir='/Users/keremaslan/.dbt', project_dir=None, record_timing_info=None, rpc_method=None, send_anonymous_usage_stats=None, single_threaded=False, state=None, static_parser=None, target=None, use_colors=None, use_experimental_parser=None, vars='{}', version_check=None, warn_error=None, which='debug', write_json=None)
08:43:22.124057 [debug] [MainThread]: Tracking: tracking
08:43:22.143750 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb751ef42e0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb751ef4eb0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb751eca730>]}
08:43:22.236001 [debug] [MainThread]: Executing "git --help"
08:43:22.264682 [debug] [MainThread]: STDOUT: "b"usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]\n           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]\n           [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]\n           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]\n           <command> [<args>]\n\nThese are common Git commands used in various situations:\n\nstart a working area (see also: git help tutorial)\n   clone             Clone a repository into a new directory\n   init              Create an empty Git repository or reinitialize an existing one\n\nwork on the current change (see also: git help everyday)\n   add               Add file contents to the index\n   mv                Move or rename a file, a directory, or a symlink\n   restore           Restore working tree files\n   rm                Remove files from the working tree and from the index\n   sparse-checkout   Initialize and modify the sparse-checkout\n\nexamine the history and state (see also: git help revisions)\n   bisect            Use binary search to find the commit that introduced a bug\n   diff              Show changes between commits, commit and working tree, etc\n   grep              Print lines matching a pattern\n   log               Show commit logs\n   show              Show various types of objects\n   status            Show the working tree status\n\ngrow, mark and tweak your common history\n   branch            List, create, or delete branches\n   commit            Record changes to the repository\n   merge             Join two or more development histories together\n   rebase            Reapply commits on top of another base tip\n   reset             Reset current HEAD to the specified state\n   switch            Switch branches\n   tag               Create, list, delete or verify a tag object signed with GPG\n\ncollaborate (see also: git help workflows)\n   fetch             Download objects and refs from another repository\n   pull              Fetch from and integrate with another repository or a local branch\n   push              Update remote refs along with associated objects\n\n'git help -a' and 'git help -g' list available subcommands and some\nconcept guides. See 'git help <command>' or 'git help <concept>'\nto read about a specific subcommand or concept.\nSee 'git help git' for an overview of the system.\n""
08:43:22.265387 [debug] [MainThread]: STDERR: "b''"
08:43:22.272505 [debug] [MainThread]: Acquiring new databricks connection "debug"
08:43:22.273434 [debug] [MainThread]: Using databricks connection "debug"
08:43:22.273833 [debug] [MainThread]: On debug: select 1 as id
08:43:22.274044 [debug] [MainThread]: Opening a new connection, currently in state init
08:43:22.888586 [debug] [MainThread]: Databricks adapter: Error while running:
select 1 as id
08:43:22.889031 [debug] [MainThread]: Databricks adapter: Database Error
  failed to connect
08:43:22.889905 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb751f7eaf0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb752113040>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fb7521130a0>]}
08:43:24.130154 [debug] [MainThread]: Connection 'debug' was properly closed.

【问题讨论】:

  • 你只有这个错误,还是更详细的错误?也许您可以启用dbt 并提供更详细的错误消息?
  • 我找不到--verbose 命令,但我设法找到了相应的日志。我不确定这里是否有任何有用的信息。
  • 是的,不是很有帮助...
  • @k88 是 http 类型连接的硬性要求还是您愿意使用 ODBC 连接类型?
  • 另外 - 不确定它是否会有所不同,但您可能需要将文件名从 profiles.yaml 更改为 profiles.yml

标签: databricks dbt


【解决方案1】:

检查您的profiles.yml(通常在这里找到:~/.dbt/profiles.yml)并确保有:

  1. 主机名中没有https://
  2. http_path 中的前导斜杠

例如:

host: server.name.com                              # instead of https://server.name.com
http_path: /sql/protocolv1/o/0/0000-000000-text000 # note the leading slash

如果您像我一样直接从集群配置页面复制了 http_path 并直接从浏览器 URL 复制了主机,那么这些很容易犯错误。

dbt-databricks 自述文件中的profiles.yml 的另一个示例:https://github.com/databricks/dbt-databricks

【讨论】:

    【解决方案2】:

    我没有在原始问题中指定这一点,但我使用conda 设置了虚拟环境。不知何故,这不起作用,所以我建议您按照教程进行操作并使用pipenv

    【讨论】:

    • pyenv 为我工作,没有做任何特别的事情——为了它的价值。
    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2020-12-04
    • 2021-07-11
    • 2022-10-24
    • 2021-11-06
    相关资源
    最近更新 更多