How to set the Google BigQuery environment variable on Colab: answers

【Question title】: How to set Google bigquery environment variable on colab
【Posted】: 2022-01-19 22:10:03
【Question】:

I want to write a script that pulls data from BigQuery, but I don't know how to set the environment variable.

Here is an example from the official documentation:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("name={}, count={}".format(row[0], row["total_people"]))

I ran it, but it raised this error:

DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

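I followed the official doc but got stuck: the second step is to set the environment variable, and the doc only gives examples for Windows and Linux/macOS. So how do I set the environment variable on Colab?

I also noticed that the example requires me to provide the key path. That is fine on a local machine, but I don't think uploading my key file and passing its link in my code online is a good idea.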

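【Comments】:

Tags: python google-bigquery environment-variables google-colaboratory

【Solution 1】:

Instead of setting an environment variable or uploading the key directly to Colab, you can upload the key to your Google Drive and apply the necessary access restrictions there. Then, in your code, mount Google Drive in Colab and authenticate using the Drive location as the key file path.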

from google.cloud import bigquery
from google.oauth2 import service_account
from google.colab import drive

# Mount Google Drive so the key file is reachable from the Colab runtime.
drive.mount('/content/drive/')

# Full path to the service account key on Google Drive.
# In this example the key is stored in /MyDrive/Auth/.
key_path = '/content/drive/MyDrive/Auth/your_key.json'

credentials = service_account.Credentials.from_service_account_file(
    filename=key_path, scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

# Construct a BigQuery client object with the explicit credentials.
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("name={}, count={}".format(row[0], row["total_people"]))

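For completeness, the environment variable that the question asks about can also be set from Python inside a Colab cell, because Google client libraries read GOOGLE_APPLICATION_CREDENTIALS from the process environment when no explicit credentials are passed. The following is a minimal sketch, not part of the original answer; the helper name set_bigquery_credentials is hypothetical, and the key path is assumed to be the Drive location used above.

```python
import os
from pathlib import Path


def set_bigquery_credentials(key_path: str) -> str:
    """Point Application Default Credentials at a service account key file.

    Hypothetical helper for illustration; it is not part of any Google library.
    """
    path = Path(key_path).expanduser().resolve()
    if not path.is_file():
        # Fail early with a clear message instead of a later credentials error.
        raise FileNotFoundError(f"Service account key not found: {path}")
    # Client libraries consult this variable when no credentials are passed,
    # for example when calling bigquery.Client() with no arguments.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)
```

In Colab this would be called after drive.mount(...) and before bigquery.Client(), for example set_bigquery_credentials('/content/drive/MyDrive/Auth/your_key.json'). The variable only lasts for the current runtime session, so it has to be set again after a restart.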

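【Comments】:

【Solution 2】:

I asked this question, and I think Ricco D's solution solves my problem perfectly.

However, I checked the official Google documentation and found that it offers several ways to pull data from BigQuery:

Use BigQuery via magics (%%bigquery --project yourprojectid)
Use BigQuery through google-cloud-bigquery (using client.query())
Use BigQuery through pandas-gbq (using pd.io.gbq.read_gbq())

Examples and parameter settings are covered in that documentation.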
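The two library-based options in the list can be sketched roughly as follows. This is an illustration under assumptions rather than code from the documentation: the function names and the project_id parameter are hypothetical, the query is the one from the question, and the imports are done lazily so the definitions load even where the packages are missing. Recent pandas releases deprecate pandas.read_gbq in favor of pandas_gbq.read_gbq, so the sketch calls pandas_gbq directly. The %%bigquery magic is a notebook cell magic and is not shown here.

```python
QUERY = """
    SELECT name, SUM(number) AS total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""


def authenticate_in_colab() -> None:
    """Sign in with a Google account; only works inside a Colab runtime."""
    from google.colab import auth  # available only on Colab

    auth.authenticate_user()


def query_with_client(project_id: str):
    """Run QUERY through google-cloud-bigquery and return a DataFrame."""
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    return client.query(QUERY).to_dataframe()


def query_with_pandas_gbq(project_id: str):
    """Run QUERY through pandas-gbq and return a DataFrame."""
    import pandas_gbq

    return pandas_gbq.read_gbq(QUERY, project_id=project_id)
```

In a Colab notebook, one would typically call authenticate_in_colab() once and then either query function with the ID of a project that has the BigQuery API enabled. Both return a pandas DataFrame, so the choice is mostly about which package the rest of the notebook already uses.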

【Comments】: