【Question title】: BigQuery Storage API: the table has a storage format that is not supported
【Posted】: 2019-12-13 11:33:37
【Question description】:

Following the example in the BQ docs, I read a BQ table into a pandas DataFrame with this query:

query_string = """
SELECT
CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
)
print(dataframe.head())

                                            url  view_count
0  https://stackoverflow.com/questions/22879669       48540
1  https://stackoverflow.com/questions/13530967       45778
2  https://stackoverflow.com/questions/35159967       40458
3  https://stackoverflow.com/questions/10604135       39739
4  https://stackoverflow.com/questions/16609219       34479

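However, when I try this with any other, non-public dataset, I get the following error:

google.api_core.exceptions.FailedPrecondition: 400 there was an error creating the session: the table has a storage format that is not supported

Is there some setting I need to apply to my table so that it works with the BQ Storage API?

This works: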

query_string = """SELECT funding_round_type, count(*) FROM `datadocs-py.datadocs.investments` GROUP BY funding_round_type order by 2 desc LIMIT 2""" 
>>> bqclient.query(query_string).result().to_dataframe()

funding_round_type     f0_
0            venture  104157
1               seed   43747

But when I set it to use bqstorageclient, I get this error:

>>> bqclient.query(query_string).result().to_dataframe(bqstorage_client=bqstorageclient)

Traceback (most recent call last):
  File "/Users/david/Desktop/V/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 533, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.FAILED_PRECONDITION
    details = "there was an error creating the session: the table has a storage format that is not supported"
    debug_error_string = "{"created":"@1565047973.444089000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"there was an error creating the session: the table has a storage format that is not supported","grpc_status":9}"
>

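【Comments】:

  • Just a hunch, but maybe you're hitting this limitation? "During the beta, the BigQuery Storage API is only accessible in the US and EU multi-region locations." (cloud.google.com/bigquery/docs/reference/storage)
  • @GrahamPolley No, it's US. All of my datasets are in the US.
  • @GrahamPolley Honestly, my guess is that this is a permissions problem when starting the bqstorage read session and that it surfaces the wrong error message, but that's only a guess... Has anyone else run into this?
  • Are your datasets definitely in the US multi-region rather than a single region? If so, I'm stumped. I don't have much experience with the Storage API; it's very new. Hopefully some of the BQ engineers will chime in.

Tags: google-cloud-platform google-bigquery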
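
To rule out the location question raised in these comments, something like the following could be used. It is only a sketch: `dataset_location` and `is_multi_region` are hypothetical helper names, `client` is assumed to be a `google.cloud.bigquery.Client` (anything with a `get_dataset` method returning an object with a `location` attribute will do), and it assumes the installed client version accepts a `"project.dataset"` string.

```python
# Sketch only: helper names are made up for illustration.
MULTI_REGION_LOCATIONS = {"US", "EU"}


def dataset_location(client, dataset_id):
    """Return the location BigQuery reports for a dataset.

    `dataset_id` is assumed to be a "project.dataset" string.
    """
    return client.get_dataset(dataset_id).location


def is_multi_region(location):
    """True for the US/EU multi-regions, False for single regions such as 'us-east1'."""
    return location.upper() in MULTI_REGION_LOCATIONS
```

With the names from the question, `is_multi_region(dataset_location(bqclient, "datadocs-py.datadocs"))` would tell you whether the dataset sits in a multi-region.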


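【Solution 1】:

I ran into the same problem on 6 November 2019. It turns out the error you are seeing is a known issue with the Read API: it currently cannot handle result sets smaller than 10MB. I came across this issue, which sheds some light on the problem: GitHub.com - GoogleCloudPlatform/spark-bigquery-connector - FAILED_PRECONDITION: there was an error creating the session: the table has a storage format that is not supported #46

I tested it with a query that returns a result set larger than 10MB, and it appears to work fine; the dataset I was querying is in the EU multi-region location.

You also need fastavro installed in your environment for this feature to work.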
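
Until small result sets are handled, one possible workaround is to try the Storage API first and fall back to the regular download path when this specific error appears. The sketch below is my own, not something from the linked issue: `to_dataframe_with_fallback` is a hypothetical helper, `query_job` is assumed to be the object returned by `bqclient.query(...)`, and it matches on the error text rather than the exception class because the question shows the same message arriving as both `FailedPrecondition` and a raw gRPC error.

```python
# Sketch only: helper name and the message-matching approach are assumptions.
UNSUPPORTED_FORMAT_MESSAGE = "storage format that is not supported"


def to_dataframe_with_fallback(query_job, bqstorage_client=None):
    """Build a DataFrame via the Storage API, falling back to the default path.

    Only the "storage format that is not supported" error triggers the
    fallback; any other exception is re-raised unchanged.
    """
    if bqstorage_client is not None:
        try:
            return query_job.result().to_dataframe(
                bqstorage_client=bqstorage_client
            )
        except Exception as exc:
            if UNSUPPORTED_FORMAT_MESSAGE not in str(exc):
                raise
    # Ask the job for a fresh row iterator and download without the Storage API.
    return query_job.result().to_dataframe()
```

Usage with the names from the question would be `to_dataframe_with_fallback(bqclient.query(query_string), bqstorageclient)`. The fallback is slower for large results, but the error only seems to show up for small ones, where that matters little.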
