【Posted】:2019-12-13 11:33:37
【Question】:
Following the example in the BQ documentation, I read a BQ table into a pandas DataFrame with this query:
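# Client setup is not shown in the question; this is an assumed minimal version.
from google.cloud import bigquery
from google.cloud import bigquery_storage_v1beta1

bqclient = bigquery.Client()
bqstorageclient = bigquery_storage_v1beta1.BigQueryStorageClient()

query_string = """
SELECT
    CONCAT(
        'https://stackoverflow.com/questions/',
        CAST(id AS STRING)) AS url,
    view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags LIKE '%google-bigquery%'
ORDER BY view_count DESC
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
)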
print(dataframe.head())
url view_count
0 https://stackoverflow.com/questions/22879669 48540
1 https://stackoverflow.com/questions/13530967 45778
2 https://stackoverflow.com/questions/35159967 40458
3 https://stackoverflow.com/questions/10604135 39739
4 https://stackoverflow.com/questions/16609219 34479
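However, when I try this with any other, non-public dataset, I get the following error:
google.api_core.exceptions.FailedPrecondition: 400 there was an error creating the session: the table has a storage format that is not supported
Is there some setting I need to change on my table so that it works with the BQ Storage API?
This works: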
query_string = """SELECT funding_round_type, count(*) FROM `datadocs-py.datadocs.investments` GROUP BY funding_round_type order by 2 desc LIMIT 2"""
>>> bqclient.query(query_string).result().to_dataframe()
funding_round_type f0_
0 venture 104157
1 seed 43747
But when I pass bqstorageclient to it, I get this error:
>>> bqclient.query(query_string).result().to_dataframe(bqstorage_client=bqstorageclient)
Traceback (most recent call last):
File "/Users/david/Desktop/V/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 533, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.FAILED_PRECONDITION
details = "there was an error creating the session: the table has a storage format that is not supported"
debug_error_string = "{"created":"@1565047973.444089000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"there was an error creating the session: the table has a storage format that is not supported","grpc_status":9}"
>
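Since the same query succeeds without `bqstorage_client` (the default download path), one workaround while the root cause is unclear is to try the Storage API first and fall back when this error appears. A minimal sketch: `to_dataframe_with_fallback` is a hypothetical helper, not part of the library, and it assumes that calling a query job's `result()` a second time returns a fresh row iterator without re-running the query.

```python
def to_dataframe_with_fallback(get_rows, bqstorage_client, retry_on=None):
    """Download rows via the BigQuery Storage API, falling back to the default path.

    get_rows: zero-argument callable returning a fresh row iterator, e.g. the
        bound method ``bqclient.query(query_string).result`` (hypothetical usage).
    retry_on: exception types that trigger the fallback; defaults to
        google.api_core.exceptions.FailedPrecondition, the 400 shown above.
    """
    if retry_on is None:
        # Imported lazily so the helper can be exercised without the library.
        from google.api_core.exceptions import FailedPrecondition

        retry_on = (FailedPrecondition,)
    try:
        return get_rows().to_dataframe(bqstorage_client=bqstorage_client)
    except retry_on:
        # The Storage API refused to create a read session; retry without it.
        return get_rows().to_dataframe()
```

Usage would look like `to_dataframe_with_fallback(bqclient.query(query_string).result, bqstorageclient)`. Passing the bound `result` method, rather than a lambda that calls `bqclient.query`, keeps the query from being submitted twice; any exception other than the listed ones still propagates.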
【Discussion】:
-
Just a hunch, but maybe you're hitting this limitation? "During the beta, the BigQuery Storage API is only accessible in the US and EU multi-region locations." (cloud.google.com/bigquery/docs/reference/storage)
-
@GrahamPolley No, it's US. All my datasets are in the US.
-
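@GrahamPolley Honestly, my guess is that it's a permissions problem when trying to start the bqstorage read session, and the error message is misleading, but that's just a guess... Has anyone else run into this?
-
Are your datasets definitely in the US multi-region, and not a single region? If so, I'm stumped. I don't have much experience with the Storage API; it's very new. Hopefully some BQ engineers will chime in.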
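The multi-region vs. single-region question from the comments can be checked from Python. A minimal sketch, assuming google-cloud-bigquery is installed and default credentials are configured; both helper names are hypothetical, and the US/EU set reflects only the beta-era limitation quoted in the first comment, which may no longer apply.

```python
# Multi-regions the Storage API served during its beta, per the docs quote in
# the comments (assumption: this restriction may have been lifted since).
BETA_STORAGE_API_LOCATIONS = {"US", "EU"}


def storage_api_location_ok(location):
    """Return True if a dataset location is one of the beta-era multi-regions.

    Single-region locations such as "us-east1" return False.
    """
    return location is not None and location.upper() in BETA_STORAGE_API_LOCATIONS


def dataset_location(dataset_id):
    """Look up a dataset's location, e.g. dataset_location("datadocs-py.datadocs")."""
    # Imported lazily so the pure helper above works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.get_dataset(dataset_id).location
```

For the dataset in the question, the check would be `storage_api_location_ok(dataset_location("datadocs-py.datadocs"))`.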
Tags: google-cloud-platform google-bigquery