【Posted】:2021-02-04 03:37:01
【Question】:
How can I read data from Druid using Spark and the Avatica JDBC driver? Here is the Avatica JDBC documentation.
Reading data from Druid with Python and the jaydebeapi module works with the following code:
$ python
import jaydebeapi
conn = jaydebeapi.connect("org.apache.calcite.avatica.remote.Driver",
"jdbc:avatica:remote:url=http://0.0.0.0:8082/druid/v2/sql/avatica/",
{"user": "druid", "password":"druid"},
"/root/avatica-1.17.0.jar",
)
cur = conn.cursor()
cur.execute("SELECT * FROM INFORMATION_SCHEMA.TABLES")
cur.fetchall()
The output is:
[('druid', 'druid', 'wikipedia', 'TABLE'),
('druid', 'INFORMATION_SCHEMA', 'COLUMNS', 'SYSTEM_TABLE'),
('druid', 'INFORMATION_SCHEMA', 'SCHEMATA', 'SYSTEM_TABLE'),
('druid', 'INFORMATION_SCHEMA', 'TABLES', 'SYSTEM_TABLE'),
('druid', 'sys', 'segments', 'SYSTEM_TABLE'),
('druid', 'sys', 'server_segments', 'SYSTEM_TABLE'),
('druid', 'sys', 'servers', 'SYSTEM_TABLE'),
('druid', 'sys', 'supervisors', 'SYSTEM_TABLE'),
('druid', 'sys', 'tasks', 'SYSTEM_TABLE')] -> default tables
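Each tuple is one row of INFORMATION_SCHEMA.TABLES. Only `wikipedia` has type `TABLE`; the rest are system tables. A minimal sketch that picks out the user datasources from the rows above. It assumes the four columns are TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME and TABLE_TYPE in that order, and `user_tables` is a hypothetical helper, not part of jaydebeapi:

```python
# Rows returned by cur.fetchall() above, copied from the output.
rows = [
    ('druid', 'druid', 'wikipedia', 'TABLE'),
    ('druid', 'INFORMATION_SCHEMA', 'COLUMNS', 'SYSTEM_TABLE'),
    ('druid', 'INFORMATION_SCHEMA', 'SCHEMATA', 'SYSTEM_TABLE'),
    ('druid', 'INFORMATION_SCHEMA', 'TABLES', 'SYSTEM_TABLE'),
    ('druid', 'sys', 'segments', 'SYSTEM_TABLE'),
    ('druid', 'sys', 'server_segments', 'SYSTEM_TABLE'),
    ('druid', 'sys', 'servers', 'SYSTEM_TABLE'),
    ('druid', 'sys', 'supervisors', 'SYSTEM_TABLE'),
    ('druid', 'sys', 'tasks', 'SYSTEM_TABLE'),
]

# Assumed column order for this Druid version (matches the 4-tuples above).
COLUMNS = ("TABLE_CATALOG", "TABLE_SCHEMA", "TABLE_NAME", "TABLE_TYPE")


def user_tables(rows, columns=COLUMNS):
    """Hypothetical helper: return names of non-system tables (Druid datasources)."""
    records = [dict(zip(columns, row)) for row in rows]
    return [r["TABLE_NAME"] for r in records if r["TABLE_TYPE"] == "TABLE"]


print(user_tables(rows))  # ['wikipedia']
```

Against a live connection, passing `columns=[d[0] for d in cur.description]` should read the column names from the DB-API cursor instead of hard-coding them.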
But I want to read it with Spark over JDBC.
I tried the code below, but it fails when run with Spark:
$ pyspark --jars /root/avatica-1.17.0.jar
df = spark.read.format('jdbc') \
.option('url', 'jdbc:avatica:remote:url=http://0.0.0.0:8082/druid/v2/sql/avatica/') \
.option("dbtable", 'INFORMATION_SCHEMA.TABLES') \
.option('user', 'druid') \
.option('password', 'druid') \
.option('driver', 'org.apache.calcite.avatica.remote.Driver') \
.load()
The output is:
Traceback (most recent call last):
File "<stdin>", line 8, in <module>
File "/root/spark-2.4.4-bin-hadoop2.7/python/pyspark/sql/readwriter.py", line 172, in load
return self._df(self._jreader.load())
File "/root/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/root/spark-2.4.4-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/root/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o2999.load.
: java.sql.SQLException: While closing connection
...
Caused by: java.lang.RuntimeException: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "rpcMetadata" (class org.apache.calcite.avatica.remote.Service$CloseConnectionResponse), not marked as ignorable (0 known properties: ])
at [Source: {"response":"closeConnection","rpcMetadata":{"response":"rpcMetadata","serverAddress":"172.18.0.7:8082"}}
; line: 1, column: 46]
...
Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "rpcMetadata" (class org.apache.calcite.avatica.remote.Service$CloseConnectionResponse), not marked as ignorable (0 known properties: ])
at [Source: {"response":"closeConnection","rpcMetadata":{"response":"rpcMetadata","serverAddress":"172.18.0.7:8082"}}
; line: 1, column: 46]
...
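The `Source:` fragment in the trace is ordinary JSON, so it can be decoded to see which field the client rejected. A small sketch that mimics a strict deserializer in Python (an illustration only, not Avatica's or Jackson's code). It assumes `response` is the polymorphic type tag and, as the message states, that the client-side `CloseConnectionResponse` has zero known properties:

```python
import json

# Payload copied from the "Source:" line of the traceback.
payload = ('{"response":"closeConnection","rpcMetadata":'
           '{"response":"rpcMetadata","serverAddress":"172.18.0.7:8082"}}')
msg = json.loads(payload)

# Assumption: "response" is consumed as the type tag before fields are bound.
TYPE_TAG = "response"
# "0 known properties: ]" in the error message.
known_properties = set()

# Any remaining key the class does not know is what a strict reader rejects.
unknown = [k for k in msg if k != TYPE_TAG and k not in known_properties]
print(msg[TYPE_TAG], unknown)  # closeConnection ['rpcMetadata']
```

This only shows that the server's closeConnection reply carries an `rpcMetadata` object the client class does not declare; it does not by itself explain why the class has no known properties.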
Notes:
- I downloaded the Avatica jar file (avatica-1.17.0.jar) from the Maven repository.
- I installed the Druid server with docker-compose, using the default settings.
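Both attempts pass the same settings under different names: jaydebeapi takes the driver class, URL, a properties dict and the jar path as arguments, while Spark's JDBC source takes them as reader options plus `--jars` on the command line. A minimal sketch of that correspondence, using a hypothetical helper `avatica_settings` (not part of either library) so one set of values feeds both:

```python
DRIVER = "org.apache.calcite.avatica.remote.Driver"
JAR = "/root/avatica-1.17.0.jar"


def avatica_settings(broker, user, password, dbtable):
    """Hypothetical helper: build jaydebeapi args and Spark JDBC options from one config."""
    url = f"jdbc:avatica:remote:url=http://{broker}/druid/v2/sql/avatica/"
    # Positional arguments in the order used by the jaydebeapi.connect call above.
    jaydebeapi_args = (DRIVER, url, {"user": user, "password": password}, JAR)
    # The same values as options for spark.read.format('jdbc').
    spark_options = {
        "url": url,
        "dbtable": dbtable,
        "user": user,
        "password": password,
        "driver": DRIVER,
    }
    # On the Spark side the jar is supplied when the shell starts.
    pyspark_cmd = f"pyspark --jars {JAR}"
    return jaydebeapi_args, spark_options, pyspark_cmd


args, opts, cmd = avatica_settings("0.0.0.0:8082", "druid", "druid",
                                   "INFORMATION_SCHEMA.TABLES")
print(cmd)  # pyspark --jars /root/avatica-1.17.0.jar
```

In the respective sessions this would be used as `conn = jaydebeapi.connect(*args)` and `df = spark.read.format('jdbc').options(**opts).load()`, which makes it easy to confirm the two attempts really use identical settings.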
【Discussion】:
Tags: apache-spark jdbc apache-spark-sql druid apache-calcite