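【Posted】:2019-12-31 19:25:46
【Question】:
I am running a simple Hive query from PySpark, but it throws an error. The table is stored in ORC format. Any help would be appreciated. The code is below: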
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("Termination_Calls Snapshot")
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .enableHiveSupport()
    .getOrCreate()
)
x_df = spark.sql("SELECT count(*) as RC from bi_schema.table_a")
This throws the following error:
Hive Session ID = a00fe842-7099-4130-ada2-ee4ae75764be
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/session.py", line 716, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o70.sql.
: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:156)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:214)
When I run the same query in Hive, I get the expected result, shown below.
+-------------+
| rc |
+-------------+
| 3037579538 |
+-------------+
1 row selected (25.469 seconds)
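The stack trace ends in HiveMetastoreCatalog.convertToLogicalRelation, the step where Spark swaps a Hive metastore table for its own built-in file reader. One diagnostic that may be worth trying (an assumption, not something confirmed in this thread) is to turn that conversion off for ORC with the Spark SQL setting spark.sql.hive.convertMetastoreOrc, so the table is read through the Hive SerDe instead. The sketch below keeps the two settings from the question and adds that one; the names HIVE_SESSION_CONF and build_session are hypothetical helpers introduced only for illustration.

```python
# Settings from the question, plus one extra setting that is an assumption:
# disabling the metastore-ORC conversion so Spark falls back to the Hive SerDe.
HIVE_SESSION_CONF = {
    "hive.exec.dynamic.partition": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "spark.sql.hive.convertMetastoreOrc": "false",  # assumed workaround, not verified here
}


def build_session(app_name, conf=None):
    """Build a Hive-enabled SparkSession with the given settings (hypothetical helper)."""
    # Imported lazily so this module can be loaded where PySpark is not installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in (conf or HIVE_SESSION_CONF).items():
        builder = builder.config(key, value)
    return builder.enableHiveSupport().getOrCreate()
```

On a cluster this would be used as `spark = build_session("Termination_Calls Snapshot")` followed by `spark.sql("SELECT count(*) as RC from bi_schema.table_a").show()`. Because getOrCreate() returns any session that is already running, the settings are only guaranteed to take effect in a fresh session. If the query succeeds with the conversion disabled, that would point to a mismatch between the table's metastore definition and what Spark's native ORC path expects, rather than to the query itself.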
【Discussion】:
Tags: apache-spark hive pyspark pyspark-sql orc