[Question title]: How to query data stored in Hive table using SparkSession of Spark2?
[Posted]: 2017-01-05 04:46:34
[Question]:

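I am trying to query data stored in a Hive table from Spark2. Environment: 1. cloudera-quickstart-vm-5.7.0-0-vmware 2. Eclipse with the Scala 2.11.8 plugin 3. Spark2 and Maven

I have not changed the default Spark configuration. Do I need to configure anything in Spark or Hive?

Code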

import org.apache.spark._
import org.apache.spark.sql.SparkSession

object hiveTest {
  def main(args: Array[String]) {
    val sparkSession = SparkSession.builder
      .master("local")
      .appName("HiveSQL")
      .enableHiveSupport()
      .getOrCreate()

    val data = sparkSession2.sql("select * from test.mark")
  }
}

Error encountered

16/08/29 00:18:10 INFO SparkSqlParser: Parsing command: select * from test.mark
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:48)
    at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:47)
    at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:54)
    at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:54)
    at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
    at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
    at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
    at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
    at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
    at hiveTest$.main(hiveTest.scala:34)
    at hiveTest.main(hiveTest.scala)
Caused by: java.lang.IllegalArgumentException: requirement failed: Duplicate SQLConfigEntry. spark.sql.hive.convertCTAS has been registered
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.sql.internal.SQLConf$.org$apache$spark$sql$internal$SQLConf$$register(SQLConf.scala:44)
    at org.apache.spark.sql.internal.SQLConf$SQLConfigBuilder$$anonfun$apply$1.apply(SQLConf.scala:51)
    at org.apache.spark.sql.internal.SQLConf$SQLConfigBuilder$$anonfun$apply$1.apply(SQLConf.scala:51)
    at org.apache.spark.internal.config.TypedConfigBuilder$$anonfun$createWithDefault$1.apply(ConfigBuilder.scala:122)
    at org.apache.spark.internal.config.TypedConfigBuilder$$anonfun$createWithDefault$1.apply(ConfigBuilder.scala:122)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.internal.config.TypedConfigBuilder.createWithDefault(ConfigBuilder.scala:122)
    at org.apache.spark.sql.hive.HiveUtils$.<init>(HiveUtils.scala:103)
    at org.apache.spark.sql.hive.HiveUtils$.<clinit>(HiveUtils.scala)
    ... 14 more

Any suggestions are welcome.

Thanks
Robin

[Comments on the question]:

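  • I'm surprised the code compiled at all, since you create sparkSession but then use sparkSession2. I doubt this is the code you actually executed. Can you explain the discrepancy? How do you execute your application? What does your pom.xml contain (dependencies)?

Tags: scala maven hive apache-spark-sql apache-spark-2.0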
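
One way to follow up on the dependency question is to check which jars the two classes named in the stack trace are loaded from. A `Duplicate SQLConfigEntry` failure can occur when the spark-sql and spark-hive artifacts on the classpath come from different Spark releases; whether that applies here is an assumption, since the post does not show the pom.xml. The sketch below is hypothetical (the `ClasspathCheck` object and `locationOf` helper are not from the thread) and only prints the source location of each class without initializing it.

```scala
object ClasspathCheck {
  // Returns the jar or directory a class would be loaded from.
  // initialize = false avoids running the class's static initializer,
  // which is the step that fails in the stack trace above.
  def locationOf(className: String): String =
    try {
      val cls = Class.forName(className, false, getClass.getClassLoader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("(unknown source)")
    } catch {
      case _: ClassNotFoundException => "(not on classpath)"
    }

  def main(args: Array[String]): Unit = {
    // Class names taken from the stack trace in the question.
    val classes = Seq(
      "org.apache.spark.sql.internal.SQLConf",
      "org.apache.spark.sql.hive.HiveUtils"
    )
    classes.foreach(c => println(s"$c -> ${locationOf(c)}"))
  }
}
```

If the two printed jar names carry different version numbers, aligning the spark-sql and spark-hive dependency versions in the pom.xml would be the first thing to try.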


[Answer 1]:

This is what I am using:

import org.apache.spark.sql.SparkSession

object LoadCortexDataLake extends App {
  // file, table_nm, yr, mth and dt are not defined in this snippet; they are assumed to be set elsewhere.
  val spark = SparkSession.builder().appName("Cortex-Batch").enableHiveSupport().getOrCreate()
  spark.read.parquet(file).createOrReplaceTempView("temp")
  spark.sql(s"insert overwrite table $table_nm partition(year='$yr',month='$mth',day='$dt') select * from temp")
}

I think you should use "sparkSession.sql" instead of "sparkSession2.sql".

[Comments]:
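
The load pattern in this answer (read Parquet, register a temporary view, then `insert overwrite` into a partition) can be sketched as a self-contained program. Everything about the inputs is an assumption: the `LoadPartitionSketch` object, the `insertStatement` helper, and the five command-line arguments are hypothetical stand-ins for the values the answer leaves undefined. No master is set, so the sketch expects to be launched through spark-submit.

```scala
import org.apache.spark.sql.SparkSession

object LoadPartitionSketch {
  // Builds the INSERT OVERWRITE statement for a static year/month/day partition.
  def insertStatement(table: String, yr: String, mth: String, dt: String, view: String): String =
    s"insert overwrite table $table partition(year='$yr',month='$mth',day='$dt') select * from $view"

  def main(args: Array[String]): Unit = {
    // Hypothetical arguments: <parquet path> <table> <year> <month> <day>
    require(args.length == 5, "usage: <parquet path> <table> <year> <month> <day>")
    val Array(file, table, yr, mth, dt) = args

    val spark = SparkSession.builder()
      .appName("Load partition sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Expose the Parquet data to SQL under a temporary view name.
    spark.read.parquet(file).createOrReplaceTempView("temp")
    spark.sql(insertStatement(table, yr, mth, dt, "temp"))

    spark.stop()
  }
}
```

Keeping the statement in a small pure function makes it easy to print or inspect the generated SQL before running it against a real table.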

[Answer 2]:

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// hive.metastore.uris is the Hive property for the metastore Thrift URI.
// Replace Port with the port your metastore listens on.
val spark = SparkSession.
  builder().
  appName("Connect to Hive").
  config("hive.metastore.uris", "thrift://cdh-hadoop-master:Port").
  enableHiveSupport().
  getOrCreate()

val df = spark.sql("SELECT * FROM table_name")
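
Before querying a table, it can help to confirm that the session actually reaches the Hive metastore. The following is a minimal sketch under stated assumptions: the `MetastoreCheck` object and `thriftUri` helper are hypothetical, the host name is the one used in this answer, port 9083 is only the usual metastore default and may differ on your cluster, and `test` is the database name from the question. It sets `hive.metastore.uris`, the Hive property for the metastore Thrift URI, and expects to be launched through spark-submit because no master is set.

```scala
import org.apache.spark.sql.SparkSession

object MetastoreCheck {
  // Formats a metastore Thrift URI from a host and port.
  def thriftUri(host: String, port: Int): String = s"thrift://$host:$port"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Metastore check")
      // Assumed host and port; adjust to your environment.
      .config("hive.metastore.uris", thriftUri("cdh-hadoop-master", 9083))
      .enableHiveSupport()
      .getOrCreate()

    // If the metastore is reachable, the Hive databases and tables are listed here.
    spark.catalog.listDatabases().show(false)
    spark.catalog.listTables("test").show(false)

    spark.stop()
  }
}
```

If only a `default` database appears and the expected Hive tables are missing, the session is probably using a local metastore instead of the cluster's one.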
    

[Comments]:
