How to spark-submit a HiveContext application written in an IDE?

【Question title】: How to spark-submit a HiveContext application written in an IDE?
【Posted】: 2017-08-14 09:57:41
【Question description】:

I am trying to deploy code that uses hiveContext on a Spark cluster:

./spark-submit --class com.dt.sparkSQL.DataFrameToHive --master spark://SparkMaster:7077 /root/Documents/DataFrameToHive.jar

But it fails with the following error:

17/08/13 10:29:46 INFO hive.metastore: Trying to connect to metastore with URI thrift://SparkMaster:9083
17/08/13 10:29:46 WARN hive.metastore: Failed to connect to the MetaStore Server...
17/08/13 10:29:46 INFO hive.metastore: Waiting 1 seconds before next connection attempt.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

When I start spark-shell instead:

./spark-shell --master spark://SparkMaster:7077

I can connect to SparkMaster:9083 successfully. Here is my spark/conf/hive-site.xml:

<configuration>
<property>
        <name>hive.metastore.uris</name>
        <value>thrift://SparkMaster:9083</value>
        <description>Thrift URI for the remote metastore. Used by the metastore client to connect to the remote metastore.</description>
</property>
</configuration>

My question is: why does the connection to SparkMaster:9083 fail when I run spark-submit? What is wrong with SparkMaster:9083? Here is the code written in the IDE:

package com.dt.sparkSQL

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object DataFrameToHive {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("DataFrameToHive").setMaster("spark://SparkMaster:7077")
    val sc = new SparkContext(conf)
    // HiveContext picks up hive.metastore.uris from a hive-site.xml on the classpath
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("use userdb")

    // Recreate the people table and load it from a local tab-delimited file
    hiveContext.sql("DROP TABLE IF EXISTS people")
    hiveContext.sql("CREATE TABLE IF NOT EXISTS people(name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'")
    hiveContext.sql("LOAD DATA LOCAL INPATH '/root/Documents/people.txt' INTO TABLE people")

    // Recreate the peopleScores table the same way
    hiveContext.sql("DROP TABLE IF EXISTS peopleScores")
    hiveContext.sql("CREATE TABLE IF NOT EXISTS peopleScores(name STRING, score INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'")
    hiveContext.sql("LOAD DATA LOCAL INPATH '/root/Documents/peopleScore.txt' INTO TABLE peopleScores")

    // Join the two tables, keeping only scores above 90
    val resultDF = hiveContext.sql("select pi.name,pi.age,ps.score "
      + " from people pi join peopleScores ps on pi.name=ps.name"
      + " where ps.score>90")

    // Save the result as a Hive table, then read it back and print it
    // (DataFrame.saveAsTable is deprecated since Spark 1.4 in favor of resultDF.write.saveAsTable)
    hiveContext.sql("drop table if exists peopleResult")
    resultDF.saveAsTable("peopleResult")
    val dataframeHive = hiveContext.table("peopleResult")
    dataframeHive.show()
  }
}

【Comments】:

Tags: apache-spark hive

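【Solution 1】:

I solved the problem. Deploying an application that uses hiveContext differs slightly from deploying an ordinary jar: hive-site.xml has to be passed to spark-submit with --files.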

./spark-submit --class com.dt.sparkSQL.DataFrameToHive --files /usr/local/hive/apache-hive-1.2.1-bin/conf/hive-site.xml --master spark://SparkMaster:7077 /root/Documents/DataFrameToHive.jar
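
Because the fix depends on hive-site.xml actually existing at the path given to --files, a small wrapper that checks the file before building the command may help catch a wrong path early. The following is only a sketch: the function name build_submit_cmd and the HIVE_SITE variable are hypothetical, while the class name, master URL, and paths are copied from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: build_submit_cmd and HIVE_SITE are illustrative names,
# not part of Spark. Paths and class name come from the answer above.

HIVE_SITE="${HIVE_SITE:-/usr/local/hive/apache-hive-1.2.1-bin/conf/hive-site.xml}"

# Print the spark-submit command line for the given hive-site.xml.
# Returns 1 (and prints a message to stderr) if the file does not exist.
build_submit_cmd() {
  local hive_site="$1"
  if [ ! -f "$hive_site" ]; then
    echo "hive-site.xml not found: $hive_site" >&2
    return 1
  fi
  printf '%s\n' "./spark-submit --class com.dt.sparkSQL.DataFrameToHive --files $hive_site --master spark://SparkMaster:7077 /root/Documents/DataFrameToHive.jar"
}
```

Calling build_submit_cmd "$HIVE_SITE" prints the command line, which can then be run from Spark's bin directory. The function deliberately only prints the command instead of executing it, so the check can be tried on a machine without a Spark installation.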

【Discussion】: