1. How to make spark-sql able to access Hive?

Just place hive-site.xml under spark/conf. For the contents of hive-site.xml, refer to the Hive cluster setup guide.
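For orientation, a minimal hive-site.xml sketch; the metastore host and port below are placeholders and must match your own Hive cluster:

<configuration>
  <!-- Thrift address of the Hive metastore service (hypothetical host/port) -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>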

2. To run SQL against Hive from Spark code, call enableHiveSupport() when initializing the SparkSession:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("df")
  .master("local[*]")
  .enableHiveSupport() // enables the Hive metastore connection and HiveQL support
  .getOrCreate()
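With Hive support enabled, Hive tables can be queried directly through spark.sql. A quick smoke test (the database and table names below are placeholders):

spark.sql("SHOW DATABASES").show()
spark.sql("SELECT * FROM some_db.some_table LIMIT 10").show()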

3. Enable Hive dynamic partitioning in Spark

spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
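With both settings in place, partition values are taken from the data itself rather than hard-coded. A sketch with hypothetical table names, where the dynamic partition column (dt) must come last in the SELECT list:

spark.sql(
  """INSERT OVERWRITE TABLE target_db.target_table PARTITION (dt)
    |SELECT col1, col2, dt
    |FROM source_db.source_table""".stripMargin)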

4. Check from Spark whether a Hive table exists

val exists = spark.catalog.tableExists(db, tb) // db = database name, tb = table name, both Strings
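This is handy for create-if-missing logic; a minimal sketch with hypothetical names and schema:

if (!spark.catalog.tableExists("some_db", "some_table")) {
  spark.sql("CREATE TABLE some_db.some_table (id BIGINT, name STRING) STORED AS PARQUET")
}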

5. Delete an HDFS path from Spark (used when recreating a Hive table at a specified location)

import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = spark.sparkContext.hadoopConfiguration
val hdfs = FileSystem.get(hadoopConf)
val path = new Path(location)
if (hdfs.exists(path)) {
  // To guard against accidental deletion, recursive delete is disabled:
  // this call only succeeds for files and empty directories
  hdfs.delete(path, false)
}
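After the old path is cleared, the table can be recreated at the same location. The table name and schema below are assumptions:

spark.sql(
  s"""CREATE TABLE IF NOT EXISTS some_db.some_table (id BIGINT, name STRING)
     |STORED AS PARQUET
     |LOCATION '$location'""".stripMargin)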
