How to apply filters on a Spark Scala DataFrame view?

【Question title】: How to apply filters on spark scala dataframe view?
【Posted】: 2022-12-03 05:38:51
【Question】:

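I have pasted a snippet below in which I am having trouble with the BigQuery read. "wherePart" contains a large number of records, so the BQ call is invoked again and again. Keeping the filter outside the BQ read would help. The idea is to first read "mainTable" from BQ, store it in a Spark view, and then apply the "wherePart" filter to that view in Spark. ["subDate" is a function that subtracts one date from another and returns the number of days between them.]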

  val Df =  getFb(config, mainTable, ds)

  def getFb(config: DataFrame, mainTable: String, ds: String) : DataFrame = {

    val fb = config.map(row => Target.Pfb(
      row.getAs[String]("m1"),
      row.getAs[String]("m2"),
      row.getAs[Seq[Int]]("days")))
      .collect

    val wherePart = fb.map(x => (x.m1, x.m2, subDate(ds, x.days.max - 1))).
      map(x => s"(idata_${x._1} = '${x._2}' AND ds BETWEEN '${x._3}' AND '${ds}')").
      mkString(" OR ")

    val q = new Q()
    val tempView = "tempView"
    spark.readBigQueryTable(mainTable, wherePart).createOrReplaceTempView(tempView)
    val Df = q.mainTableLogs(tempView)
    Df
  }

Can someone help me here?
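To make the size of "wherePart" concrete, here is a minimal sketch of how the predicate is assembled: one parenthesised clause per config row, joined with OR. The `subDate` shown is a hypothetical stand-in (the real one is not included in the question); it is assumed to return the date `days` days before `ds` in `yyyy-MM-dd` form, which is how the snippet uses its result. `Pfb` is likewise a stand-in for `Target.Pfb`.

```scala
import java.time.LocalDate

object WherePart {
  // Stand-in for Target.Pfb from the question (assumed shape).
  final case class Pfb(m1: String, m2: String, days: Seq[Int])

  // Hypothetical subDate: assumed to return the ISO date `days` days before `ds`.
  def subDate(ds: String, days: Int): String =
    LocalDate.parse(ds).minusDays(days.toLong).toString

  // Builds one clause per config row and joins the clauses with OR,
  // so the predicate grows linearly with the number of rows in `fb`.
  def build(fb: Seq[Pfb], ds: String): String =
    fb.map { x =>
      val start = subDate(ds, x.days.max - 1)
      s"(idata_${x.m1} = '${x.m2}' AND ds BETWEEN '$start' AND '$ds')"
    }.mkString(" OR ")
}
```

With hundreds of config rows the resulting string becomes a very long OR chain, which is the predicate being handed to the BigQuery read.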

【Comments】:

Tags: scala apache-spark google-bigquery apache-spark-sql view

【Solution 1】:

Are you using the spark-bigquery-connector? If so, the correct syntax is

spark.read.format("bigquery")
  .load(mainTable)
  .where(wherePart)
  .createOrReplaceTempView(tempView)
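Note that the connector may still push a `where` predicate down to BigQuery. If the goal is to evaluate the filter in Spark, as the question describes, one option is to read the table once, cache it, and filter the cached DataFrame. The following is a minimal sketch under that assumption; `FilterInSpark`, `loadOnce` and `registerFiltered` are hypothetical names, and reading the whole table may be expensive for large tables.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FilterInSpark {
  // Reads the full table through the spark-bigquery-connector and marks it
  // for caching. cache() is lazy: the read happens on the first action.
  def loadOnce(spark: SparkSession, mainTable: String): DataFrame =
    spark.read.format("bigquery").load(mainTable).cache()

  // Applies the SQL predicate to the given DataFrame and registers the
  // result as a temp view, so downstream queries see only the filtered rows.
  def registerFiltered(base: DataFrame, wherePart: String, viewName: String): DataFrame = {
    val filtered = base.where(wherePart)
    filtered.createOrReplaceTempView(viewName)
    filtered
  }
}
```

Usage would look like `FilterInSpark.registerFiltered(FilterInSpark.loadOnce(spark, mainTable), wherePart, tempView)`, after which `q.mainTableLogs(tempView)` can run as before. Once the cache has been materialized, later filters on the same DataFrame should be served from the cached data rather than triggering another BigQuery read.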

【Comments】: