Accessing a variable declared inside a match from outside it in Scala: answers

[Question title]: Accessing a variable declared inside a match from outside it in Scala
[Posted]: 2020-05-06 16:25:42
[Question]:

I am using try and catch to handle exceptions while reading a file into a DataFrame, with the following code:

import scala.io.StdIn
import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val filename = "s3a://bucketname/moving/file.csv"

def CustomSchemaDataFrame(fileName: String):Try[DataFrame] = {

try {
      val df_custom_schema = spark.read.format("csv").option("header", "true").load(fileName)
      Success(df_custom_schema)
    } catch {
      case unknown: Exception => {
        println(s"Unknown exception: $unknown")
        Failure(unknown)
      }
    }
  }

CustomSchemaDataFrame(filename) match {
  case Success(df_custom_schema) => {
      println("File Read Successfully")
      df_custom_schema.printSchema()
      df_custom_schema.show(true)
  }
  case Failure(ex) => {
      println("error code", ex)
  }
}

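What I want to do next is make df_custom_schema available outside the match, so that I can run further operations on it after the match block.

Something like this: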

CustomSchemaDataFrame(filename) match {
  case Success(df_custom_schema) => {
      println("File Read Successfully")
      df_custom_schema.printSchema()
      df_custom_schema.show(true)
     val custom_schema = df_custom_schema
  }
  case Failure(ex) => {
      println("error code", ex)
  }
}

custom_schema.printSchema()

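When I call custom_schema.printSchema() inside the match it works fine, but when I try to access it outside the match it throws an error. Is there a way to access the value outside the match case? I want to perform several other operations on this DataFrame.

Regards, Mahi

[Comments]:

Which error is thrown?

Tags: scala apache-spark apache-spark-sql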

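[Answer 1]:

No. You cannot access a value declared inside a pattern-match case from outside it. It is local to the scope of that case block.

What you need to do is return the result of the pattern match and keep working with that result.

But that does not make much sense here, because you would have to return something valid from both the Success and the Failure branch, and at that point you are probably better off using recoverWith.

So usually you would do something like: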

Try {
  spark.read.format("csv").option("header", "true").load(filename)
} match {
  case Success(df) =>
    // ...
    // do all success related stuff to df here
    // ...
  case Failure(t) =>
    println(t)
}

Or, if you want some kind of default value in case of failure:

Try {
  spark.read.format("csv").option("header", "true").load(filename)
}.recoverWith {
  case t =>
    println(t)
    Success(DefaultDF())
}.map { df =>
  // do stuff here
}
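
The advice above about returning the result of the pattern match can be sketched as follows. This is a minimal illustration written as a script (for the REPL or spark-shell), not the asker's actual code: loadRows is a hypothetical stand-in for the spark.read call and returns plain strings so that the snippet runs without Spark. The point is that match is an expression, so its result can be bound to a val that stays in scope afterwards.

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical stand-in for spark.read...load(path); fails on an empty path.
def loadRows(path: String): Try[Seq[String]] = Try {
  require(path.nonEmpty, "path must not be empty")
  Seq("id,name", "1,alice")
}

// Both branches must produce a value of the same type, here Option[Seq[String]].
val rows: Option[Seq[String]] = loadRows("file.csv") match {
  case Success(r) =>
    println("File read successfully")
    Some(r)
  case Failure(ex) =>
    println(s"error: $ex")
    None
}

// rows is visible here, outside the match.
rows.foreach(r => println(s"row count: ${r.size}"))
```

Wrapping the success value in Option is only one possible choice; it makes the "nothing was loaded" case explicit instead of inventing a default.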

[Comments]:

[Answer 2]:

Why can't you write:

val custom_schema= CustomSchemaDataFrame(filename)

custom_schema match {
  case Success(df_custom_schema) => {
      println("File Read Successfully")
      df_custom_schema.printSchema()
      df_custom_schema.show(true)
      // no need to re-declare custom_schema here; the outer val already holds the Try
  }
  case Failure(ex) => {
      println("error code", ex)
  }
}

custom_schema.get.printSchema()

Of course, you may need to check whether custom_schema is a Success:

if ( custom_schema.isSuccess )
  custom_schema.get.printSchema
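
As a possible alternative to the isSuccess check followed by get, Try offers foreach and getOrElse, which avoid calling get on a Failure. The sketch below is written as a script and uses a hypothetical readFile function in place of CustomSchemaDataFrame, returning plain strings so that it runs without Spark; with a real DataFrame the same Try methods apply.

```scala
import scala.util.Try

// Hypothetical stand-in for CustomSchemaDataFrame; fails on an empty path.
def readFile(path: String): Try[Seq[String]] = Try {
  require(path.nonEmpty, "path must not be empty")
  Seq("id,name", "1,alice")
}

val result = readFile("file.csv")

// foreach runs its body only when the Try is a Success.
result.foreach(rows => println(rows.head))

// getOrElse unwraps the value, or falls back to a default on Failure.
val rowsOrEmpty: Seq[String] = result.getOrElse(Seq.empty[String])

// On a Failure, foreach does nothing and getOrElse yields the default.
val failed = readFile("")
failed.foreach(rows => println(rows.head))
val fallback: Seq[String] = failed.getOrElse(Seq.empty[String])
```

Calling get on a Failure rethrows the underlying exception, which is why the guarded forms are usually preferred.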

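[Comments]:

Thanks gtosto for the input. I also need the full data, not just the schema, so that I can perform some operations on it. How can I get that? The above only produces the schema.
With custom_schema.get you get the DataFrame returned by the function CustomSchemaDataFrame.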