How to avoid ORA-00060 (deadlock detected) error when updating Oracle table on Spark

【Question title】: How to avoid ORA-00060 (deadlock detected) error when updating Oracle table on Spark
【Posted】: 2023-03-26 13:10:01
【Question description】:

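I have a strange error in my Spark job and I could use some explanation, if possible.

My Spark job loads data from a Hive table, transforms it into a DataFrame, then updates an existing Oracle table based on certain columns.

When the DataFrame is not very large, the job runs without problems. When the DataFrame is very large, the job runs for several hours and then stops with an Oracle error: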

exception caught: org.apache.spark.SparkException: Job aborted due to stage failure: Task 104 in stage 43.0 failed 4 times, most recent failure: Lost task 104.3 in stage 43.0 (TID 5937, lxpbda55.ra1.intra.groupama.fr, executor 227): java.sql.BatchUpdateException: ORA-00060: deadlock detected while waiting for resource

This is how my code works:

//This is where the error appears
modification(df_Delta_Modif, champs, conditions, cstProp)

//This is its definition
def modification(df: DataFrame, champs: List[String], conditions: List[String], cstProp: java.util.Properties) {
    val url = Parametre_mod.oracleUrl
    val options: JDBCOptions = new JDBCOptions(Map("url" -> url, "dbtable" -> Parametre_mod.targetTableBase, "user" -> Parametre_mod.oracleUser,
      "password" -> Parametre_mod.oraclePassword, "driver" -> "oracle.jdbc.driver.OracleDriver", "batchSize" -> "30000"))
    Crud_mod.modifierbatch(df, options, champs, conditions)
  }

//This is the definition of modifierbatch. It starts with establishing a connection to Oracle.
//Which surely works because I use the same thing on other scripts and it works fine
def modifierbatch(df: DataFrame,
                  options: JDBCOptions,
                  champs: List[String],
                  conditions: List[String]) {
    val url = options.url
    val tables = options.table
    val dialect = JdbcDialects_mod.get(url)
    val nullTypes: Array[Int] = df.schema.fields.map { field =>
      getJdbcType(field.dataType, dialect).jdbcNullType
    }
    val rddSchema = df.schema
    val getConnection: () => Connection = createConnectionFactory(options)
    val batchSize = options.batchSize
    val chainestmt = creerOdreSQLmodificationSimple(champs, conditions, tables) //definition below
    val listChamps: List[Int] = champs.map(rddSchema.fieldIndex):::conditions.map(rddSchema.fieldIndex)
    df.foreachPartition { iterator =>
      //savePartition(getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect)
      executePartition(getConnection, tables, iterator, rddSchema, nullTypes, batchSize, chainestmt, listChamps, dialect, 0, "")
    }
  }

//This is the definition of creerOdreSQLmodificationSimple
def creerOdreSQLmodificationSimple(listChamps: List[String], listCondition: List[String], tablecible: String): String = {
    val champs = listChamps.map(_.toUpperCase).mkString(" = ?, ")
    val condition = listCondition.map(_.toUpperCase).mkString(" = ? and ")

    s"""UPDATE ${tablecible} SET ${champs} = ? WHERE ${condition} = ?"""
  }

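As you can see, the body is not very complicated. I am just executing an Oracle statement (an UPDATE) in batches. I don't know what is causing the deadlock. I am not using any repartitioning in Spark.

Please let me know if you need more details. Thanks.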

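【Comments】:

Every Oracle deadlock generates a trace file that explains which statements and objects are involved in the deadlock. The relevant code and objects are not always obvious: the problem might be caused by something odd, like using bitmap indexes on a transactional table, or by another unexpected statement running at the same time. Look in the alert log, which will point to where the trace file was generated (ask your DBA if you don't have access to the server).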

Tags: sql oracle scala apache-spark deadlock

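【Solution 1】:

Because you use df.foreachPartition, it looks like the database access is being done over multiple parallel connections.

If so, then different partitions must contain conditions that update the same row(s).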
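One way to check whether this is happening is to look for condition-key combinations that occur more than once in the source DataFrame. The following is only a sketch: the object and function names are hypothetical, and it assumes the `conditions` list from the question names the columns used in the WHERE clause. A non-empty result means several source rows target the same Oracle rows, so they can collide if they land in different partitions.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper, not part of the original code.
object OverlapCheck {
  // Returns every combination of condition-column values that appears
  // in more than one row of df, together with its row count.
  def overlappingKeys(df: DataFrame, conditions: Seq[String]): DataFrame =
    df.groupBy(conditions.map(col): _*)
      .count()
      .filter(col("count") > 1)
}
```

For example, `OverlapCheck.overlappingKeys(df_Delta_Modif, conditions).show()` would list the offending keys. An empty result does not rule out a deadlock with some other session or statement, so the Oracle trace file is still worth reading.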

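Your options are:

1. Eliminate the overlap, so that no two updates touch the same row.
2. If you can't do that, arrange things so that all the updates that affect a given row are in the same "partition".
3. If you can't do that, sort your condition values before processing them. For example, if your condition looks like column1 = ? and column2 = ? and your set of values is { (1, 'R'), (5, 'Q'), (1, 'B'), (2, 'Z') }, then sort them as (1, 'B') -> (1, 'R') -> (2, 'Z') -> (5, 'Q'). It doesn't actually matter how you sort them, as long as the sort order is unambiguous (no ties) and all partitions sort their conditions the same way.
4. Don't use foreachPartition (i.e., don't try to run in parallel). Really, this is just a variation of #2 above.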
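The sorting example from option 3 can be reproduced in plain Scala. This is a minimal illustration, not the asker's code: the `Key` type and the object name are hypothetical stand-ins for the (column1, column2) condition values. Scala's built-in tuple ordering compares the first element and then the second, which gives the same unambiguous order in every partition.

```scala
object SortConditions {
  // Hypothetical condition key: (column1, column2)
  type Key = (Int, String)

  // Sort by column1, then column2, using the standard tuple Ordering.
  def sortKeys(keys: Seq[Key]): Seq[Key] = keys.sorted

  def main(args: Array[String]): Unit = {
    val keys = Seq((1, "R"), (5, "Q"), (1, "B"), (2, "Z"))
    // prints: (1,B) -> (1,R) -> (2,Z) -> (5,Q)
    println(sortKeys(keys).mkString(" -> "))
  }
}
```

Because every worker acquires row locks in the same order, two workers cannot each hold a lock the other is waiting for; one simply waits until the other commits.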

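Sorting the work as in option 3 will avoid the deadlocks, but you will lose much of the benefit of running in parallel (since some partitions will block others).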
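In Spark, options 2 and 3 could be sketched roughly as follows. This is an assumption-laden sketch rather than a tested fix: the object name, the function name, and the `numPartitions` parameter are hypothetical, and it assumes `conditions` lists the WHERE-clause columns as in the question. `repartition` with column expressions hash-partitions the rows, so all rows sharing the same condition values end up in one partition, and `sortWithinPartitions` then gives each partition the same key order.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper, not part of the original code.
object DeadlockSafeLayout {
  def prepare(df: DataFrame, conditions: Seq[String], numPartitions: Int): DataFrame = {
    val keyCols = conditions.map(col)
    df
      // Option 2: rows with equal condition values go to the same partition.
      .repartition(numPartitions, keyCols: _*)
      // Option 3: every partition processes its keys in the same order.
      .sortWithinPartitions(keyCols: _*)
  }
}
```

The prepared DataFrame would then be passed to the existing update routine, for instance `Crud_mod.modifierbatch(DeadlockSafeLayout.prepare(df, conditions, 32), options, champs, conditions)`, where 32 is an arbitrary illustrative value. If several source rows still share a key, they are applied one after another within a single connection, so the last one processed determines the final values.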
