【Posted】: 2021-10-13 18:35:47
【Question】:
// Input Identifiers
val ids = List("4723847392423894", "4329479647236423", "42348726782684")
import spark.implicits._
val settings = Map("table" -> "table_name", "keyspace" -> "keyspace_name")
val tableDF = spark.read.format("org.apache.spark.sql.cassandra").options(settings).load()
val idsListDF = ids.asInstanceOf[List[String]].toDF("id").persist()
idsListDF.join(tableDF, tableDF.col("id") === idsListDF.col("id"), "inner").persist()
Exception
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.cassandra.CassandraSourceRelation.directJoinSetting()Lorg/apache/spark/sql/cassandra/DirectJoinSetting;
at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy$.containsSafePlans(CassandraDirectJoinStrategy.scala:333)
at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy$.validJoinBranch(CassandraDirectJoinStrategy.scala:283)
at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy.rightValid(CassandraDirectJoinStrategy.scala:139)
at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy.hasValidDirectJoin(CassandraDirectJoinStrategy.scala:87)
at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy.apply(CassandraDirectJoinStrategy.scala:30)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
Can you help me figure out what is wrong with this code?
I tried directJoin with Automatic, AlwaysOn, and AlwaysOff, but still no luck:
idsListDF.join(tableDF.directJoin(Automatic), tableDF.col("batch_id") === idsListDF.col("id"), "inner").persist()
FYI - I am using the Spark Cassandra Connector jar - https://github.com/datastax/spark-cassandra-connector
【Comments】:
-
Which Spark version and connector version are you using?
-
The Spark version is 2.4.4 and the Spark Cassandra Connector version is 2.5.1.
-
Use a correct combination of Spark, Scala, and Cassandra connector versions. Check version compatibility here - github.com/datastax/…
-
Yes, I am only using the correct versions:
<dependency>
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector_2.11</artifactId>
  <version>2.5.1</version>
</dependency>
-
An old version of the Spark Cassandra Connector jar had been added to the Spark jars folder, and that caused the problem. I don't know who added that jar to the Spark jars folder.
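For anyone hitting the same NoSuchMethodError, a rough way to confirm a stale jar is to print where the connector class is actually loaded from and to list any connector jars in the Spark jars folder. This is only a sketch: the object name `ConnectorJarCheck` and its helpers are made up for illustration, and it assumes the jars folder is `$SPARK_HOME/jars`, which may not match your installation. The class name is taken from the stack trace in the question.

```scala
import java.io.File
import scala.util.Try

// Hypothetical diagnostic helper, not part of Spark or the connector.
object ConnectorJarCheck {

  // Location (jar or directory) a class was loaded from, if it can be determined.
  // Returns None when the class is not on the classpath or has no code source.
  def jarOf(className: String): Option[String] =
    Try(Class.forName(className)).toOption
      .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)

  // Names of files in `dir` that look like Spark Cassandra Connector jars.
  // Returns an empty list when the directory does not exist.
  def connectorJars(dir: File): Seq[String] =
    Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
      .map(_.getName)
      .filter(_.contains("spark-cassandra-connector"))
      .sorted

  def main(args: Array[String]): Unit = {
    // Class name copied from the stack trace in the question.
    val cls = "org.apache.spark.sql.cassandra.CassandraSourceRelation"
    println(jarOf(cls).getOrElse(s"$cls not found on the classpath"))

    // Assumption: Spark's jars live under $SPARK_HOME/jars.
    sys.env.get("SPARK_HOME").foreach { home =>
      connectorJars(new File(home, "jars")).foreach(println)
    }
  }
}
```

If the printed location points at a jar other than the 2.5.1 connector you declared, or the folder listing shows more than one connector version, the classpath is probably mixing versions, and removing the older jar is the likely fix.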
Tags: dataframe apache-spark cassandra apache-spark-sql spark-cassandra-connector