【Question title】: Error writing a DataFrame to a Cassandra table on AWS
【Posted】: 2021-10-20 16:29:22
【Question】:

I am trying to write a DataFrame to AWS (Keyspaces), but I get the following error:

Stack trace:

dfExploded.write.cassandraFormat(table = "table", keyspace = "hub").mode(SaveMode.Append).save()
21/08/18 21:45:18 WARN DefaultTokenFactoryRegistry: [s0] Unsupported partitioner 'com.amazonaws.cassandra.DefaultPartitioner', token map will be empty.
java.lang.AssertionError: assertion failed: There are no contact points in the given set of hosts
  at scala.Predef$.assert(Predef.scala:223)
  at com.datastax.spark.connector.cql.LocalNodeFirstLoadBalancingPolicy$.determineDataCenter(LocalNodeFirstLoadBalancingPolicy.scala:195)
  at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$dataCenterNodes$1(CassandraConnector.scala:192)
  at scala.Option.getOrElse(Option.scala:189)
  at com.datastax.spark.connector.cql.CassandraConnector$.dataCenterNodes(CassandraConnector.scala:192)
  at com.datastax.spark.connector.cql.CassandraConnector$.alternativeConnectionConfigs(CassandraConnector.scala:207)
  at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$sessionCache$3(CassandraConnector.scala:169)
  at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:34)
  at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
  at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
  at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:89)
  at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
  at com.datastax.spark.connector.datasource.CassandraCatalog$.com$datastax$spark$connector$datasource$CassandraCatalog$$getMetadata(CassandraCatalog.scala:455)
  at com.datastax.spark.connector.datasource.CassandraCatalog$.getTableMetaData(CassandraCatalog.scala:421)
  at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:68)
  at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
  at org.apache.spark.sql.DataFrameWriter.getTable$1(DataFrameWriter.scala:339)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)

spark-submit command:

spark-submit --deploy-mode cluster --master yarn  \
--conf=spark.cassandra.connection.port="9142" \
--conf=spark.cassandra.connection.host="cassandra.sa-east-1.amazonaws.com" \
--conf=spark.cassandra.auth.username="BUU" \
--conf=spark.cassandra.auth.password="123456789" \
--conf=spark.cassandra.connection.ssl.enabled="true" \
--conf=spark.cassandra.connection.ssl.trustStore.path="cassandra_truststore.jks" \
--conf=spark.cassandra.connection.ssl.trustStore.password="123456"
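
For reference, the same connector properties can also be set in code when the SparkSession is built, instead of on the command line. The following is a minimal sketch, assuming Spark is on the classpath and the job is launched through spark-submit; the object name, the app name, and the environment variable names are hypothetical placeholders. It only mirrors the settings above and does not by itself address the error.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical job object; the property names are the ones used in the command above.
object KeyspacesWriteJob {

  // Read a secret from the environment (variable names are assumptions for this sketch).
  private def requiredEnv(name: String): String =
    sys.env.getOrElse(name, sys.error(s"$name is not set"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("keyspaces-write") // hypothetical app name
      .config("spark.cassandra.connection.host", "cassandra.sa-east-1.amazonaws.com")
      .config("spark.cassandra.connection.port", "9142")
      .config("spark.cassandra.connection.ssl.enabled", "true")
      .config("spark.cassandra.connection.ssl.trustStore.path", "cassandra_truststore.jks")
      .config("spark.cassandra.connection.ssl.trustStore.password", requiredEnv("TRUSTSTORE_PASSWORD"))
      .config("spark.cassandra.auth.username", requiredEnv("CASSANDRA_USER"))
      .config("spark.cassandra.auth.password", requiredEnv("CASSANDRA_PASSWORD"))
      .getOrCreate()

    // Confirm that the settings were picked up by the session.
    println(spark.conf.get("spark.cassandra.connection.host"))

    spark.stop()
  }
}
```

Reading the credentials from the environment keeps them out of the command line and the source code; values passed with --conf would work equally well.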

Connecting through cqlsh works fine, but Spark fails with this error.

【Comments】:

Tags: java scala apache-spark cassandra apache-spark-sql


【Answer 1】:

The problem the error points to is that AWS Keyspaces uses a partitioner (com.amazonaws.cassandra.DefaultPartitioner) that the Spark-Cassandra-connector does not support.

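There is not much public documentation about the database underlying AWS Keyspaces, so I have long suspected that Keyspaces puts a CQL API engine in front so that it "looks" like Cassandra, while it is probably backed by something like DynamoDB. I would be happy to be corrected by someone from AWS so I can put this to bed.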

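The default Cassandra partitioner is Murmur3Partitioner, and it is the only recommended one. Older partitioners such as RandomPartitioner and ByteOrderedPartitioner are supported only for backward compatibility and should never be used for new clusters.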
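
To make the "Unsupported partitioner" warning in the question's log concrete, here is a minimal sketch of the kind of name check that leads to it. `PartitionerCheck` and `isRecognised` are hypothetical names used only for illustration, and the list of recognised suffixes is an assumption built from the three partitioners named above; this is not the connector's or the driver's actual code.

```scala
// Hypothetical helper, not part of the Spark Cassandra Connector or the Java driver.
object PartitionerCheck {

  // Assumption: only these three Cassandra partitioners are recognised.
  private val knownSuffixes =
    Seq("Murmur3Partitioner", "RandomPartitioner", "ByteOrderedPartitioner")

  // True when the reported partitioner class name ends with a known partitioner name.
  def isRecognised(partitioner: String): Boolean =
    knownSuffixes.exists(suffix => partitioner.endsWith(suffix))

  def main(args: Array[String]): Unit = {
    val reported = Seq(
      "org.apache.cassandra.dht.Murmur3Partitioner", // Cassandra default
      "com.amazonaws.cassandra.DefaultPartitioner"   // value from the warning in the question
    )
    reported.foreach { p =>
      val status = if (isRecognised(p)) "recognised" else "unsupported"
      println(s"$p -> $status")
    }
  }
}
```

The partitioner a cluster actually reports can be read in cqlsh with `SELECT partitioner FROM system.local;`.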

Finally, we do not test the Spark connector against AWS Keyspaces, so be prepared for plenty of surprises. Cheers!

【Comments】:

  • Why does access from the Java client work without any problem?