【Question Title】: Unexpected UnavailableException in a Cassandra cluster
【Posted】: 2013-05-16 14:48:20
【Question】:
  • I have a 3-node C* cluster.
  • The C* client has its read consistency level set to QUORUM.
  • When one node in the cluster is down, I get an UnavailableException in response to read queries.

Why? The quorum for a cluster of 3 nodes is 2, so it should be able to tolerate the outage of one node.

More details:

Cassandra version:

ReleaseVersion: 1.1.6

Configuration of the keyspace and column family:

Keyspace: QuestionAnswerService:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [datacenter1:2]
  Column Families:
  //...
  ColumnFamily: answersByQuestion
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      DC Local Read repair chance: 0.0
      Populate IO Cache on flush: false
      Replicate on write: true
      Caching: KEYS_ONLY
      Bloom Filter FP chance: default
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
  //...

Exception thrown during a read query while one node is down:

2013-05-21 17:43:37 ERROR CountingConnectionPoolMonitor:81 - com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=cassandra.xxx.yyy(10.33.0.53):9160, latency=56(56), attempts=1]UnavailableException()
com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=cassandra.xxx.yyy(10.33.0.53):9160, latency=56(56), attempts=1]UnavailableException()
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$1$2.execute(ThriftColumnFamilyQueryImpl.java:198)
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$1$2.execute(ThriftColumnFamilyQueryImpl.java:190)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:136)
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:248)
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$1.execute(ThriftColumnFamilyQueryImpl.java:188)
    at org.example.Casstest$delayedInit$body.apply(Casstest.scala:66)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:60)
    at scala.App$$anonfun$main$1.apply(App.scala:60)
    at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
    at scala.collection.immutable.List.foreach(List.scala:45)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:30)
    at scala.App$class.main(App.scala:60)
    at org.example.Casstest$.main(Casstest.scala:14)
    at org.example.Casstest.main(Casstest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: UnavailableException()
    at org.apache.cassandra.thrift.Cassandra$get_slice_result.read(Cassandra.java:7288)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:552)
    at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:536)
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$1$2.internalExecute(ThriftColumnFamilyQueryImpl.java:203)
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$1$2.internalExecute(ThriftColumnFamilyQueryImpl.java:190)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:55)
    ... 22 more

Scala code that reproduces the error above:

package org.example

import com.netflix.astyanax.connectionpool.impl.{CountingConnectionPoolMonitor, ConnectionPoolConfigurationImpl}
import com.netflix.astyanax.{Keyspace, AstyanaxContext}
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl
import com.netflix.astyanax.connectionpool.NodeDiscoveryType
import com.netflix.astyanax.retry.ConstantBackoff
import com.netflix.astyanax.model.{ColumnFamily, ConsistencyLevel}
import com.netflix.astyanax.thrift.ThriftFamilyFactory
import com.netflix.astyanax.serializers.StringSerializer
import org.slf4j.LoggerFactory
import scala.collection.JavaConversions._

object Casstest extends App {

  println("Hello, cass-test")

  val logger = LoggerFactory.getLogger(Casstest.getClass)

  val clusterName = "Cassandra"
  val hostname = "cassandra.xxx.yyy"
  val port = 9160
  val thriftSocketTimeout = 4000
  val keyspaceName = "QuestionAnswerService"
  val timeout = 5000

  val connectionPool = new ConnectionPoolConfigurationImpl("ConnectionPool")
    .setPort(port)
    //    .setMaxConnsPerHost(1)
    .setSeeds(hostname + ":" + port)
    .setSocketTimeout(timeout)
    .setConnectTimeout(timeout)
    .setTimeoutWindow(timeout)

  val cassandraContext: AstyanaxContext[Keyspace] =
    new AstyanaxContext.Builder()
      .forCluster(clusterName)
      .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        .setDiscoveryType(NodeDiscoveryType.TOKEN_AWARE)
        .setRetryPolicy(new ConstantBackoff(timeout, 10000))
        .setDefaultReadConsistencyLevel(ConsistencyLevel.CL_QUORUM))
      .withConnectionPoolConfiguration(connectionPool)
      .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
      .forKeyspace(keyspaceName)
      .buildKeyspace(ThriftFamilyFactory.getInstance())

  cassandraContext.start()

  val keyspace: Keyspace = cassandraContext.getEntity()

  val answersByQuestionCf = new ColumnFamily[String, String](
    "answersByQuestion", // Column Family Name
    StringSerializer.get(), // Key Serializer
    StringSerializer.get(), // Column Serializer
    StringSerializer.get()) // Value Serializer

  while(true) {

    logger.info("query start")

    val result = keyspace
      .prepareQuery(answersByQuestionCf)
      .getKey("birthyear")
      .execute()

    logger.info("query finished: " + result.toString)

    result.getResult.getColumnNames.take(10) foreach {
      logger.info
    }

  }

}

【Comments】:

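  • Are all of your nodes in datacenter1? Can you read and write through cassandra-cli?
  • Yes. What's more, I just found that increasing the replication factor to 3 fixes the problem: I can query successfully with one node down. According to the documentation, though, it should make no difference, since QUORUM = 2 for both RF = 2 and RF = 3. Link: datastax.com/docs/1.1/dml/data_consistency [About read consistency]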

Tags: cassandra astyanax


【Answer 1】:

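The number of nodes required for a consistency level is a function of the replication factor, not of the number of nodes in the cluster. Quorum is floor(RF / 2) + 1, so with RF=2 the quorum is 2: both replicas of a row must be up for a QUORUM read of that row to succeed. To read all of your data, all of your nodes therefore have to be up.

With 3 nodes in your cluster, RF=2, and reads at CL.QUORUM, you can access only 1/3 of your data while one node is down, namely the keys that have no replica on the downed node. Reading any other key results in an UnavailableException.

With 3 nodes in your cluster, RF=3, and reads at CL.QUORUM, you can still access all of your data with one node down, because every key keeps 2 of its 3 replicas.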
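
The arithmetic above can be checked with a small sketch. It does not use any Cassandra or Astyanax API: the object and method names (`QuorumSketch`, `quorum`, `replicas`, `readableFraction`) are hypothetical, and replica placement is simplified to "the owning node plus the next RF - 1 nodes on the ring", which is only an approximation of a single-datacenter, single-rack cluster with evenly distributed tokens.

```scala
object QuorumSketch {

  // Quorum depends on the replication factor, not on the cluster size.
  def quorum(rf: Int): Int = rf / 2 + 1

  // Simplified placement (an assumption): the range owned by node `owner`
  // is replicated on that node and the next rf - 1 nodes clockwise.
  def replicas(owner: Int, nodes: Int, rf: Int): Set[Int] =
    (0 until rf).map(k => (owner + k) % nodes).toSet

  // Fraction of token ranges that can still be read at QUORUM
  // when the nodes in `down` are unavailable.
  def readableFraction(nodes: Int, rf: Int, down: Set[Int]): Double = {
    val readable = (0 until nodes).count { owner =>
      (replicas(owner, nodes, rf) -- down).size >= quorum(rf)
    }
    readable.toDouble / nodes
  }

  def main(args: Array[String]): Unit = {
    // 3-node cluster with node 0 down, as in the question.
    for (rf <- Seq(2, 3)) {
      val fraction = readableFraction(nodes = 3, rf = rf, down = Set(0))
      println(s"RF=$rf quorum=${quorum(rf)} readable fraction=$fraction")
    }
  }
}
```

Under these assumptions, RF=2 leaves one of the three ranges readable with a node down, while RF=3 leaves all of them readable, which matches the behaviour reported in the comments.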
