Posted: 2017-12-09 21:09:52
Question:
We are currently running Cassandra 2.0.14. A node in the cluster went down, and I see the following exception in its logs.
WARN [New I/O server boss #33] 2017-07-06 06:37:33,097 Slf4JLogger.java (line 76) Failed to accept a connection.
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-06 06:37:33,123 StorageService.java (line 377) Stopping RPC server
INFO [COMMIT-LOG-ALLOCATOR] 2017-07-06 06:37:33,123 ThriftServer.java (line 141) Stop listening to thrift clients
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-06 06:37:33,132 StorageService.java (line 382) Stopping native transport
INFO [COMMIT-LOG-ALLOCATOR] 2017-07-06 06:37:34,965 Server.java (line 182) Stop listening for CQL clients
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-06 06:37:34,969 CommitLog.java (line 390) Failed to allocate new commit log segments. Commit disk failure policy is stop; terminating thread
FSWriteError in /myntra/cassandra/commitlog/CommitLog-3-1499285518666.log
at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:143)
at org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(CommitLogSegment.java:90)
at org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegment(CommitLogAllocator.java:262)
at org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(CommitLogAllocator.java:50)
at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:109)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /myntra/cassandra/commitlog/CommitLog-3-1499285518666.log (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:125)
... 6 more
We raised the resource limits following the DataStax production recommendations. Cassandra runs as the root user, and root's file descriptor limit is:
[root@lgp-feed-cassandra2 cassandra]# ulimit -n
120000
and these are the limits of the running process:
[root@lgp-feed-cassandra2 cassandra]# cat /proc/117845/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             32768                32768                processes
Max open files            120000               120000               files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       255823               255823               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
I can't figure out the exact cause of this issue. Any leads would be helpful.
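Since the limit itself looks correct, the next question is what is actually holding the descriptors. A minimal Linux-only sketch, assuming you can read /proc/<pid>/fd (run it as root, or as the same user as Cassandra): it counts the process's open descriptors by kind (sockets, pipes, regular files such as SSTables and commit logs) and reads the "Max open files" soft limit from the same /proc/<pid>/limits file shown above. The function names are hypothetical helpers, not part of any Cassandra tooling.

```python
import os
from collections import Counter


def parse_open_files_limit(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    'unlimited' is returned as None.
    """
    def to_int(value):
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            fields = line.split()
            return to_int(fields[3]), to_int(fields[4])
    raise ValueError("no 'Max open files' line found")


def classify(target):
    """Bucket a /proc/<pid>/fd symlink target by the kind of object it points to."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"  # regular paths: SSTables, commit log segments, jars, ...


def fd_summary(pid):
    """Return (Counter of descriptor kinds, soft 'Max open files' limit) for a pid."""
    fd_dir = f"/proc/{pid}/fd"
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            kinds[classify(os.readlink(os.path.join(fd_dir, name)))] += 1
        except FileNotFoundError:
            # The descriptor was closed between listdir() and readlink().
            continue
    with open(f"/proc/{pid}/limits") as f:
        soft, _hard = parse_open_files_limit(f.read())
    return kinds, soft
```

For example, `fd_summary(117845)` (the Cassandra pid from the question) would show whether the count near 120000 is dominated by sockets, which points at clients, or by files, which points at SSTables or commit logs.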
Comments:
-
Is this node running only Cassandra, or is something else running on it as well? Can you show the output of the "iostat" and "top" commands?
-
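Cassandra opens SSTables and commit logs during startup, and each SSTable has 6 components. With 20k SSTables on disk you could hit the 120,000 limit (and you may well have 20k SSTables on disk if compaction is far behind). You could probably raise the limit from 120000 to 1000000 and see whether the server starts, but you need to figure out how you ended up with so many SSTables on disk.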
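The arithmetic in that comment is 20,000 SSTables × 6 components = 120,000 descriptors, exactly the configured limit. A rough way to check how close a node is, sketched under the assumption of the usual on-disk naming where every SSTable has exactly one `*-Data.db` file: count `*-Data.db` files and total component files under the data directory, skipping `snapshots` and `backups` subdirectories (assumed here to be hard-linked copies that Cassandra does not keep open). Treat the component count as a rough upper bound; not every component is necessarily held open. `count_sstables` is a hypothetical helper.

```python
import os

# Subdirectories holding hard-linked copies; assumed not to be held open by Cassandra.
SKIP_DIRS = {"snapshots", "backups"}


def count_sstables(data_dir):
    """Return (sstable_count, component_file_count) under a Cassandra data directory.

    Counts one SSTable per *-Data.db file and every regular file in the live
    table directories as a component.
    """
    sstables = 0
    components = 0
    for _root, dirs, files in os.walk(data_dir):
        # Prune snapshot/backup subdirectories in place so os.walk skips them.
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for name in files:
            components += 1
            if name.endswith("-Data.db"):
                sstables += 1
    return sstables, components
```

The data directory path depends on `data_file_directories` in cassandra.yaml; given the commit log location in the log above, something like `/myntra/cassandra/data` is a guess, not a known value.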
-
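@JeffJirsa We found the problem. Our Python Cassandra client was opening a lot of sockets, which caused the issue. We are still working out how to use connection pooling in the Python client.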
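The comment doesn't say how the sockets leaked; one common way (an assumption here) is building a new `Cluster`/`Session` per request or per worker call. With the DataStax Python driver (`cassandra-driver`), a `Session` already maintains its own connection pool per host, so "pooling" mostly means creating one session per process and sharing it. A minimal thread-safe sketch of that sharing; `SharedSession` is a hypothetical helper, and the hosts and keyspace in the commented wiring are placeholders.

```python
import threading


class SharedSession:
    """Build one driver session lazily and hand the same object to every caller.

    `factory` is any zero-argument callable that creates the session.
    """

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._session = None

    def get(self):
        # Fast path: once built, return it without taking the lock.
        session = self._session
        if session is None:
            with self._lock:
                # Re-check under the lock so concurrent first callers build it only once.
                if self._session is None:
                    self._session = self._factory()
                session = self._session
        return session


# Hypothetical wiring (requires cassandra-driver; hosts and keyspace are placeholders):
# from cassandra.cluster import Cluster
# SESSION = SharedSession(lambda: Cluster(["10.0.0.1", "10.0.0.2"]).connect("my_keyspace"))
# rows = SESSION.get().execute("SELECT release_version FROM system.local")
```

Every request then reuses the same pooled connections instead of opening new sockets. Calling `shutdown()` on the `Cluster` when the process exits closes them cleanly.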
Tags: cassandra datastax cassandra-2.0