【Question title】: Cassandra (2.0.4) is going down because of "Too many open files"
【Posted】: 2017-12-09 21:09:52
【Question description】:

We are currently running Cassandra version 2.0.14. Machines in the cluster are going down, and I see the following exceptions in the logs.

WARN [New I/O server boss #33] 2017-07-06 06:37:33,097 Slf4JLogger.java (line 76) Failed to accept a connection.
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-06 06:37:33,123 StorageService.java (line 377) Stopping RPC server
 INFO [COMMIT-LOG-ALLOCATOR] 2017-07-06 06:37:33,123 ThriftServer.java (line 141) Stop listening to thrift clients
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-06 06:37:33,132 StorageService.java (line 382) Stopping native transport
 INFO [COMMIT-LOG-ALLOCATOR] 2017-07-06 06:37:34,965 Server.java (line 182) Stop listening for CQL clients
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-06 06:37:34,969 CommitLog.java (line 390) Failed to allocate new commit log segments. Commit disk failure policy is stop; terminating thread
FSWriteError in /myntra/cassandra/commitlog/CommitLog-3-1499285518666.log
        at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:143)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(CommitLogSegment.java:90)
        at org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegment(CommitLogAllocator.java:262)
        at org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(CommitLogAllocator.java:50)
        at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:109)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /myntra/cassandra/commitlog/CommitLog-3-1499285518666.log (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:125)
        ... 6 more 

We raised the resource limits as recommended in the DataStax production settings. Cassandra runs as the root user, and the file descriptor limit for root is:

[root@lgp-feed-cassandra2 cassandra]# ulimit -n
120000

and the limits reported for the running process are:

[root@lgp-feed-cassandra2 cassandra]# cat /proc/117845/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             32768                32768                processes
Max open files            120000               120000               files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       255823               255823               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
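
Since the configured limit already looks generous, it can help to compare it with what the process actually holds open, and to see whether those descriptors are sockets or files. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/limits and /proc/<pid>/fd are readable; the function names are made up for illustration and are not part of Cassandra or any tool.

```python
import os
from collections import Counter


def _to_int(value):
    # /proc prints "unlimited" when no numeric cap is set.
    return None if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return _to_int(soft), _to_int(hard)
    raise ValueError("no 'Max open files' row found")


def read_fd_targets(pid):
    """Return the symlink target of every descriptor in /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def classify_fd_targets(targets):
    """Bucket descriptor targets into sockets, pipes/anonymous inodes, and files."""
    kinds = Counter()
    for target in targets:
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith(("pipe:", "anon_inode:")):
            kinds["pipe/anon"] += 1
        else:
            kinds["file"] += 1
    return kinds
```

Run as a user allowed to inspect the process, something like `classify_fd_targets(read_fd_targets(117845))` gives a per-kind count that can be set against the soft limit returned by `parse_open_files_limit(open("/proc/117845/limits").read())`. A count dominated by sockets points toward client connections, while one dominated by files points toward sstables or commit log segments.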

We have not been able to pin down the exact cause of this problem. Any leads would be helpful.

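【Question comments】:

  • Is this node running only Cassandra, or are other things running on it too? Can you show the output of the "iostat" and "top" commands?
  • Cassandra opens sstables and commitlogs during startup, and each sstable has 6 components. With 20k sstables on disk you would hit the 120,000 limit (20k × 6 = 120,000), and you can end up with 20k sstables on disk if compaction is far behind. You could probably raise the limit from 120000 to 1000000 and see whether the server starts, but you need to figure out how you ended up with so many sstables on disk.
  • @JeffJirsa We found the problem. Our Python Cassandra client was opening a lot of sockets, which caused the issue. We are still working out how to use connection pooling in the Python client.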
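
The last comment traces the leak to a client that kept opening new sockets. One common way to avoid that is to create the connection object once and hand the same instance to every caller. The sketch below is an illustration of that idea, not the original poster's fix: `SharedSession` is a hypothetical helper, and the commented wiring assumes the DataStax Python driver (`cassandra-driver`) is installed and uses a placeholder contact point.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SharedSession(Generic[T]):
    """Lazily create one session and return the same object to every caller."""

    def __init__(self, connect: Callable[[], T]) -> None:
        self._connect = connect
        self._session: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # The lock ensures concurrent first calls still connect only once.
        with self._lock:
            if self._session is None:
                self._session = self._connect()
            return self._session


# Hypothetical wiring with the DataStax Python driver:
#   from cassandra.cluster import Cluster
#   shared = SharedSession(lambda: Cluster(["10.0.0.1"]).connect())
#   rows = shared.get().execute("SELECT release_version FROM system.local")
```

In that driver, a `Session` obtained from `Cluster.connect()` manages its own connections to each node, so application code that builds a new `Cluster` per request is the usual source of runaway socket counts; reusing one session for the life of the process keeps the number of sockets bounded.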

Tags: cassandra datastax cassandra-2.0


【Solution 1】:

You need to raise the ulimit. First check the current value with "ulimit -n". We fixed it with the following changes:

root hard nofile 65535

root soft nofile 65535

* hard nofile 65535

* soft nofile 65535

$ sudo cat /etc/security/limits.conf

Refer to this link for more details.
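
After editing /etc/security/limits.conf it is easy to leave a malformed line behind, so a quick check of which nofile entries the file actually contains can be useful. This is a minimal sketch under the assumption that entries follow the usual four-field layout (domain, type, item, value); `parse_nofile_entries` is a hypothetical helper written for this example.

```python
def parse_nofile_entries(limits_conf_text):
    """Return (domain, type, value) for each well-formed nofile line."""
    entries = []
    for raw in limits_conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) == 4 and fields[2] == "nofile":
            domain, limit_type, _item, value = fields
            entries.append((domain, limit_type, value))
    return entries
```

Calling `parse_nofile_entries(open("/etc/security/limits.conf").read())` lists the entries that would apply to new login sessions. An already running process keeps the limits it started with, so Cassandra has to be restarted from a fresh session before /proc/<pid>/limits reflects the change.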

【Comments】:
