【Question title】: ClickHouse - Too many links
【Posted】: 2021-12-11 04:23:02
【Question】:

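I'm testing a ClickHouse server with heavy inserts and ran into a state where the server stops processing inserts and throws a "Too many links" exception. From what I observed, it cannot recover from this state even after I stop inserting. I also noticed that the "Too many links" exception message is logged every millisecond, so the server log file fills up very quickly.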

Test environment and how to reproduce:

  • Server: dual xxx 14-core @ 2.4 GHz, 56 vCPUs and 256 GB RAM. CentOS 7, clickhouse-server 21.2.2 revision 54447 (also tested with 21.8)
  • Engine: MergeTree PARTITION BY toYYYYMMDD(time_generated) ORDER BY time_generated
  • 15 clients (10 clickhouse-client, 5 C++ clients) continuously inserting log data (about 150 fields) in TSV format, in batches of 500K rows, for about a day

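In this state, clickhouse-server uses 1.5 cores and shows no noticeable file I/O activity. Other queries still work. To recover from this state, I deleted the temporary directory.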
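Since the error is raised while creating a `tmp_merge_*` directory, a quick check is how many such directories have piled up in the per-table tmp directory and what its hard-link count is. On ext4 and similar filesystems a directory's link count is normally 2 plus its number of subdirectories, and `mkdir` fails with "Too many links" once the filesystem's limit is reached (the exact limit depends on the filesystem and its features). The helper below is a hypothetical sketch, not a ClickHouse tool; it assumes the `tmp_merge_<partition>_<min_block>_<max_block>_<level>` naming seen in the log (`tmp_merge_20211021_452585_452597_1`).

```python
import os
import re
from collections import Counter
from typing import Tuple

# Assumed layout of a merge temp dir name: tmp_merge_<partition>_<min>_<max>_<level>
TMP_MERGE_RE = re.compile(r"^tmp_merge_(\d+)_(\d+)_(\d+)_(\d+)$")


def scan_tmp_merges(table_tmp_dir: str) -> Tuple[int, Counter]:
    """Return the directory's hard-link count and leftover tmp_merge_* dirs per partition."""
    nlink = os.stat(table_tmp_dir).st_nlink
    per_partition: Counter = Counter()
    with os.scandir(table_tmp_dir) as entries:
        for entry in entries:
            # Only real subdirectories count toward the parent's link count.
            if not entry.is_dir(follow_symlinks=False):
                continue
            match = TMP_MERGE_RE.match(entry.name)
            if match:
                per_partition[match.group(1)] += 1
    return nlink, per_partition
```

For example, calling `scan_tmp_merges("/var/lib/clickhouse/tmp/store/48c/48cab972-1221-4222-a5f4-ed3960a08f35")` (the directory from the log) would show whether `tmp_merge_*` directories are accumulating there faster than they are cleaned up.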

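I don't think we would normally insert this way in practice (ignoring the "Too many parts" errors), but I'd like to know whether getting into this state could be a problem. And apart from not inserting data in such an abnormal way, is there any way to prevent it?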

Thanks in advance.

Logs:

- client 
  Code: 252. DB::Exception: Received from xx:9000. DB::Exception: Too many parts (303). Merges are processing significantly slower than inserts..

- server: 
  2021.10.21 09:17:48.649609 [ 21223 ] {} <Error> auto DB::IBackgroundJobExecutor::jobExecutingTask()::(anonymous class)::operator()() const: Poco::Exception. Code: 1000, e.code() = 31, e.displayText() = File access error: Too many links: /var/lib/clickhouse/tmp/store/48c/48cab972-1221-4222-a5f4-ed3960a08f35/tmp_merge_20211021_452585_452597_1, Stack trace (when copying this message, always include the lines below):

0. Poco::FileImpl::handleLastErrorImpl(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x11c42124 in /usr/bin/clickhouse
1. Poco::FileImpl::createDirectoryImpl() @ 0x11c4372f in /usr/bin/clickhouse
2. Poco::File::createDirectories() @ 0x11c456b7 in /usr/bin/clickhouse
3. DB::DiskLocal::createDirectories(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xe79e358 in /usr/bin/clickhouse
4. DB::MergeTreeDataMergerMutator::mergePartsToTemporaryPart(DB::FutureMergedMutatedPart const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo>&, std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>&, long, DB::Context const&, std::__1::unique_ptr<DB::IReservation, std::__1::default_delete<DB::IReservation> > const&, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) @ 0xf36ad8e in /usr/bin/clickhouse
5. DB::StorageMergeTree::mergeSelectedParts(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::StorageMergeTree::MergeMutateSelectedEntry&, std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>&) @ 0xf10f108 in /usr/bin/clickhouse
6. ? @ 0xf12168c in /usr/bin/clickhouse
7. ? @ 0xf2cb076 in /usr/bin/clickhouse
8. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x8513fb8 in /usr/bin/clickhouse
9. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()::operator()() @ 0x8515f6f in /usr/bin/clickhouse
10. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x851158f in /usr/bin/clickhouse
11. ? @ 0x8515023 in /usr/bin/clickhouse
12. ? @ 0x7eb5 in /usr/lib64/libpthread-2.17.so
13. __clone @ 0xfe8fd in /usr/lib64/libc-2.17.so
(version 21.2.2.8 (official build))

--- with 21.8.
2021.10.25 08:29:18.354200 [ 55326 ] {} <Error> auto DB::IBackgroundJobExecutor::execute(DB::JobAndPool)::(anonymous class)::operator()() const: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in create_directory: Too many links [/var/lib/clickhouse/tmp/store/48c/48cab972-1221-4222-a5f4-ed3960a08f35/tmp_merge_20211024_906198_906236_1], Stack trace (when copying this message, always include the lines below):

0. std::__1::system_error::system_error(std::__1::error_code, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x1590de6f in ?
1. ? @ 0x158a171f in ?
2. ? @ 0x158a1136 in ?
3. ? @ 0x158a58f8 in ?
4. std::__1::__fs::filesystem::__create_directory(std::__1::__fs::filesystem::path const&, std::__1::error_code*) @ 0x158a646b in ?
5. std::__1::__fs::filesystem::__create_directories(std::__1::__fs::filesystem::path const&, std::__1::error_code*) @ 0x158a6125 in ?
6. std::__1::__fs::filesystem::__create_directories(std::__1::__fs::filesystem::path const&, std::__1::error_code*) @ 0x158a6189 in ?
7. DB::DiskLocal::createDirectories(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xff032ec in /usr/bin/clickhouse
8. DB::MergeTreeDataMergerMutator::mergePartsToTemporaryPart(DB::FutureMergedMutatedPart const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::BackgroundProcessListEntry<DB::MergeListElement, DB::MergeInfo>&, std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>&, long, std::__1::shared_ptr<DB::Context const>, std::__1::unique_ptr<DB::IReservation, std::__1::default_delete<DB::IReservation> > const&, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::MergeTreeData::MergingParams const&, DB::IMergeTreeDataPart const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x10d14ff8 in /usr/bin/clickhouse
 9. DB::StorageMergeTree::mergeSelectedParts(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, bool, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::StorageMergeTree::MergeMutateSelectedEntry&, std::__1::shared_ptr<DB::RWLockImpl::LockHolderImpl>&) @ 0x10eea024 in /usr/bin/clickhouse
10. ? @ 0x10ef9937 in /usr/bin/clickhouse
11. ? @ 0x10c40e77 in /usr/bin/clickhouse
12. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x8ffab98 in /usr/bin/clickhouse
13. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda0'()&&...)::'lambda'()::operator()() @ 0x8ffc73f in /usr/bin/clickhouse
14. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x8ff84ff in /usr/bin/clickhouse
15. ? @ 0x8ffb763 in /usr/bin/clickhouse
16. ? @ 0x7eb5 in /usr/lib64/libpthread-2.17.so
17. __clone @ 0xfe8fd in /usr/lib64/libc-2.17.so

Cannot print extra info for Poco::Exception (version 21.8.5.1.altinity+prestable (altinity build))
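Both traces report the same OS error while creating the merge directory: `e.code() = 31` on 21.2 and `filesystem error: in create_directory: Too many links` on 21.8. On Linux, errno 31 is `EMLINK`, which `mkdir` returns when the parent directory would exceed its maximum link count. A minimal way to confirm that mapping from Python's standard library (the helper name is just for illustration):

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Map a numeric errno (as printed in e.code()) to its symbolic name and message."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} = {name}: {os.strerror(code)}"


print(describe_errno(31))  # On Linux: 31 = EMLINK: Too many links
```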

【Comments】:

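  • How often do you insert? This looks like the standard problem: inserts should not happen that often. Instead, the data should be accumulated into batches and inserted about once per second. If that is not possible in your case, there are now several workarounds, from third-party tools to Buffer tables and asynchronous inserts inside ClickHouse itself.
  • Thanks Andres for the info. I can't control the frequency; 15 clients each insert 500K rows per batch. It takes a few seconds for a client to get the server's response, and then it immediately sends the next batch. If it receives an exception, it sleeps for 10 seconds. However, I'm not looking for a way to get higher ingestion. Rather, I'd like to know whether this is a known issue, when it happens, and how we can recover from it other than by deleting the temporary files. Any other tips for preventing it, besides sending well-behaved batches (or using buffers, etc.), are also welcome.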
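The first comment's suggestion (accumulate rows and send roughly one INSERT per second instead of many small ones) can be sketched as a client-side buffer that flushes when either a row threshold or a time interval is reached. This is a hypothetical helper, not part of any ClickHouse client library; `send_batch` stands in for whatever actually performs the INSERT (for example, piping TSV into `clickhouse-client`), and the default thresholds are assumptions.

```python
import time
from typing import Callable, List


class InsertBatcher:
    """Accumulate rows and hand them to send_batch in large, infrequent batches."""

    def __init__(self, send_batch: Callable[[List[str]], None],
                 max_rows: int = 500_000, max_interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send_batch
        self._max_rows = max_rows
        self._max_interval = max_interval_s
        self._clock = clock
        self._rows: List[str] = []
        self._last_flush = clock()

    def add(self, row: str) -> None:
        self._rows.append(row)
        # Flush when the batch is big enough or enough time has passed since the last flush.
        if (len(self._rows) >= self._max_rows
                or self._clock() - self._last_flush >= self._max_interval):
            self.flush()

    def flush(self) -> None:
        if self._rows:
            self._send(self._rows)
            self._rows = []  # start a fresh list so the sent batch is not mutated
        self._last_flush = self._clock()
```

Because the time check only runs inside `add`, a real client would also call `flush()` from a timer and at shutdown so a partially filled batch is not held indefinitely.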

Tags: clickhouse


【Answer 1】:

df -i /var/lib/clickhouse/

df -h /var/lib/clickhouse/
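`df -i` reports inode usage and `df -h` reports space usage for the filesystem holding `/var/lib/clickhouse/`. Running out of inodes is a different failure from `EMLINK`, but both are worth ruling out when large numbers of small parts and temp directories accumulate. The same figures can be read programmatically with `os.statvfs`; the helper below is a hypothetical monitoring sketch and only approximates `df`'s rounding.

```python
import os
from typing import NamedTuple


class FsUsage(NamedTuple):
    inodes_used_pct: float
    space_used_pct: float


def fs_usage(path: str) -> FsUsage:
    """Approximate `df -i` / `df -h` Use% for the filesystem containing path."""
    st = os.statvfs(path)
    # f_files = total inodes, f_ffree = free inodes; some filesystems report 0 total.
    inodes_used_pct = 0.0 if st.f_files == 0 else 100.0 * (st.f_files - st.f_ffree) / st.f_files
    # Like df, compute used / (used + available to unprivileged users).
    used = st.f_blocks - st.f_bfree
    denom = used + st.f_bavail
    space_used_pct = 0.0 if denom == 0 else 100.0 * used / denom
    return FsUsage(inodes_used_pct, space_used_pct)


print(fs_usage("/"))
```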

  1. Upgrade CH to 21.8.10.19 https://github.com/ClickHouse/ClickHouse/issues/26471

  2. https://github.com/ClickHouse/ClickHouse/issues/3174#issuecomment-423435071

  3. https://clickhouse.com/docs/en/operations/settings/merge-tree-settings/#parts-to-throw-insert

# cat /etc/clickhouse-server/config.d/z_parts_to_throw.xml
<yandex>
    <merge_tree>
        <old_parts_lifetime>30</old_parts_lifetime>
        <parts_to_delay_insert>150</parts_to_delay_insert>
        <parts_to_throw_insert>900</parts_to_throw_insert>
        <max_delay_to_insert>5</max_delay_to_insert>
    </merge_tree>
</yandex>
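With these overrides, inserts are slowed down once a partition has 150 active parts and rejected only at 900 (the client error above was raised at 303 parts, presumably with the older default `parts_to_throw_insert` of 300). The MergeTree settings documentation describes the added delay as `max_k = parts_to_throw_insert - parts_to_delay_insert`, `k = 1 + parts_count_in_partition - parts_to_delay_insert`, `delay_milliseconds = pow(max_delay_to_insert * 1000, k / max_k)`. The sketch below simply evaluates that documented formula with the values from the config above as defaults; the formula and defaults differ between ClickHouse versions, so treat it as an approximation.

```python
def insert_delay_ms(parts_in_partition: int,
                    parts_to_delay_insert: int = 150,
                    parts_to_throw_insert: int = 900,
                    max_delay_to_insert: float = 5.0) -> float:
    """Evaluate the documented MergeTree INSERT delay formula (approximation)."""
    if parts_in_partition >= parts_to_throw_insert:
        # ClickHouse rejects the INSERT with "Too many parts" at this point.
        raise RuntimeError(f"Too many parts ({parts_in_partition})")
    if parts_in_partition < parts_to_delay_insert:
        return 0.0  # below the threshold: no artificial delay
    max_k = parts_to_throw_insert - parts_to_delay_insert
    k = 1 + parts_in_partition - parts_to_delay_insert
    return pow(max_delay_to_insert * 1000, k / max_k)


for n in (100, 303, 600, 899):
    print(n, round(insert_delay_ms(n), 1))
```

The documentation's own example (299 parts, `parts_to_throw_insert = 300`, `parts_to_delay_insert = 150`, `max_delay_to_insert = 1`) gives a 1000 ms delay, which `insert_delay_ms(299, 150, 300, 1)` reproduces.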
  4. https://clickhouse.com/docs/en/operations/settings/settings/#background_pool_size
# cat /etc/clickhouse-server/users.d/user_substitutes.xml
<?xml version="1.0"?>
<yandex>
    <profiles>
        <default>
            <background_pool_size>32</background_pool_size>
        </default>
    </profiles>
</yandex>
  5. Restart CH (ClickHouse).

【Comments】:

  • Thanks Denny for the info. Good to know there is also a fix. I'll look into the suggested parameters, upgrade, and move on.