Locust workers set to "missing" as soon as job starts - Answer

【Question title】: Locust workers set to "missing" as soon as job starts
【Posted】: 2022-06-30 23:42:36
【Question】:

I'm running locust==2.8.6 on Python 3.10, on Kubernetes via AWS EKS. I'm running it in distributed mode and trying to set up 1 master and 5 workers.

The master pod starts with the command:

command: ["locust"]
args: ["-f","$filename","--headless","--users=$clients","--spawn-rate=$hatch-rate","--run-time=$run-time","--only-summary","--master","--expect-workers=$num_slaves"]

The workers start with the command:

command: ["locust"]
args: ["-f","$filename","--worker","--master-host=locust-master$task_id"]

On the worker pods I can indeed run telnet locust-master1 5557 and confirm that they can communicate with the master. (In this case, $task_id=1.)
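The same reachability check can be scripted, for example as a startup check in the worker image. A minimal sketch using only the standard library; the host name comes from the question, and 5557 is Locust's default master port (set with --master-port on workers and --master-bind-port on the master). The function name can_connect is hypothetical.

```python
import socket


def can_connect(host: str, port: int = 5557, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Run inside a worker pod as can_connect("locust-master1"). A successful connect only proves the port is reachable; since the master log shows all five workers reporting ready, basic connectivity is evidently already working here, and the heartbeat failures point elsewhere.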

In the master pod I see logs like the following:

[2022-04-27 22:53:16,969] locust-master1--1-z2lr8/INFO/root: Waiting for workers to be ready, 0 of 5 connected
[2022-04-27 22:53:17,109] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-tt7n5_fec1320a406b42319f3088bd9a7c181c' reported as ready. Currently 1 clients ready to swarm.
[2022-04-27 22:53:17,147] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-qv7kt_011dbeb9f15d452f935c5643fb463632' reported as ready. Currently 2 clients ready to swarm.
[2022-04-27 22:53:17,261] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-ks5wb_356fcf54ac2644e4badc684e3846520c' reported as ready. Currently 3 clients ready to swarm.
[2022-04-27 22:53:17,354] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-cbkbd_2c90cedde5224e1e9cf47bbb543b9097' reported as ready. Currently 4 clients ready to swarm.
[2022-04-27 22:53:17,364] locust-master1--1-z2lr8/INFO/locust.runners: Client 'locust-slave1-xfvsz_196bba3928c5491e896acd411798d48d' reported as ready. Currently 5 clients ready to swarm.
[2022-04-27 22:53:17,970] locust-master1--1-z2lr8/INFO/locust.main: Run time limit set to 5400 seconds
[2022-04-27 22:53:17,971] locust-master1--1-z2lr8/INFO/locust.main: Starting Locust 2.8.6
[2022-04-27 22:53:17,971] locust-master1--1-z2lr8/INFO/locust.runners: Sending spawn jobs of 50 users at 0.50 spawn rate to 5 ready clients
[2022-04-27 22:53:17,977] locust-master1--1-z2lr8/INFO/locust_submit_judgments: Locust Startup: job_id: 1434194
[2022-04-27 22:53:18,376] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-cbkbd_2c90cedde5224e1e9cf47bbb543b9097 failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:20,384] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-qv7kt_011dbeb9f15d452f935c5643fb463632 failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:20,385] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-ks5wb_356fcf54ac2644e4badc684e3846520c failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:22,391] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-tt7n5_fec1320a406b42319f3088bd9a7c181c failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:22,391] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-xfvsz_196bba3928c5491e896acd411798d48d failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:22,392] locust-master1--1-z2lr8/INFO/locust.runners: The last worker went missing, stopping test.
[2022-04-27 22:53:22,392] locust-master1--1-z2lr8/INFO/locust_submit_judgments: Locust Teardown: sending query messages to Results DB
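To see how quickly each worker dropped out, the timestamps above can be compared against the "Sending spawn jobs" line. The helper below is a hypothetical sketch (not part of Locust) that assumes exactly the log format shown in the question. For these logs it gives delays between about 0.4 s and 4.4 s, i.e. every worker went missing within a few seconds of the spawn jobs being sent.

```python
import re
from datetime import datetime

# Assumes the "[timestamp] host/LEVEL/logger: message" format shown in the question.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<src>\S+): (?P<msg>.*)$")
MISSING_RE = re.compile(r"Worker (?P<worker>\S+) failed to send heartbeat")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def missing_delays(log_text: str) -> dict[str, float]:
    """Map each worker id to seconds between the spawn message and its 'missing' message."""
    spawn_at = None
    delays: dict[str, float] = {}
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # skip lines that are not master log records
        ts = datetime.strptime(m["ts"], TS_FORMAT)
        msg = m["msg"]
        if msg.startswith("Sending spawn jobs"):
            spawn_at = ts
        elif spawn_at is not None and (w := MISSING_RE.match(msg)):
            delays[w["worker"]] = (ts - spawn_at).total_seconds()
    return delays


# Three lines copied from the master log in the question.
SAMPLE = """\
[2022-04-27 22:53:17,971] locust-master1--1-z2lr8/INFO/locust.runners: Sending spawn jobs of 50 users at 0.50 spawn rate to 5 ready clients
[2022-04-27 22:53:18,376] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-cbkbd_2c90cedde5224e1e9cf47bbb543b9097 failed to send heartbeat, setting state to missing.
[2022-04-27 22:53:22,391] locust-master1--1-z2lr8/INFO/locust.runners: Worker locust-slave1-xfvsz_196bba3928c5491e896acd411798d48d failed to send heartbeat, setting state to missing.
"""
```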

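So I do see the workers register themselves, but as soon as the test starts, the master pod reports that the workers failed to send a heartbeat and sets them to missing. If I run the master pod without --headless, I can open the web UI and start the job manually, and I see the same problem: as soon as I start the job, the same heartbeat messages appear.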
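The "failed to send heartbeat" messages come from the master's liveness check: each worker periodically sends a heartbeat, the master marks a worker missing when it has not heard from it for a few check intervals, and once none are left it stops the test, as in the last log lines. The model below is a simplified, hypothetical illustration of that bookkeeping, not Locust's implementation; the liveness value and class names are assumptions. The point it makes is that the master only sees what the worker process manages to send, so anything that stalls the worker for a few seconds looks the same as a network failure.

```python
from dataclasses import dataclass, field

# Hypothetical model of master-side heartbeat bookkeeping (not Locust's code).
# tick() is assumed to run once per heartbeat check interval (about 1 s).
HEARTBEAT_LIVENESS = 3  # assumed: missed intervals tolerated before "missing"


@dataclass
class WorkerState:
    liveness: int = HEARTBEAT_LIVENESS
    state: str = "ready"


@dataclass
class HeartbeatMonitor:
    workers: dict[str, WorkerState] = field(default_factory=dict)

    def register(self, worker_id: str) -> None:
        self.workers[worker_id] = WorkerState()

    def on_heartbeat(self, worker_id: str) -> None:
        # Any heartbeat from the worker restores its full liveness budget.
        worker = self.workers[worker_id]
        worker.liveness = HEARTBEAT_LIVENESS
        if worker.state == "missing":
            worker.state = "ready"

    def tick(self) -> list[str]:
        """Spend one unit of liveness per worker; return workers that just went missing."""
        newly_missing = []
        for worker_id, worker in self.workers.items():
            if worker.state == "missing":
                continue
            worker.liveness -= 1
            if worker.liveness < 0:
                worker.state = "missing"
                newly_missing.append(worker_id)
        return newly_missing

    def all_missing(self) -> bool:
        # Mirrors "The last worker went missing, stopping test."
        return bool(self.workers) and all(
            w.state == "missing" for w in self.workers.values()
        )
```

In this model a worker that stops heartbeating is declared missing on the fourth check after its last heartbeat, while a worker that keeps heartbeating is never affected, however long the test runs.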

On the worker pods I see my debug startup logs, but nothing that indicates a problem.

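I can't find any guide online on how to set up distributed Locust, other than ones from when it was called locustio and on 0.x versions, and a lot has changed since then.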

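What needs to be set up here? I'm not sure which code to include without pasting many lines of setup code. I'm testing against Postgres, so I'm considering following https://docs.locust.io/en/stable/testing-other-systems.html, but all of the examples there wrap different attributes than the code I inherited.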
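The examples on that page share one pattern: wrap the client library so each call is timed and reported through Locust's request event. A sketch of that pattern for a DB-API connection, under stated assumptions: TimedDbClient is a hypothetical name, the fire callback is meant to be the environment's events.request.fire, and the keyword arguments mirror Locust 2.x's request event (check them against your installed version). The standard-library sqlite3 module stands in for a Postgres driver such as psycopg2 so the sketch runs anywhere; both follow DB-API 2.0, although their parameter placeholders differ (? versus %s).

```python
import sqlite3
import time
from typing import Any, Callable


class TimedDbClient:
    """Hypothetical wrapper that times DB-API queries and reports them via `fire`.

    `fire` is intended to be Locust's `environment.events.request.fire`; the
    keyword names below follow Locust 2.x's request event (verify for your version).
    """

    def __init__(self, conn: Any, fire: Callable[..., None], request_type: str = "sql"):
        self._conn = conn
        self._fire = fire
        self._request_type = request_type

    def execute(self, name: str, sql: str, params: tuple = ()) -> list:
        start = time.perf_counter()
        rows: list = []
        exception = None
        try:
            cursor = self._conn.cursor()
            cursor.execute(sql, params)
            rows = cursor.fetchall()
        except Exception as exc:  # record the failure instead of aborting the task
            exception = exc
        self._fire(
            request_type=self._request_type,
            name=name,
            response_time=(time.perf_counter() - start) * 1000,  # milliseconds
            response_length=len(rows),
            exception=exception,
            context={},
        )
        return rows


if __name__ == "__main__":
    reported = []
    client = TimedDbClient(sqlite3.connect(":memory:"), lambda **kw: reported.append(kw))
    print(client.execute("select one", "SELECT 1"), reported[0]["name"])
```

Inside a User this would typically be created once in on_start with a real Postgres connection and self.environment.events.request.fire; tasks then call self.db.execute("query name", sql), and each call should show up in Locust's statistics under the sql request type.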

【Discussion】:

Tags: python kubernetes locust

【Solution 1】:

Have you checked CPU utilization? We had a similar situation where the VM was at 100% CPU and the workers simply couldn't send heartbeats.
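To make the CPU-saturation explanation concrete: a Locust worker runs its users and its heartbeat sender as gevent greenlets in one process, so code that keeps the CPU busy (or blocks inside a C call) without yielding delays every heartbeat until it finishes. The sketch below reproduces that effect with the standard library's asyncio as a stand-in for gevent (an assumption: the cooperative scheduling model is similar, but this is not Locust's API). A heartbeat coroutine records the time every interval seconds, and a busy loop that never awaits stretches the gap between two heartbeats to at least its own duration.

```python
import asyncio
import time


async def heartbeat(stamps: list[float], interval: float, stop: asyncio.Event) -> None:
    # Stand-in for a worker's heartbeat greenlet: record a timestamp every `interval`.
    while not stop.is_set():
        stamps.append(time.perf_counter())
        await asyncio.sleep(interval)


async def busy_task(seconds: float) -> None:
    # CPU-bound loop that never awaits, so nothing else on the event loop can run.
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass


async def max_heartbeat_gap(busy_seconds: float = 0.5, interval: float = 0.05) -> float:
    """Return the longest gap between heartbeats while a busy task runs once."""
    stamps: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(stamps, interval, stop))
    await asyncio.sleep(interval * 3)  # let a few normal heartbeats happen
    await busy_task(busy_seconds)
    await asyncio.sleep(interval * 3)
    stop.set()
    await hb
    return max(b - a for a, b in zip(stamps, stamps[1:]))


if __name__ == "__main__":
    print(f"longest heartbeat gap: {asyncio.run(max_heartbeat_gap()):.2f}s")
```

If the inherited code talks to Postgres through psycopg2, its queries may block the worker in the same way unless a gevent wait callback is installed (the psycogreen package's psycogreen.gevent.patch_psycopg() does this). Watching worker CPU, as suggested above, together with per-query latency should show which case applies.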

【Discussion】: