— Is there an article on how to pack servers into racks properly?
I realised that I didn't know of one. So I decided to write my own.
Firstly, this is an article about bare-metal servers in data centre (DC) facilities. Secondly, we assume that there are a lot of servers (hundreds or thousands); the article doesn't make sense for smaller quantities. Thirdly, we consider three constraints on the racks: physical space, electrical power per rack, and the fact that cabinets stand next to each other in rows, so a single ToR switch can connect the servers in several adjacent cabinets.
The answer to the original question depends significantly on the parameter we are optimising and on what we can change to get a better result. For instance, we may need to use less space to leave more room for future growth. Or maybe we are free to choose the cabinet height, power per rack, number of sockets per PDU, number of cabinets per switch group (one switch per 1, 2, or 3 racks), cable lengths and cabling work. The last component is critical at the ends of rack rows, where we either need to pull cables into the next row or leave switch ports under-utilised. Server selection and data centre selection are completely different stories; we assume they have already been made.
It's good to understand some nuances and details, in particular the average/maximum server power consumption and how our vendor provides electricity. For example, with a 230V single-phase power supply, a 32A circuit breaker can carry up to ~7kW. Let's say that we formally pay for 6kW per rack. If the vendor measures our power consumption per row of 10 cabinets rather than per cabinet, and the circuit breakers limit each rack at 7kW, then we can use 6.9kW in one rack and 5.1kW in another. That is fine and will not be penalised.
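A minimal Python sketch of the arithmetic above. It assumes single-phase power where breaker capacity is roughly V × I (power factor ≈ 1, no breaker derating), and a vendor that bills per row while each rack is still protected by its own breaker. The function names are illustrative only.

```python
def breaker_capacity_w(voltage_v: float, current_a: float) -> float:
    """Approximate single-phase capacity in watts: V * I.

    Assumes a power factor of ~1 and ignores any breaker derating rules.
    """
    return voltage_v * current_a


def row_within_limits(rack_loads_w, contracted_per_rack_w, breaker_limit_w):
    """Check a row of racks against a per-row billing contract.

    Assumption: the vendor sums consumption over the whole row, while each
    rack is still individually limited by its own circuit breaker.
    """
    each_rack_ok = all(load <= breaker_limit_w for load in rack_loads_w)
    row_ok = sum(rack_loads_w) <= contracted_per_rack_w * len(rack_loads_w)
    return each_rack_ok and row_ok


print(breaker_capacity_w(230, 32))                   # 7360 (W), i.e. ~7kW
print(row_within_limits([6900, 5100], 6000, 7000))   # True: 12kW paid for 2 racks
print(row_within_limits([7200, 4800], 6000, 7000))   # False: first rack over breaker limit
```

Both checks matter: averaging only helps as long as no single rack exceeds its breaker.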
Usually, our primary goal is to minimise spending. The best measure is the reduction of total cost of ownership (TCO). It consists of the following parts:
- CAPEX: buying data centre infrastructure, servers, network devices and cabling;
- OPEX: DC rent, electricity, maintenance. OPEX depends on the lifetime; it's reasonable to assume a lifetime of 3 years.
We should optimise the most expensive parts of the pie. Everything else should use the remaining resources as effectively as possible.
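A small Python sketch of the TCO split described above. It assumes an undiscounted model (TCO = CAPEX + yearly OPEX × lifetime, with the 3-year lifetime from the text). The cost items and figures in the usage part are invented placeholders, there only to show how ranking the slices of the pie works.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    name: str
    capex: float          # one-off purchase cost
    opex_per_year: float  # recurring yearly cost (rent, electricity, maintenance)

    def total(self, lifetime_years: int) -> float:
        return self.capex + self.opex_per_year * lifetime_years


def tco(items, lifetime_years=3):
    """Undiscounted TCO: CAPEX plus yearly OPEX over the lifetime, summed over items."""
    return sum(item.total(lifetime_years) for item in items)


def pie_slices(items, lifetime_years=3):
    """Share of each item in the TCO, largest slice first."""
    total = tco(items, lifetime_years)
    return sorted(
        ((item.name, item.total(lifetime_years) / total) for item in items),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Hypothetical figures, for illustration only.
items = [
    CostItem("servers", capex=5_000_000, opex_per_year=100_000),
    CostItem("racks and power", capex=0, opex_per_year=600_000),
    CostItem("network and cabling", capex=400_000, opex_per_year=20_000),
]
for name, share in pie_slices(items):
    print(f"{name}: {share:.0%}")
```

The largest slices are where optimisation effort pays off; the rest should simply use the remaining resources efficiently.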
Suppose we have an existing DC with a rack height of H units (for example, H = 47) and power per rack Prack (Prack = 6kW), and we have decided to use two-unit servers (h = 2U). Let's reserve 2 to 4 units in each rack for switches, patch panels and cable managers. Then we can fit Sh = rounddown((H - (2..4)) / h) servers in a rack (e.g. Sh = rounddown((47 - 4) / 2) = 21 servers per rack). Let's remember Sh.
In the simple case, all the servers are identical. So if we fill a rack with servers, the average power we can spend per server is Pserv = Prack / Sh (Pserv = 6000W / 21 ≈ 286W). We ignore switch power consumption here.
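The two formulas above as a Python sketch. The reserved space defaults to 4U (the upper end of the 2..4U range in the text); the function names are illustrative.

```python
def servers_by_height(rack_height_u: int, server_height_u: int, reserved_u: int = 4) -> int:
    """Sh = rounddown((H - reserved) / h).

    reserved_u covers switches, patch panels and cable managers (2..4U in the text).
    """
    return (rack_height_u - reserved_u) // server_height_u


def power_per_server_w(rack_power_w: float, servers_per_rack: int) -> float:
    """Pserv = Prack / Sh, ignoring the switch's own consumption."""
    return rack_power_w / servers_per_rack


sh = servers_by_height(47, 2)           # 21
pserv = power_per_server_w(6000, sh)    # ~285.7 W
print(sh, round(pserv))                 # 21 286
```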
Let's step aside and define the maximum server power consumption Pmax. The straightforward, completely safe and highly inefficient way is to read what the label on the server's power supply unit says. That value is Pmax.
A more complicated but more efficient approach is to sum up the TDP of all the components. It's not accurate, but it is workable.
Usually, we don't know the TDP of components other than the CPU. So the most correct and most complicated approach is to take a sample server with an adequate configuration, load it, for example, with Linpack (CPU and memory) and fio (disks), and measure its power consumption. We need a laboratory in this case. If we take things seriously, we should create a warm environment in the cold aisle, because higher temperature affects both fan and CPU power consumption. Thus we get the maximum power consumption of the sample server with this particular configuration, in the current environment, under the specific load. Just keep in mind that new firmware, software versions and other conditions may affect the result.
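A sketch that puts the three Pmax estimates side by side. It assumes the power samples were already collected (e.g. from a wattmeter or the server's BMC while it runs Linpack and fio); how they are collected is outside this sketch, and all numbers in the usage line are hypothetical.

```python
def pmax_estimates(psu_label_w, component_tdp_w, measured_samples_w):
    """Three ways to estimate Pmax, from safest/most wasteful to most accurate.

    psu_label_w:        rating printed on the power supply unit label
    component_tdp_w:    TDP of each component (often only the CPU's is known)
    measured_samples_w: power readings logged while the server runs a
                        stress load (e.g. Linpack + fio) in a warm cold aisle
    """
    return {
        "label": psu_label_w,
        "tdp_sum": sum(component_tdp_w),
        "measured_peak": max(measured_samples_w),
    }


# Hypothetical values for a 2U, two-socket server.
print(pmax_estimates(800, [165, 165, 40, 30], [310, 352, 368, 361]))
```

The measured peak is only valid for the tested configuration, environment and firmware, as noted above.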
Now let's return to Pserv and how we should compare it with Pmax. It's a question of understanding how our services work and how strong our CIO's nerves are.
If we don't accept any risk, we should assume that all the servers might start consuming their potential maximum simultaneously, and that at the same time one of the DC power feeds might fail. The infrastructure should still provide the service. So Pserv ≡ Pmax. This is the approach when reliability is paramount.
If the CIO takes into account not only ideal safety but also the company's money, and if he is brave enough, he can decide that
- we start to manage our vendors: in particular, we forbid any planned maintenance during our expected high-load periods to minimise the risk of power failure
- and/or our architecture allows us to lose a rack/row/DC while services continue operating
- and/or we distribute the load across the racks horizontally so well that the servers in a single cabinet will never all reach their theoretical maximum at the same time.
It pays not just to guess here but to monitor power consumption and understand how the servers consume power under usual and peak loads. So, after some analysis and much agonising, the CIO declares:
«I decree that the maximum achievable average power consumption across all servers is so much lower than the maximum consumption of a single server». Let it be Pserv = 0.8 * Pmax.
Then a 6kW rack can accommodate not 16 servers with Pmax = 375W but 20 servers with Pserv = 375W * 0.8 = 300W, i.e. 25% more servers per rack. That is a real saving: we need 20% fewer racks (16/20 = 0.8), and we also save on rack PDUs, switches and cabling. A serious disadvantage of this solution is the need to check continuously that our assumptions are still valid. We should make sure that new firmware doesn't significantly change fan behaviour and power consumption, and that the development team hasn't started to use the servers much more efficiently (i.e. managed to increase utilisation and hence power consumption). Otherwise both the initial assumptions and the conclusions become wrong. So it is a risk that has to be accepted responsibly. Alternatively, the risk can be avoided, and then the company pays for obviously under-loaded racks.
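The effect of the CIO's decision can be checked with a short Python sketch. It models only the power limit, with Pserv = Pmax × derate_pct / 100 (derate_pct = 100 is the no-risk case Pserv ≡ Pmax, 80 is the 0.8 factor above); the height limit Sh = 21 does not bind at these numbers. The function name is illustrative.

```python
import math


def servers_per_rack_by_power(rack_power_w, pmax_w, derate_pct=100):
    """rounddown(Prack / Pserv) with Pserv = Pmax * derate_pct / 100.

    derate_pct=100 is the no-risk case (Pserv == Pmax).
    """
    pserv_w = pmax_w * derate_pct / 100
    return math.floor(rack_power_w / pserv_w)


conservative = servers_per_rack_by_power(6000, 375)             # Pserv = 375W
derated = servers_per_rack_by_power(6000, 375, derate_pct=80)   # Pserv = 300W
print(conservative, derated)                                      # 16 20
print(f"{derated / conservative - 1:.0%} more servers per rack")  # 25% more servers per rack
print(f"{1 - conservative / derated:.0%} fewer racks")            # 20% fewer racks
```

The 20% is the limit for a large fleet; for a fixed number of servers the rack count is rounded up, e.g. 1000 servers need ceil(1000/16) = 63 racks versus 50.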
An important note: if possible, it's worth distributing the servers of different services across the racks horizontally. We must avoid cases where a batch of servers for one service arrives and is installed into cabinets vertically to improve «density» (just because it's easier to do it this way). This leads to a situation where one rack is filled with identical low-load servers while all the highly loaded ones reside in another. When those servers share the same load profile and all start consuming heavily at the same time under high load, the probability of losing that second rack becomes much higher.
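Horizontal placement can be sketched as dealing each service's servers across the racks round-robin, instead of filling rack after rack with one service. This is a minimal illustration: service and rack names are hypothetical, and per-rack capacity (Srack) is deliberately not checked.

```python
from collections import defaultdict
from itertools import cycle


def place_horizontally(servers_per_service, racks):
    """Deal each service's servers across the racks round-robin.

    The opposite ('vertical') approach fills rack after rack with one
    service's servers. Per-rack capacity (Srack) is not checked here.
    """
    placement = defaultdict(list)
    next_rack = cycle(racks)
    for service, count in servers_per_service.items():
        for i in range(count):
            placement[next(next_rack)].append(f"{service}-{i}")
    return dict(placement)


layout = place_horizontally({"db": 4, "web": 4, "batch": 4}, ["r1", "r2", "r3", "r4"])
for rack, servers in layout.items():
    print(rack, servers)
```

Each rack ends up with one server of every service, so a load peak in one service is spread over all racks rather than concentrated in one.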
Let's come back to distributing servers across racks. We have considered the physical constraints of the cabinets and the power limitations. Now let's consider the network. One can use N = 24/32/48-port switches (below we assume 48-port ToR switches). Fortunately, there are not many options if we ignore break-out cables. We consider a switch in every rack, or one switch per group of two or three cabinets (Rnet is the number of racks per group). I believe the group shouldn't be larger than three; otherwise it leads to cabling issues.
So, we distribute servers across the racks for each network scenario (1, 2, or 3 racks per group):
Srack = min(Sh, rounddown(Prack / Pserv), rounddown(N / Rnet))
Thus, for the scenario with groups of two racks:
Srack2 = min(21, rounddown(6000/300), rounddown(48/2)) = min(21, 20, 24) = 20 servers per rack
Similarly, we calculate the other scenarios:
Srack1 = 20
Srack3 = 16
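The Srack formula for the three network scenarios as a Python sketch, using the example values from the text (Sh = 21, Prack = 6000W, Pserv = 300W, N = 48). The function name is illustrative.

```python
import math


def servers_per_rack(sh, rack_power_w, pserv_w, switch_ports, racks_per_group):
    """Srack = min(Sh, rounddown(Prack / Pserv), rounddown(N / Rnet))."""
    return min(sh, math.floor(rack_power_w / pserv_w), switch_ports // racks_per_group)


for rnet in (1, 2, 3):
    print(f"Srack{rnet} =", servers_per_rack(21, 6000, 300, 48, rnet))
```

Whichever of the three limits (height, power, switch ports) is smallest decides the result: power for one and two racks per group, switch ports for three.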
We are almost done. Now we calculate the total number of racks R needed to hold all S servers (let S = 1000):
R = roundup(S / (Srack * Rnet)) * Rnet
R1 = roundup(1000 / (20 * 1)) * 1 = 50 * 1 = 50 racks
R2 = roundup(1000 / (20 * 2)) * 2 = 25 * 2 = 50 racks
R3 = roundup(1000 / (16 * 3)) * 3 = 21 * 3 = 63 racks
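The rack count per scenario as a Python sketch, reusing the Srack values calculated above. It additionally counts ToR switches under the assumption of exactly one switch per group (no switch redundancy), which feeds into the TCO comparison.

```python
import math


def racks_needed(total_servers, srack, racks_per_group):
    """R = roundup(S / (Srack * Rnet)) * Rnet: only whole switch groups are built."""
    return math.ceil(total_servers / (srack * racks_per_group)) * racks_per_group


srack_by_rnet = {1: 20, 2: 20, 3: 16}   # Rnet -> Srack, values from the text
for rnet, srack in srack_by_rnet.items():
    racks = racks_needed(1000, srack, rnet)
    # Assumption: one ToR switch per group of Rnet racks.
    print(f"Rnet={rnet}: {racks} racks, {racks // rnet} ToR switches")
```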
Then we calculate the TCO of each option based on the number of racks, required switches, cabling, etc., and choose the scenario with the lowest TCO. Profit!
Please note that although scenarios 1 and 2 need the same number of racks, their TCO differs: the 2nd scenario needs half as many switches but longer cables.
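The final comparison can be sketched in Python. All unit costs below are invented placeholders (rack cost over the lifetime, switch price, cable cost per server growing with group size), so the chosen scenario only illustrates the mechanics; plug in real quotes before drawing conclusions. The rack counts are R1, R2, R3 from the text, with one ToR switch per group assumed.

```python
# Hypothetical unit costs over a 3-year lifetime; replace with real quotes.
RACK_COST = 36_000                             # rent + contracted power per rack
SWITCH_COST = 8_000                            # one ToR switch, installed
CABLE_COST_PER_SERVER = {1: 10, 2: 15, 3: 20}  # longer cables for bigger groups


def scenario_tco(racks, racks_per_group, servers=1000):
    """Rack + switch + cabling cost of one scenario (simplified TCO)."""
    switches = racks // racks_per_group        # assumption: one ToR switch per group
    return (racks * RACK_COST
            + switches * SWITCH_COST
            + servers * CABLE_COST_PER_SERVER[racks_per_group])


racks_by_scenario = {1: 50, 2: 50, 3: 63}      # R1, R2, R3 from the text
costs = {rnet: scenario_tco(r, rnet) for rnet, r in racks_by_scenario.items()}
best = min(costs, key=costs.get)
print(costs, "-> best Rnet =", best)
```

With these made-up prices, scenario 2 wins because halving the switch count outweighs the longer cables; different prices can change the outcome.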
PS If power per rack or rack height may vary, the number of options grows. But the selection can still be reduced to the method above by brute-forcing the options. There will be more scenarios, but their number is limited: we can increase power per rack in steps of 1kW, and there are only a few standard rack heights: 42U, 45U, 47U, 48U. Excel's What-If Analysis in Data Table mode may be helpful here. Then we look at the resulting table and select the best option.
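The same brute force can be written in a few lines of Python instead of an Excel data table. This sketch assumes a 4..10kW range for Prack (not from the text), keeps Pserv = 300W, 2U servers, 4U reserved and 48-port switches, and ranks rows by fewest racks, then fewest switches, only as a placeholder; the real criterion should be TCO, since a higher Prack may cost more per rack.

```python
import math
from itertools import product

RACK_HEIGHTS_U = (42, 45, 47, 48)   # standard heights listed in the text
RACK_POWER_KW = range(4, 11)        # assumed range, 1kW steps
GROUP_SIZES = (1, 2, 3)             # racks per ToR switch (Rnet)


def sweep(total_servers=1000, server_u=2, reserved_u=4, pserv_w=300, ports=48):
    """One row per (H, Prack, Rnet): (H, Prack_kW, Rnet, Srack, racks, switches)."""
    rows = []
    for height_u, prack_kw, rnet in product(RACK_HEIGHTS_U, RACK_POWER_KW, GROUP_SIZES):
        sh = (height_u - reserved_u) // server_u
        srack = min(sh, math.floor(prack_kw * 1000 / pserv_w), ports // rnet)
        racks = math.ceil(total_servers / (srack * rnet)) * rnet
        rows.append((height_u, prack_kw, rnet, srack, racks, racks // rnet))
    return rows


table = sweep()
# Placeholder ranking: fewest racks, then fewest switches.
for row in sorted(table, key=lambda r: (r[4], r[5]))[:3]:
    print(row)
```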