[Posted]: 2021-11-06 23:11:00
[Question]:
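My team and I are trying to deploy very compute-intensive workloads on GCP's serverless infrastructure. Because Cloud Run's resource limits are quite narrow (4 vCPUs and 8GB of memory), we moved on to testing GKE with Autopilot.
With a default Autopilot cluster, I managed to provision a single deployment and container with up to 8 vCPUs, but no more than that.
My question now is whether there is a way to deploy a Deployment and container with resources.requests.cpu > 8 and, if so, how.
What I have tried so far:
- Setting resource requests - this works fine, up to 8
- Horizontal, vertical, and multidimensional autoscaling - none of these seemed to have any effect
- NodeSelector, to get the pod scheduled on a more powerful node - this is forbidden on Autopilot
Here is my deployment.yaml: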
---
apiVersion: "apps/v1"
kind: "Deployment"
metadata:
  name: "backend-flask"
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "backend-flask"
  template:
    metadata:
      labels:
        app: "backend-flask"
    spec:
      containers:
      - name: "backend-flask1"
        image: "{...}backend-flask:latest"
        resources:
          requests:
            memory: "6Gi"
            cpu: "8"
          limits:
            memory: "32Gi"
            cpu: "32"
      # nodeSelector:
      #   beta.kubernetes.io/instance-type: e2-highcpu-32
---
# apiVersion: autoscaling.gke.io/v1beta1
# kind: MultidimPodAutoscaler
# metadata:
#   name: backend-flask-autoscaler
# spec:
#   scaleTargetRef:
#     apiVersion: apps/v1
#     kind: Deployment
#     name: backend-flask
#   goals:
#     metrics:
#     - type: Resource
#       resource:
#       # Define the target CPU utilization request here
#         name: cpu
#         target:
#           type: Utilization
#           averageUtilization: 80
#   constraints:
#     global:
#       minReplicas: 1
#       maxReplicas: 2
#     containerControlledResources: [ memory ]
#     container:
#     - name: '*'
#     # Define boundaries for the memory request here
#       requests:
#         minAllowed:
#           memory: 4Gi
#           cpu: 4
#         maxAllowed:
#           memory: 32Gi
#           cpu: 32
#   policy:
#     updateMode: Auto
# ---
apiVersion: "autoscaling/v2beta1"
kind: "HorizontalPodAutoscaler"
metadata:
  name: "backend-flask-horizontal-autoscaler"
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  scaleTargetRef:
    kind: "Deployment"
    name: "backend-flask"
    apiVersion: "apps/v1"
  minReplicas: 1
  maxReplicas: 1
  metrics:
  - type: "Resource"
    resource:
      name: "cpu"
      targetAverageUtilization: 80
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: backend-flask-horizontal-autoscaler
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: backend-flask
  updatePolicy:
    updateMode: "Auto"
---
apiVersion: "v1"
kind: "Service"
metadata:
  name: "backend-flask-service"
  namespace: "default"
  labels:
    app: "backend-flask"
spec:
  ports:
  - protocol: "TCP"
    port: 5000
    targetPort: 5000
  selector:
    app: "backend-flask"
  type: "LoadBalancer"
  loadBalancerIP: ""
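[Discussion]:
-
When the deployment requests 16 CPUs, I do get an additional e2-highcpu-16 node added, but it just sits idle and the pod cannot be scheduled onto it
-
I was able to deploy 16 CPU / 16G on Autopilot a few minutes ago without any problem
-
Is there anything in the logs of your deployment or of the failing pod?
-
Could it be that your CPU quota is not sufficient?
-
I was also able to deploy 28 vCPU / 28G. The Autopilot limit is 28 vCPU per pod. gist.github.com/mastersingh24/dbdf181569522c23ad70a6a2881870ec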
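For reference, here is a minimal sketch of the kind of manifest the comments above describe. It is not a verified fix. It assumes requests and limits are set to the same value and uses the 16 vCPU / 16Gi size that one commenter reported deploying. It also assumes the project's CPU quota allows a node of that size. The image path keeps the elided placeholder from the question, and the contents of the linked gist are not reproduced here.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-flask
  namespace: default
  labels:
    app: backend-flask
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend-flask
  template:
    metadata:
      labels:
        app: backend-flask
    spec:
      containers:
      - name: backend-flask1
        image: "{...}backend-flask:latest"  # placeholder kept from the question
        resources:
          # Assumption: requests equal limits, sized at the 16 vCPU / 16Gi
          # that a commenter reported working on Autopilot.
          requests:
            cpu: "16"
            memory: "16Gi"
          limits:
            cpu: "16"
            memory: "16Gi"
```

If the pod stays Pending with a manifest like this, `kubectl describe pod` on the pending pod shows the scheduler's events, which is what the comment asking about logs is getting at.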
Tags: google-cloud-platform google-kubernetes-engine autoscaling