【Posted】: 2021-08-06 14:05:02
【Question】:
I have the following local 2-node Kubernetes cluster:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
srl1 Ready control-plane,master 2d18h v1.21.2 xxx.xxx.12.58 <none> Ubuntu 20.04.2 LTS 5.4.0-80-generic docker://20.10.7
srl2 Ready <none> 2d18h v1.21.3 xxx.xxx.80.72 <none> Ubuntu 18.04.2 LTS 5.4.0-80-generic docker://20.10.2
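I'm trying to deploy an application using the cluster-creation Python script (https://github.com/hydro-project/cluster/blob/master/hydro/cluster/create_cluster.py).
When it creates the routing nodes with apps_client.create_namespaced_daemon_set(namespace=util.NAMESPACE, body=yml), I expected it to create a single pod from the routing-ds.yaml file (shown below) and assign it to the routing DaemonSet (kind). However, as you can see, it creates two routing pods, one on each physical node, instead of one. (FYI: my master can schedule pods.)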
akazad@srl1:~/hydro-project/cluster$ kubectl get all -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default pod/management-pod 1/1 Running 0 25m 192.168.190.77 srl2 <none> <none>
default pod/monitoring-pod 1/1 Running 0 25m 192.168.120.71 srl1 <none> <none>
default pod/routing-nodes-9q7dr 1/1 Running 0 24m xxx.xxx.12.58 srl1 <none> <none>
default pod/routing-nodes-kfbnv 1/1 Running 0 24m xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/calico-kube-controllers-7676785684-tpz7q 1/1 Running 0 2d19h 192.168.120.65 srl1 <none> <none>
kube-system pod/calico-node-lnxtb 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/calico-node-mdvpd 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/coredns-558bd4d5db-vfghf 1/1 Running 0 2d19h 192.168.120.66 srl1 <none> <none>
kube-system pod/coredns-558bd4d5db-x7jhj 1/1 Running 0 2d19h xxx.xxx.120.67 srl1 <none> <none>
kube-system pod/etcd-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-apiserver-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-controller-manager-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-proxy-l8fds 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-proxy-szrng 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/kube-scheduler-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
metallb-system pod/controller-6b78bff7d9-t7gjr 1/1 Running 0 2d19h 192.168.190.65 srl2 <none> <none>
metallb-system pod/speaker-qsqnc 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
metallb-system pod/speaker-s4pp8 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2d19h <none>
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 2d19h k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
default daemonset.apps/routing-nodes 2 2 2 2 2 <none> 24m routing-container akazad1/srlanna:v2 role=routing
kube-system daemonset.apps/calico-node 2 2 2 2 2 kubernetes.io/os=linux 2d19h calico-node calico/node:v3.14.2 k8s-app=calico-node
kube-system daemonset.apps/kube-proxy 2 2 2 2 2 kubernetes.io/os=linux 2d19h kube-proxy k8s.gcr.io/kube-proxy:v1.21.3 k8s-app=kube-proxy
metallb-system daemonset.apps/speaker 2 2 2 2 2 kubernetes.io/os=linux 2d19h speaker quay.io/metallb/speaker:v0.10.2 app=metallb,component=speaker
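However, when it creates a pod directly from management-pod.yaml (shown below), it creates just one, as expected.
Why does the DaemonSet create two pods instead of one?
The code snippet that creates the DaemonSet for the routing-node kind: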
for i in range(len(kinds)):
    kind = kinds[i]

    # Create should only be true when the DaemonSet is being created for the
    # first time -- i.e., when this is called from create_cluster. After that,
    # we can basically ignore this because the DaemonSet will take care of
    # adding pods to created nodes.
    if create:
        fname = 'yaml/ds/%s-ds.yml' % kind
        yml = util.load_yaml(fname, prefix)

        for container in yml['spec']['template']['spec']['containers']:
            env = container['env']
            util.replace_yaml_val(env, 'ROUTING_IPS', route_str)
            util.replace_yaml_val(env, 'ROUTE_ADDR', route_addr)
            util.replace_yaml_val(env, 'SCHED_IPS', sched_str)
            util.replace_yaml_val(env, 'FUNCTION_ADDR', function_addr)
            util.replace_yaml_val(env, 'MON_IPS', mon_str)
            util.replace_yaml_val(env, 'MGMT_IP', management_ip)
            util.replace_yaml_val(env, 'SEED_IP', seed_ip)

        apps_client.create_namespaced_daemon_set(namespace=util.NAMESPACE,
                                                 body=yml)

        # Wait until all pods of this kind are running
        res = []
        while len(res) != expected_counts[i]:
            res = util.get_pod_ips(client, 'role=' + kind, is_running=True)

        pods = client.list_namespaced_pod(namespace=util.NAMESPACE,
                                          label_selector='role=' +
                                          kind).items

        created_pods = get_current_pod_container_pairs(pods)
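One consequence of the behaviour above: the wait loop compares against a hard-coded `expected_counts[i]`, but a DaemonSet decides its own pod count (one per eligible node). If `expected_counts[i]` is 1 while the DaemonSet runs 2 pods, `len(res) != expected_counts[i]` stays true and the loop never exits. A minimal sketch, assuming `apps_client` is the official `kubernetes` client's `AppsV1Api`, that instead reads the target count from the DaemonSet's status. `wait_for_daemonset` is a hypothetical helper, not part of the hydro code:

```python
import time


def wait_for_daemonset(apps_client, name, namespace, timeout=300.0, poll=2.0):
    """Wait until a DaemonSet reports all of its desired pods as ready.

    apps_client is assumed to be a kubernetes.client.AppsV1Api instance.
    The target count is read from the DaemonSet's status
    (desired_number_scheduled = number of eligible nodes), so it cannot
    disagree with how many pods the controller actually creates.
    Returns that count, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        ds = apps_client.read_namespaced_daemon_set_status(name, namespace)
        desired = ds.status.desired_number_scheduled or 0
        ready = ds.status.number_ready or 0
        # desired == 0 usually means the controller has not reconciled yet
        if desired > 0 and ready == desired:
            return desired
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"DaemonSet {namespace}/{name}: {ready}/{desired} pods ready "
                f"after {timeout}s")
        time.sleep(poll)
```

It could replace the `while` loop with something like `n = wait_for_daemonset(apps_client, yml['metadata']['name'], util.NAMESPACE)`; on the cluster above, `n` would be expected to equal the number of schedulable nodes rather than a value chosen in advance.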
Since I'm running on a bare-metal cluster, I have removed nodeSelector from all the YAML files.
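Removing `nodeSelector` makes every schedulable node eligible for the DaemonSet. If the goal is to run routing pods only on particular nodes, one option is to label those nodes (for example `kubectl label node srl2 role=routing`) and put the selector back on the loaded manifest before it is submitted. A minimal sketch; `pin_to_labeled_nodes` is a hypothetical helper, and the `role: routing` key/value is taken from the commented-out selector in routing-ds.yaml:

```python
def pin_to_labeled_nodes(yml, selector):
    """Add a nodeSelector to a loaded DaemonSet manifest (dict) in place.

    Only nodes whose labels match every key/value in `selector` will get a
    pod, e.g. after running:  kubectl label node srl2 role=routing
    """
    pod_spec = yml['spec']['template']['spec']
    pod_spec['nodeSelector'] = dict(selector)
    return yml
```

In the loop shown earlier this would go between `util.load_yaml(...)` and `create_namespaced_daemon_set(...)`, e.g. `yml = pin_to_labeled_nodes(yml, {'role': kind})`. The DaemonSet still creates one pod per matching node, so the pod count becomes the number of labelled nodes.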
1. routing-ds.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: routing-nodes
  labels:
    role: routing
spec:
  selector:
    matchLabels:
      role: routing
  template:
    metadata:
      labels:
        role: routing
    spec:
      #nodeSelector:
      #  role: routing

      hostNetwork: true
      containers:
      - name: routing-container
        image: akazad1/srlanna:v2
        env:
        - name: SERVER_TYPE
          value: r
        - name: MON_IPS
          value: MON_IPS_DUMMY
        - name: REPO_ORG
          value: hydro-project
        - name: REPO_BRANCH
          value: master
2. management-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: management-pod
  labels:
    role: management
spec:
  restartPolicy: Never
  containers:
  - name: management-container
    image: hydroproject/management
    env:
    #- name: AWS_ACCESS_KEY_ID
    #  value: ACCESS_KEY_ID_DUMMY
    #- name: AWS_SECRET_ACCESS_KEY
    #  value: SECRET_KEY_DUMMY
    #- name: KOPS_STATE_STORE
    #  value: KOPS_BUCKET_DUMMY
    - name: HYDRO_CLUSTER_NAME
      value: CLUSTER_NAME
    - name: REPO_ORG
      value: hydro-project
    - name: REPO_BRANCH
      value: master
    - name: ANNA_REPO_ORG
      value: hydro-project
    - name: ANNA_REPO_BRANCH
      value: master
  # nodeSelector:
  #   role: general
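The difference from the management pod comes from the object kind: management-pod.yaml describes a single bare `Pod`, so exactly one is created, whereas a `DaemonSet` has no replica count at all. If exactly one routing pod is wanted regardless of node count, a `Deployment` with `replicas: 1` is the closer match. A minimal sketch of deriving one from the loaded DaemonSet dict; `daemonset_to_deployment` is a hypothetical helper, and other parts of the hydro script may assume the routing tier is a DaemonSet, so this is an illustration rather than a drop-in fix:

```python
import copy


def daemonset_to_deployment(ds_manifest, replicas=1):
    """Build a Deployment manifest that runs `replicas` copies of a
    DaemonSet's pod template. The input dict is left unchanged."""
    dep = copy.deepcopy(ds_manifest)
    ds_spec = dep['spec']
    dep['kind'] = 'Deployment'  # apiVersion apps/v1 is valid for both kinds
    # Keep only fields that mean the same thing for a Deployment; DaemonSet-only
    # fields such as its updateStrategy are dropped.
    dep['spec'] = {
        'replicas': replicas,
        'selector': ds_spec['selector'],
        'template': ds_spec['template'],
    }
    return dep
```

The result could then be submitted with `apps_client.create_namespaced_deployment(namespace=util.NAMESPACE, body=dep)`. Note that with `hostNetwork: true`, two replicas of the same pod cannot share a node if they bind the same ports, which is one reason a per-node DaemonSet may have been chosen originally.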
【Discussion】:
-
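From that output, I see the management-pod (should this be controlled by a Deployment?), presumably a similar monitoring-pod, plus 2 pods from the routing-nodes DaemonSet, one on each node. Are there two more pods you haven't shown? -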
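I've added more output details. I'm creating the DaemonSet once, and it should contain a single pod; the DaemonSet should then add pods to nodes as they are added. I don't understand why it shows two desired. Could it be because I removed nodeSelector from the YAML?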
-
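A DaemonSet creates one pod on every node. You have one control-plane node and one dedicated worker node, and you said both are schedulable, so that's 1 pod × 2 nodes = 2. In the kubectl get pod -o wide output, you can see they are on different nodes.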
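The arithmetic in that answer can be written down as a toy model. This is a simplified sketch, not how the DaemonSet controller is implemented: it only considers a nodeSelector and a single "has a NoSchedule taint the pod does not tolerate" flag, and it ignores tolerations, node affinity and everything else. Node labels are trimmed to what matters here, and the `role=routing` label on srl2 is an assumption (it would have to be added with `kubectl label`):

```python
def daemonset_desired_count(nodes, node_selector=None):
    """Simplified model of how many pods a DaemonSet wants: one per eligible node.

    A node counts if it has no NoSchedule taint the pod fails to tolerate
    (modelled as a boolean flag) and its labels match every key/value in
    node_selector. Tolerations, node affinity, etc. are not modelled.
    """
    selector = node_selector or {}
    count = 0
    for node in nodes:
        if node.get('untolerated_taint', False):
            continue  # e.g. a control-plane node that still has its default taint
        labels = node.get('labels', {})
        if all(labels.get(k) == v for k, v in selector.items()):
            count += 1
    return count


# The cluster from the question: the master taint was removed, and srl2 is
# assumed to have been labelled with `kubectl label node srl2 role=routing`.
nodes = [
    {'name': 'srl1', 'labels': {}, 'untolerated_taint': False},
    {'name': 'srl2', 'labels': {'role': 'routing'}, 'untolerated_taint': False},
]
print(daemonset_desired_count(nodes))                       # 2 (no nodeSelector)
print(daemonset_desired_count(nodes, {'role': 'routing'}))  # 1
```

With no selector, both schedulable nodes qualify, which matches the `DESIRED 2` shown for daemonset.apps/routing-nodes; restoring a selector that matches one labelled node would bring it down to one.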