Kubernetes CronJob Not Exiting

【Question title】: Kubernetes CronJob not exited
【Posted】: 2021-08-27 02:26:28
【Question description】:

I am running a CronJob in Kubernetes. The CronJob starts but never exits, and the pod status is always Running. Here is the output:

kubectl get pods
cronjob-1623253800-xnwwx   1/1     Running            0          13h

When I describe the job, I see the following:

kubectl describe job cronjob-1623300120

Name:           cronjob-1623300120
Namespace:      cronjob
Selector:      xxxxx 
Labels:         xxxxx
Annotations:    <none>
Controlled By:  CronJob/cronjob
Parallelism:    1
Completions:    1
Start Time:     Thu, 9 Jun 2021 10:12:03 +0530
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=cronjob
           controller-xxxx
           job-name=cronjob-1623300120
  Containers:
   plannercronjob:
    Image:      xxxxxxxxxxxxx
    Port:       <none>
    Host Port:  <none>
    Mounts:                             <none>
  Volumes:                              <none>
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  13h  job-controller  Created pod: cronjob-1623300120

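I noticed Pods Statuses: 1 Running / 0 Succeeded / 0 Failed. I take this to mean that the job is marked Succeeded or Failed only once the code returns (zero meaning success). Is that right?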
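
To illustrate the question above: Kubernetes takes the pod's outcome from the container's exit code, and only after the main process has exited. While the process is alive, the pod stays Running. The sketch below mimics that mapping locally with `subprocess`, assuming `restartPolicy: Never`. `pod_phase_for` and `run_and_classify` are hypothetical helper names, not Kubernetes APIs.

```python
import subprocess
import sys


def pod_phase_for(exit_code):
    """Map a container's exit code to the phase a Job pod would end in.

    None means the process has not exited yet, so the pod stays Running.
    """
    if exit_code is None:
        return "Running"
    return "Succeeded" if exit_code == 0 else "Failed"


def run_and_classify(args, timeout=None):
    """Run a command locally and classify it the way a Job would."""
    try:
        completed = subprocess.run(args, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The process was still alive when we stopped waiting.
        return pod_phase_for(None)
    return pod_phase_for(completed.returncode)


if __name__ == "__main__":
    ok = [sys.executable, "-c", "raise SystemExit(0)"]
    bad = [sys.executable, "-c", "raise SystemExit(1)"]
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_and_classify(ok))
    print(run_and_classify(bad))
    print(run_and_classify(hung, timeout=1))
```

By this mapping, a process that never returns (as in the `ps ax` output further down) keeps the count at 1 Running indefinitely.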

When I get into the pod using the exec command:

kubectl exec --stdin --tty cronjob-1623253800-xnwwx -n cronjob -- /bin/bash

root@cronjob-1623253800-xnwwx:/# ps ax| grep python
    1 ?        Ssl    0:01 python -m sfit.src.app
   18 pts/0    S+     0:00 grep python

I found that the python process is still running. Is this a deadlock in the code, or is something else causing it?
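
One way to tell a deadlock from slow work is to have the process dump every thread's stack if it runs too long. The sketch below uses the standard-library `faulthandler` module. `arm_watchdog`, `disarm_watchdog`, and the one-second limit in the demo are illustrative assumptions; nothing is known here about the internals of `sfit.src.app`.

```python
import faulthandler
import subprocess
import sys
import textwrap


def arm_watchdog(limit_seconds):
    """Dump all thread tracebacks to stderr and hard-exit with status 1
    if the process is still running after limit_seconds.
    Intended to be called once at program start."""
    faulthandler.dump_traceback_later(limit_seconds, exit=True)


def disarm_watchdog():
    """Cancel the pending dump once the real work has finished in time."""
    faulthandler.cancel_dump_traceback_later()


# Demo: a child process that hangs on purpose, standing in for a stuck job.
# It arms the same faulthandler call inline, since it runs in its own process.
HUNG_CHILD = textwrap.dedent(
    """
    import faulthandler, threading
    faulthandler.dump_traceback_later(1, exit=True)
    threading.Event().wait()  # blocks forever, like a deadlock
    """
)


def run_hung_child():
    """Run the hanging child and return (exit_code, stderr_text)."""
    result = subprocess.run(
        [sys.executable, "-c", HUNG_CHILD],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.returncode, result.stderr
```

If something like `arm_watchdog` were called at the top of the job's entry point, `kubectl logs` would show where each thread was blocked. The non-zero exit would also end the container, so the pod would not stay Running.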

pod describe
Name:         cronjob-1623302220-xnwwx
Namespace:    default
Priority:     0
Node:         aks-agentpool-xxxxvmss000000/10.240.0.4
Start Time:   Thu, 9 Jun 2021 10:47:02 +0530
Labels:       app=cronjob
              controller-uid=xxxxxx
              job-name=cronjob-1623302220
Annotations:  <none>
Status:       Running
IP:           10.244.1.30
IPs:
  IP:           10.244.1.30
Controlled By:  Job/cronjob-1623302220
Containers:
  plannercronjob:
    Container ID:   docker://xxxxxxxxxxxxxxxx
    Image: xxxxxxxxxxx
    Image ID:       docker-xxxx
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 9 Jun 2021 10:47:06 +0530
    Ready:          True
    Restart Count:  0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-97xzv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-97xzv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-97xzv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From                                        Message
  ----    ------     ----  ----                                        -------
  Normal  Scheduled  13h   default-scheduler                           Successfully assigned cronjob/cronjob-1623302220-xnwwx to aks-agentpool-xxx-vmss000000
  Normal  Pulling    13h   kubelet, aks-agentpool-xxx-vmss000000  Pulling image "xxxx.azurecr.io/xxx:1.1.1"
  Normal  Pulled     13h   kubelet, aks-agentpool-xxx-vmss000000  Successfully pulled image "xxx.azurecr.io/xx:1.1.1"
  Normal  Created    13h   kubelet, aks-agentpool-xxx-vmss000000  Created container cronjob
  Normal  Started    13h   kubelet, aks-agentpool-xxx-vmss000000  Started container cronjob

@KrishnaChaurasia, I ran the Docker image on my own system. My Python code has some errors, but there it exits with an error. In Kubernetes, however, it neither exits nor stops.

docker run xxxxx/cronjob:1    
 File "/usr/local/lib/python3.8/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 261, in send
        raise error
    azure.core.exceptions.ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x7f113f6480a0>: Failed to establish a new connection: [Errno -2] Name or service not known

echo $?
1

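【Comments】:

Thanks a lot, @KrishnaChaurasia.
Try using a new image name every time you build a new version. I have often observed that when the same image name is reused, the pod does not pick up the updated script because of k8s image caching.
As mentioned, I have seen caching issues where reusing the same tag after updating the script does not update the image in the cluster, so the cronjob may still be running the old script. You can verify this by exec-ing into the job's pod and running cat on the file, provided the image has bash and cat installed. If it is an image caching issue, deleting the cron job will not help.
@KrishnaChaurasia, could you post your solution as an answer?
Hi @MikołajGłodziak, I don't think the OP has acknowledged what the actual problem was, or whether any of my comments resolved it, so I am leaving them as comments for now.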
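
The check suggested in the comments above (compare the script inside the pod with the local one) could be sketched as follows. This assumes `kubectl` is configured for the cluster and the image ships `cat`. The helper names and any file path passed in are hypothetical.

```python
import hashlib
import subprocess


def sha256_hex(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def read_file_in_pod(pod, namespace, path):
    """Return the bytes of a file inside a running pod via kubectl exec.

    Assumes kubectl can reach the cluster and the image contains cat.
    """
    result = subprocess.run(
        ["kubectl", "exec", pod, "-n", namespace, "--", "cat", path],
        capture_output=True,
        check=True,
    )
    return result.stdout


def pod_runs_local_version(local_bytes, pod_bytes):
    """True when the script in the pod is byte-identical to the local one."""
    return sha256_hex(local_bytes) == sha256_hex(pod_bytes)
```

A call might look like `read_file_in_pod("cronjob-1623253800-xnwwx", "cronjob", "/path/to/app.py")`, where the path is a placeholder. If the digests differ, the pod is probably running a stale image.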

Tags: kubernetes kubernetes-pod kubernetes-cronjob

【Solution 1】:

If you see your pod running constantly and never completing, try adding startingDeadlineSeconds.

https://medium.com/@hengfeng/what-does-kubernetes-cronjobs-startingdeadlineseconds-exactly-mean-cc2117f9795f
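
For reference, here is a sketch of where that field sits in a CronJob manifest, built as a Python dict and printed as JSON (which `kubectl apply -f -` accepts). As far as the Kubernetes API goes, `startingDeadlineSeconds` limits how late a missed run may still be started. The field that bounds how long a started Job may keep running is `activeDeadlineSeconds` on the Job spec, which is an addition here and not part of the original answer. The schedule and both deadline values are made-up examples.

```python
import json


def build_cronjob(name, image, schedule, starting_deadline, active_deadline):
    """Build a CronJob manifest as a plain dict (kubectl accepts JSON)."""
    return {
        "apiVersion": "batch/v1",  # batch/v1beta1 on clusters older than 1.21
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # How late a missed run may still be started.
            "startingDeadlineSeconds": starting_deadline,
            "jobTemplate": {
                "spec": {
                    # Upper bound on how long a started Job may keep running.
                    "activeDeadlineSeconds": active_deadline,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [{"name": name, "image": image}],
                        }
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    # All argument values below are illustrative assumptions.
    manifest = build_cronjob(
        name="plannercronjob",
        image="xxxxx/cronjob:1",
        schedule="*/5 * * * *",
        starting_deadline=200,
        active_deadline=600,
    )
    print(json.dumps(manifest, indent=2))
```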

【Discussion】: