【Question title】: Kubernetes CronJob: terminate the Pod before the next scheduled run is created
【Posted】: 2019-12-06 21:09:15
【Question】:

I have a Kubernetes CronJob that runs a scheduled task every 5 minutes. I want to make sure that by the time a new pod is created at the next scheduled time, the earlier pod has already been terminated. Can Kubernetes terminate the earlier pod before creating the new one?

My YAML is:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-scheduled
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cmm-callout
            env:
              - name: SCHEDULED
                value: "true"
            livenessProbe:
              httpGet:
                path: /myapp/status
                port: 7070
                scheme: HTTPS
              initialDelaySeconds: 120
              timeoutSeconds: 30
              periodSeconds: 120                
            image: gcr.io/projectid/folder/my-app:9.0.8000.34
          restartPolicy: Never

How can I make sure the earlier pod is terminated before the new one is created?

【Question comments】:

    Tags: kubernetes google-cloud-platform yaml google-kubernetes-engine


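    【Solution 1】:

    I used Mark's solution with spec.jobTemplate.spec.activeDeadlineSeconds.

    There is one more thing to be aware of, though. From the Kubernetes docs:

    Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.

    What actually happens when a Pod is terminated is that Kubernetes sends SIGTERM to the main process of the Pod's container (PID 1 inside the container). It does not wait for that process to actually exit. If the container does not shut down gracefully, the Pod stays in the Terminating state for the grace period (terminationGracePeriodSeconds, 30 seconds by default), after which Kubernetes sends SIGKILL. In the meantime Kubernetes may schedule another Pod, so the terminating Pod can overlap with the newly scheduled one for up to 30 seconds.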

    This is easy to reproduce with the following CronJob definition:

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: cj-sleep
    spec:
      concurrencyPolicy: Forbid
      failedJobsHistoryLimit: 5
      jobTemplate:
        metadata:
          creationTimestamp: null
        spec:
          activeDeadlineSeconds: 50
          template:
            metadata:
              creationTimestamp: null
            spec:
              containers:
              - command:
                    - "/usr/local/bin/bash"
                    - "-c"
                    - "--"
                args:
                    - "tail -f /dev/null & wait $!"
                image: bash
                imagePullPolicy: IfNotPresent
                name: cj-sleep
              dnsPolicy: ClusterFirst
              restartPolicy: OnFailure
              schedulerName: default-scheduler
              securityContext: {}
              terminationGracePeriodSeconds: 30
      schedule: '* * * * *'
      startingDeadlineSeconds: 100
      successfulJobsHistoryLimit: 5
    

    This is how the scheduling plays out:

    while true; do date; kubectl get pods -A | grep cj-sleep; sleep 1; done
        
    Thu Sep  3 09:50:51 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Running            0          49s
    Thu Sep  3 09:50:53 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating        0          50s
    Thu Sep  3 09:50:54 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating        0          51s
    Thu Sep  3 09:50:55 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating        0          52s
    Thu Sep  3 09:50:56 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating        0          54s
    Thu Sep  3 09:50:58 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating        0          56s
    Thu Sep  3 09:51:00 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating        0          57s
    Thu Sep  3 09:51:01 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating        0          58s
    Thu Sep  3 09:51:02 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating        0          59s
    Thu Sep  3 09:51:03 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating         0          60s
    default                                     cj-sleep-1599126660-l69gd                                         0/1     ContainerCreating   0          0s
    Thu Sep  3 09:51:04 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating         0          61s
    default                                     cj-sleep-1599126660-l69gd                                         0/1     ContainerCreating   0          1s
    Thu Sep  3 09:51:05 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         1/1     Terminating        0          62s
    default                                     cj-sleep-1599126660-l69gd                                         1/1     Running            0          2s
        
        ....
    Thu Sep  3 09:51:29 UTC 2020
    default                                     cj-sleep-1599126600-kzzxg                                         0/1     Terminating        0          86s
    default                                     cj-sleep-1599126660-l69gd                                         1/1     Running            0          26s
    Thu Sep  3 09:51:30 UTC 2020
    default                                     cj-sleep-1599126660-l69gd                                         1/1     Running            0          28s
    Thu Sep  3 09:51:32 UTC 2020
    default                                     cj-sleep-1599126660-l69gd                                         1/1     Running            0          29s
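
The overlap in the output above lasts for as long as the old Pod stays in Terminating, which is bounded by the Pod's terminationGracePeriodSeconds. As a minimal sketch, lowering that field in the reproduction manifest should shorten the window, at the cost of giving the container less time to shut down cleanly. The value 5 is an illustrative assumption, not something the original answer recommends, and batch/v1 is used here in place of the answer's batch/v1beta1.

```yaml
# Sketch only: the cj-sleep reproduction with a shorter grace period.
apiVersion: batch/v1              # the answer used batch/v1beta1; batch/v1 is served from Kubernetes 1.21 on
kind: CronJob
metadata:
  name: cj-sleep
spec:
  schedule: '* * * * *'
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      activeDeadlineSeconds: 50   # Job fails after 50s and its Pods receive SIGTERM
      template:
        spec:
          terminationGracePeriodSeconds: 5   # assumed value; SIGKILL follows after 5s instead of 30s
          restartPolicy: OnFailure
          containers:
          - name: cj-sleep
            image: bash
            command: ["/usr/local/bin/bash", "-c", "--"]
            args: ["tail -f /dev/null & wait $!"]
```

With a 50-second deadline and a 5-second grace period, the old Pod would be expected to disappear before the next one-minute tick, but this only caps the overlap; it does not make the process exit cleanly.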
    

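    There is one detail about processes that run as PID 1 in a container: they do not get the default SIGTERM behaviour, so the signal is ignored unless you install your own handler. In bash, you do that by adding a trap: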

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: cj-sleep
    spec:
      concurrencyPolicy: Forbid
      failedJobsHistoryLimit: 5
      jobTemplate:
        metadata:
          creationTimestamp: null
        spec:
          activeDeadlineSeconds: 50
          template:
            metadata:
              creationTimestamp: null
            spec:
              containers:
              - command:
                    - "/usr/local/bin/bash"
                    - "-c"
                    - "--"
                args:
                    - "trap 'exit' SIGTERM; tail -f /dev/null & wait $!"
                image: bash
                imagePullPolicy: IfNotPresent
                name: cj-sleep
              dnsPolicy: ClusterFirst
              restartPolicy: OnFailure
              schedulerName: default-scheduler
              securityContext: {}
              terminationGracePeriodSeconds: 30
      schedule: '* * * * *'
      startingDeadlineSeconds: 100
      successfulJobsHistoryLimit: 5
    

    Now the scheduling looks like this:

    Thu Sep  3 09:47:54 UTC 2020
    default                                     cj-sleep-1599126420-sm887                                         1/1     Terminating        0          52s
    Thu Sep  3 09:47:56 UTC 2020
    default                                     cj-sleep-1599126420-sm887                                         0/1     Terminating        0          54s
    Thu Sep  3 09:47:57 UTC 2020
    default                                     cj-sleep-1599126420-sm887                                         0/1     Terminating        0          55s
    Thu Sep  3 09:47:58 UTC 2020
    default                                     cj-sleep-1599126420-sm887                                         0/1     Terminating        0          56s
    Thu Sep  3 09:47:59 UTC 2020
    default                                     cj-sleep-1599126420-sm887                                         0/1     Terminating        0          57s
    Thu Sep  3 09:48:00 UTC 2020
    default                                     cj-sleep-1599126420-sm887                                         0/1     Terminating        0          58s
    Thu Sep  3 09:48:01 UTC 2020
    Thu Sep  3 09:48:02 UTC 2020
    default                                     cj-sleep-1599126480-rlhlw                                         0/1     ContainerCreating   0          1s
    Thu Sep  3 09:48:04 UTC 2020
    default                                     cj-sleep-1599126480-rlhlw                                         0/1     ContainerCreating   0          2s
    Thu Sep  3 09:48:05 UTC 2020
    default                                     cj-sleep-1599126480-rlhlw                                         0/1     ContainerCreating   0          3s
    Thu Sep  3 09:48:06 UTC 2020
    default                                     cj-sleep-1599126480-rlhlw                                         1/1     Running            0          4s
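
The trap above exits the shell but leaves the background child to be killed when the container goes away. If the real workload needs a chance to clean up, one possible variant forwards the signal to it first. This is a hedged sketch: the `child` variable and the exit code 143 (128 + 15, the conventional status for SIGTERM) are choices made here, and `tail -f /dev/null` is only a stand-in for an actual task.

```yaml
# Container fragment only; the rest of the CronJob stays as in the manifest above.
containers:
- name: cj-sleep
  image: bash
  command: ["/usr/local/bin/bash", "-c", "--"]
  args:
  - |
    # On SIGTERM, pass the signal on to the background child, then exit.
    trap 'kill -TERM "$child" 2>/dev/null; exit 143' TERM
    tail -f /dev/null &   # stand-in for the real workload
    child=$!
    wait "$child"         # wait returns as soon as the trapped signal arrives
```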
    

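    【Comments】:

      【Solution 2】:

      If I understand your case correctly (the earlier pod should be terminated before the new pod is created), there are two options.

      1. Use spec.jobTemplate.spec.activeDeadlineSeconds.

      With this parameter set, once the Job reaches activeDeadlineSeconds all of its running Pods are terminated and the Job status becomes type: Failed with reason: DeadlineExceeded.

      Example: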

      apiVersion: batch/v1beta1
      kind: CronJob
      metadata:
        name: hello
      spec:
        schedule: "*/5 * * * *"
        jobTemplate:
          spec:
            activeDeadlineSeconds: 60
            template:
              spec:
                containers:
                - name: hello
                  image: busybox
                  args:
                  - /bin/sh
                  - -c
                  - date; echo Hello from the Kubernetes cluster && sleep 420
                restartPolicy: Never
      

      2. Set concurrencyPolicy to Replace, so that the currently running job is replaced with the new one.

      Example:

      apiVersion: batch/v1beta1
      kind: CronJob
      metadata:
        name: hello
      spec:
        schedule: "*/2 * * * *"
        concurrencyPolicy: Replace
        jobTemplate:
          spec:
            template:
              spec:
                containers:
                - name: hello
                  image: busybox
                  args:
                  - /bin/sh
                  - -c
                  - date; echo Hello from the Kubernetes cluster && sleep 420
                restartPolicy: Never
      


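      【Comments】:

        【Solution 3】:

        Have you tried setting concurrencyPolicy to Replace? Forbid means that the new job run is skipped if the previous one has not finished yet.

        https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#concurrency-policy

        Allow (default): the cron job allows concurrently running jobs

        Forbid: the cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn't finished yet, the cron job skips the new job run

        Replace: if it is time for a new job run and the previous job run hasn't finished yet, the cron job replaces the currently running job run with a new job run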
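
Applied to the manifest from the question, the suggestion amounts to a one-line change. The following is a sketch under a few assumptions: the livenessProbe is left out for brevity, and batch/v1 replaces the question's batch/v1beta1 because newer clusters no longer serve the beta version.

```yaml
apiVersion: batch/v1            # the question used batch/v1beta1
kind: CronJob
metadata:
  name: my-scheduled
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Replace    # was Forbid; a still-running Job is replaced at the next tick
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cmm-callout
            image: gcr.io/projectid/folder/my-app:9.0.8000.34
            env:
            - name: SCHEDULED
              value: "true"
          restartPolicy: Never
```

The replaced Job's Pod still goes through normal termination, so the Terminating overlap described in Solution 1 may still show up briefly unless the container handles SIGTERM.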

        【Comments】:
