How to Clean Up Old Kubernetes Jobs


Kubernetes Jobs create Pods and retry them until a specified number of those Pods terminate successfully. Jobs are often used with the higher-level CronJob mechanism that automatically starts new Jobs on a recurring schedule.

Regular use of Jobs and CronJobs usually leaves a large number of objects lingering in your cluster. Jobs and their Pods are intentionally kept indefinitely after they complete so you can inspect the Job’s status and retrieve its logs later. However, too many completed Jobs pollute Kubectl output when you run commands like kubectl get pods or kubectl get jobs, making it harder to focus on relevant activity.

In this article we’ll share some methods for cleaning up old Jobs. You’ll be able to remove redundant objects from your cluster, either automatically or on-demand.

CronJob History Retention Limits

Automatic clean-up has been supported for Jobs created by a CronJob since Kubernetes v1.6. This method lets you configure separate deletion thresholds for completed and failed Jobs.

Enable the clean-up strategy by setting the spec.successfulJobsHistoryLimit and spec.failedJobsHistoryLimit fields on your CronJob object:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: demo-cron
spec:
  schedule: "* * * * *"
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 10
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: demo-cron
              image: busybox:latest
              command: ["/bin/sh", "-c", "echo 'Job complete!'"]

The CronJob shown above will retain the Job objects from its 10 most recent failed runs, as well as the five most recent successful ones.

You’ll have a maximum of 15 old Jobs in your cluster at any given time. Those 15 will be retained indefinitely. They’ll only be deleted when they’re superseded by a newer Job that finishes with the same status.

Default CronJob history limits are applied when you omit custom values in your manifest. By default, Kubernetes retains three successful Jobs and one failed Job. Setting either limit to 0 deletes Jobs with that status as soon as they finish, without retaining any.
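
If the CronJob already exists, the same two fields can be changed in place rather than re-applying the whole manifest. The following is a minimal sketch that wraps kubectl patch in a shell function. The function name set_history_limits is hypothetical, and demo-cron is the CronJob from the example above.

```shell
# Hypothetical helper: set both history limits on an existing CronJob.
# Usage: set_history_limits <cronjob-name> <successful-limit> <failed-limit>
set_history_limits() {
  cron_name="$1"
  ok_limit="$2"
  failed_limit="$3"
  # Build a JSON merge patch that touches only the two limit fields.
  patch=$(printf '{"spec":{"successfulJobsHistoryLimit":%d,"failedJobsHistoryLimit":%d}}' \
    "$ok_limit" "$failed_limit")
  kubectl patch cronjob "$cron_name" --type=merge -p "$patch"
}

# Example (requires access to a cluster):
#   set_history_limits demo-cron 5 10
```

A merge patch leaves the schedule and jobTemplate untouched, so it is a reasonable fit when you only want to tune retention.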

Finished Job TTLs

Job TTLs are a newer Kubernetes addition that became stable in v1.23. TTLs are set directly on your Job objects so you don’t need to be using CronJobs. The TTL directs Kubernetes to delete the Job and its Pods after a fixed time has elapsed, irrespective of the Job’s completion status.

You can enable this mechanism by setting the spec.ttlSecondsAfterFinished field on your Job objects:

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  ttlSecondsAfterFinished: 300
  template:
    spec:
      containers:
        - name: demo-cron
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Job complete!'"]

If your Job’s defined as part of a CronJob, make sure you nest the field inside the jobTemplate:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: demo-cron
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 300
      template:
        spec:
          containers:
            - name: demo-cron
              image: busybox:latest
              command: ["/bin/sh", "-c", "echo 'Job complete!'"]

The examples above will mark Jobs as eligible for deletion five minutes (300 seconds) after they finish. This occurs regardless of whether the Job ends up in the Complete or Failed state.

Deletions based on this mechanism are managed by a dedicated controller inside your cluster. The controller monitors Job objects, detects when a TTL has expired, and then takes action to clean up the affected Job and its dependent resources. There could be a short delay between the TTL expiring and the controller stepping in to enact the deletion.

Setting a Job’s TTL to 0 will make it eligible for deletion as soon as it finishes. You should consider whether this is appropriate for each of your tasks. Not retaining any history can make it harder to debug problems as you won’t be able to retrieve container logs.

The ttlSecondsAfterFinished field is mutable, so you can change it on existing Jobs at any time. Modifications aren’t guaranteed to affect Jobs that have already been created, though. Changing the value in a CronJob’s jobTemplate only applies to runs scheduled afterwards, and extending the TTL on a finished Job may not prevent its deletion if the previous TTL has already expired.
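
Because the field is mutable, the TTL on an existing Job can be adjusted with a patch instead of recreating the Job. Here is a small sketch of that idea. The function name set_job_ttl is hypothetical, and demo-job is the Job from the earlier example.

```shell
# Hypothetical helper: change the TTL on a Job that already exists.
# Usage: set_job_ttl <job-name> <seconds>
set_job_ttl() {
  job_name="$1"
  ttl_seconds="$2"
  # Merge patch that updates only spec.ttlSecondsAfterFinished.
  kubectl patch job "$job_name" --type=merge \
    -p "$(printf '{"spec":{"ttlSecondsAfterFinished":%d}}' "$ttl_seconds")"
}

# Example (requires access to a cluster):
#   set_job_ttl demo-job 600
```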

Manually Deleting Jobs

You can always manually delete a Job using Kubectl. First retrieve your list of Jobs:

$ kubectl get jobs
NAME                 COMPLETIONS   DURATION   AGE
demo-cron-27549499   1/1           2s         36s

Next issue the delete job command with the name of your selected job:

$ kubectl delete job demo-cron-27549499
job.batch "demo-cron-27549499" deleted

This will delete the Job and clean up the Pods and other objects associated with it.
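
Since deleting a Job also removes its Pods, their logs go with them. If you might need the output later, one option is to save it first. The sketch below uses kubectl logs with a job/ reference, which reads from one of the Job’s Pods, so it assumes a single-Pod Job. The function name archive_and_delete_job and the log file naming are hypothetical.

```shell
# Hypothetical helper: save a Job's logs to <job-name>.log, then delete the Job.
# The delete only runs if the logs were retrieved successfully.
archive_and_delete_job() {
  job_name="$1"
  kubectl logs "job/$job_name" > "$job_name.log" &&
    kubectl delete job "$job_name"
}

# Example (requires access to a cluster):
#   archive_and_delete_job demo-cron-27549499
```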

On-Demand Batch Deletions

You can rapidly clean up multiple Jobs by filtering with field selectors. Here’s an example that deletes every Job in your active namespace that has exactly one successful Pod, which covers typical single-completion Jobs:

$ kubectl delete jobs --field-selector status.successful=1
job.batch "demo-cron-27549501" deleted
job.batch "demo-cron-27549502" deleted
job.batch "demo-cron-27549503" deleted

This is much less laborious than deleting individual Jobs in sequence.
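
Batch deletions are hard to undo, so it can be worth previewing what a selector matches before running it for real. The following sketch lists the matching Jobs and then asks Kubectl for a client-side dry run of the deletion. The function name preview_job_deletion is hypothetical.

```shell
# Hypothetical helper: preview which Jobs a field selector would delete.
# Nothing is removed; --dry-run=client only reports what would happen.
preview_job_deletion() {
  selector="$1"
  kubectl get jobs --field-selector "$selector"
  kubectl delete jobs --field-selector "$selector" --dry-run=client
}

# Example (requires access to a cluster):
#   preview_job_deletion status.successful=1
```

Once the preview looks right, the same selector can be passed to kubectl delete jobs without the dry-run flag.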

To delete all the Jobs associated with a specific CronJob, it’s best to set a label on those Jobs that identifies the parent CronJob. Define the label using your CronJob’s spec.jobTemplate.metadata.labels field:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: demo-cron
spec:
  schedule: "* * * * *"
  jobTemplate:
    metadata:
      labels:
        cron-job: demo-cron
    spec:
      template:
        spec:
          containers:
            - name: demo-cron
              image: busybox:latest
              command: ["/bin/sh", "-c", "echo 'Job complete!'"]

You can delete Jobs created by this version of the demo-cron CronJob using the following command:

$ kubectl delete jobs -l cron-job=demo-cron
job.batch "demo-cron-27549501" deleted
job.batch "demo-cron-27549502" deleted
job.batch "demo-cron-27549503" deleted

Combining this example with a field selector lets you delete demo-cron runs that are in a particular state:

$ kubectl delete jobs -l cron-job=demo-cron --field-selector status.successful=0
job.batch "demo-cron-27549501" deleted
job.batch "demo-cron-27549502" deleted
job.batch "demo-cron-27549503" deleted

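This command deletes Jobs scheduled by the demo-cron CronJob that have no successful Pods. That includes failed runs, but it also matches any run that’s still in progress, so check the list before deleting.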
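
A field selector on status.successful only looks at the count of succeeded Pods. To target Jobs that Kubernetes itself has marked as failed, one option is to read each Job’s Failed condition instead. The sketch below does this with a custom-columns output and awk. The function name list_failed_jobs is hypothetical, and the approach assumes your Kubectl version accepts a JSONPath filter inside custom-columns.

```shell
# Hypothetical helper: print the names of Jobs whose "Failed" condition is True.
# Optional first argument: a label selector such as cron-job=demo-cron.
list_failed_jobs() {
  kubectl get jobs ${1:+-l "$1"} --no-headers \
    -o 'custom-columns=NAME:.metadata.name,FAILED:.status.conditions[?(@.type=="Failed")].status' |
    awk '$2 == "True" { print $1 }'  # keep only Jobs marked as failed
}

# Example (requires access to a cluster):
#   list_failed_jobs cron-job=demo-cron | xargs kubectl delete job
```

Running or successful Jobs have no Failed condition set to True, so they are skipped by the awk filter.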

Summary

Kubernetes Jobs usually stick around in your cluster after they finish. This behavior lets you retrieve their logs at arbitrary future times but quickly leads to excessive object accumulation.

CronJobs come with retention limits that are on by default and support separate configuration for successful and failed runs. This should be your preferred mechanism for managing scheduled Jobs. Other kinds of Job are best configured with a TTL that will automatically clean up resources after a fixed time period.

Although Kubernetes has historically kept old Jobs indefinitely, it’s now recommended that you use the TTL mechanism wherever possible. Having too many old Jobs in your cluster can cause performance degradation as the control plane has to monitor the objects and the Pods they’ve created.




