This excellent guide on how to autoscale background job processing with K8s and JobRunr was contributed by Dariusz Dumnicki.
Autoscaling is a method used to dynamically adjust the resources allocated to a cloud application based on metrics. Typically, scaling is done based on CPU or memory usage, but in the context of task scheduling and background job processing, we may be interested in scaling based on the number of jobs in the queue, the time they spend in the queue, etc. JobRunr Pro provides advanced metrics that enable efficient autoscaling of your application.
In this guide, we’ll show you how to autoscale your Kubernetes-managed JobRunr Pro applications using KEDA, a Kubernetes-based event-driven autoscaler. You will learn:
- how to set up KEDA to enable autoscaling of your JobRunr Pro applications on Kubernetes
- how to configure autoscaling triggers based on JobRunr Pro metrics such as workers’ usage, job queue latency, and scheduled jobs count.
Prerequisites
- JobRunr Pro 8.0.0-beta.0 or later
- You already know how to configure JobRunr
- Basic knowledge of Kubernetes
- A working Kubernetes cluster
Why you should use JobRunr metrics to autoscale
Using JobRunr metrics for autoscaling ensures that your application scales dynamically based on real-time job processing needs. Unlike traditional autoscaling methods that rely solely on CPU or memory usage, JobRunr metrics allow you to scale based on specific workload factors such as workers’ usage, job queue latency, and scheduled jobs count.
If configured properly, this approach helps ensure efficient resource utilization and faster job processing, leading to better application performance and potential cost savings.
Setup
You can follow this guide step-by-step to deploy and configure autoscaling for your existing JobRunr application on Kubernetes. If you’re new to JobRunr, you can begin by following this guide to learn how to include and use it in your project. We assume you already know how to deploy your application on Kubernetes (otherwise check out the Kubernetes basics).
In case you prefer to get straight to the solution, you can view the code from this guide here.
Kubernetes deployment
To run your application in the cluster with the autoscaling capability, we will need to deploy several services:
- KEDA for autoscaling functionality,
- any database supported by JobRunr,
- your application running JobRunr.
First, deploy your JobRunr application. You will need to enable the dashboard to use the JobRunr Pro metrics API, which KEDA will use to trigger autoscaling. You can deploy the background job server together with the dashboard or separately, but make sure that the metrics API is always available for KEDA to fetch the metrics.
For the purposes of this guide, we’ve deployed a simple Spring Boot application that allows us to schedule mock jobs via the `/schedule-example-jobs` endpoint.
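The endpoint itself is not part of JobRunr; a minimal sketch of how such a mock endpoint could look, assuming a plain Spring `@RestController` and JobRunr’s `JobScheduler` (the controller name and job body are illustrative):

```java
import java.time.Duration;
import java.time.Instant;

import org.jobrunr.scheduling.JobScheduler;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ExampleJobController {

    private final JobScheduler jobScheduler;

    public ExampleJobController(JobScheduler jobScheduler) {
        this.jobScheduler = jobScheduler;
    }

    // e.g. GET /schedule-example-jobs?count=40&runIn=PT2M schedules 40 mock jobs to run in 2 minutes
    @GetMapping("/schedule-example-jobs")
    public String scheduleExampleJobs(@RequestParam int count,
                                      @RequestParam(defaultValue = "PT0S") String runIn) {
        Instant runAt = Instant.now().plus(Duration.parse(runIn));
        for (int i = 0; i < count; i++) {
            int jobNumber = i;
            jobScheduler.schedule(runAt, () -> System.out.println("Executing mock job " + jobNumber));
        }
        return "Scheduled " + count + " jobs to run at " + runAt;
    }
}
```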
Our setup consists of 2 deployments:
- `web` - which exposes the JobRunr dashboard on port `8000` and an API to schedule jobs on port `8080`
- `workers` - which enables the background job server for background task processing

The separation of the `web` and the `workers` deployments is there to enable us to scale the number of replicas responsible for processing jobs, while maintaining only one instance of the dashboard. The `BackgroundJobServer` can scale based on job queue metrics, ensuring optimal job processing efficiency. This separation ensures that even if no `BackgroundJobServer`s are currently running, scheduled jobs will still be processed, as the autoscaling configuration will start a new `BackgroundJobServer` on demand.
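Both deployments can run the same Spring Boot application with different JobRunr settings. A minimal sketch of what this split could look like, assuming the JobRunr Spring Boot starter properties (your own setup may use profiles or environment variables instead):

```properties
# web deployment: dashboard + metrics API only, no job processing
org.jobrunr.dashboard.enabled=true
org.jobrunr.dashboard.port=8000
org.jobrunr.background-job-server.enabled=false
```

```properties
# workers deployment: job processing only, no dashboard
org.jobrunr.dashboard.enabled=false
org.jobrunr.background-job-server.enabled=true
```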
We have also created a Service object called `jobrunr-service` that exposes the ports of the `web` deployment, enabling access to the dashboard and API from outside of the cluster. Additionally, we deployed PostgreSQL as our database.
You can see all the deployment files we used here.
Next, you can deploy KEDA in your Kubernetes cluster by executing the following command:
```bash
kubectl apply --server-side -f https://github.com/kedacore/keda/releases/download/v2.14.0/keda-2.14.0-core.yaml
```
For further deployment options, you may refer to the official KEDA guide.
Now, we are ready to configure the autoscaling behavior.
Configuring autoscaling
To configure the scaling of our application, we define a ScaledObject. Create a YAML file for the scaling configuration; we’ll call it `keda-scaling.yaml`:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: jobrunr-scaling
spec:
  scaleTargetRef:
    name: workers
  pollingInterval: 30
  cooldownPeriod: 300
  minReplicaCount: 0
  maxReplicaCount: 10
```
Let’s take a look at the specification inside the `spec` section. The property `scaleTargetRef` points to the name of the deployment we want to scale. We can specify how often KEDA should fetch metrics from JobRunr by changing the `pollingInterval`. The `cooldownPeriod` property defines after how many seconds of all triggers being inactive (based on `activationTargetValue`) the deployment is scaled to 0 (this only takes effect when `minReplicaCount` is set to 0). Finally, the `minReplicaCount` and `maxReplicaCount` properties specify the minimum and maximum number of replicas our app can scale to.
Warning! Be careful setting `minReplicaCount` to 0, as it might cause problems with executing recurring jobs. See the considerations section.
For more configuration options, you can check the KEDA documentation.
Next, we define triggers inside the `spec` section of the KEDA autoscaling specification. Each `trigger` specifies which metric is used to trigger scaling.
Workers’ usage
Imagine you are running an e-commerce platform that processes a large number of background tasks, such as order processing, email notifications, and data synchronization. During peak shopping hours, the number of tasks increases dramatically, leading to high demand on your job processing infrastructure. Conversely, during off-peak hours, the workload significantly decreases.
By configuring an autoscaling trigger based on workers’ usage, you can dynamically adjust the number of job processing instances in your Kubernetes cluster. When the workers’ usage exceeds a certain threshold (e.g., 80%), indicating that the system is under heavy load, KEDA will automatically scale up the number of instances to handle the increased workload. Conversely, when the workers’ usage drops below the threshold, indicating reduced demand, the system will scale down to conserve resources.
To handle such a case, we can configure KEDA to scale based on the workers’ usage metric. In this example, we set the target utilization of workers to 80%. The workers’ usage is exposed by the JobRunr Pro metrics API; we can use the provided endpoint in our configuration below.
```yaml
# ... your other configs
spec:
  # ... your other specs
  triggers:
    # ... your other triggers
    - type: metrics-api
      metricType: Value
      metadata:
        targetValue: "0.8"
        activationTargetValue: "0.0"
        format: "json"
        url: "http://jobrunr-service.default:8000/api/metrics/workers"
        valueLocation: "usage"
```
Let’s briefly explain what this config means.
In this setup, we are using REST endpoints to fetch our metrics, so we use the `metrics-api` type of trigger. The property `metricType: Value` specifies that the target value is independent of the number of pods. What is most interesting to us is the `metadata` section. There we specify:
- `targetValue` - the desired value of the metric,
- `activationTargetValue` - the threshold on which to scale from and to 0 replicas,
- `format` - the format of the response from the specified url,
- `url` - the endpoint from which to fetch the metric,
- `valueLocation` - the field name of the metric of interest in the retrieved JSON object.
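Before wiring this into KEDA, you can sanity-check the endpoint the trigger polls, for example from a temporary pod inside the cluster (the service URL is the same one used in the trigger above):

```bash
# run a throwaway curl pod inside the cluster
kubectl run curl --rm -it --image=curlimages/curl --restart=Never -- \
  curl -s http://jobrunr-service.default:8000/api/metrics/workers
# example response: {"usage":0.25,"totalWorkers":8,"occupiedWorkers":2}
```

KEDA reads the `usage` field (via `valueLocation`) from this JSON and compares it against the `targetValue` of 0.8.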
Scheduled jobs metrics
Consider a data analytics platform that processes large batches of data at scheduled intervals. For instance, an organization might collect and process sales data from various sources every hour to generate real-time sales reports for management. By configuring an autoscaling trigger based on the number of scheduled jobs, you can ensure that your Kubernetes cluster scales up in anticipation of these data processing tasks. If a significant number of jobs are scheduled to run soon, the system can proactively allocate more resources to handle the increased workload efficiently, ensuring timely and accurate reporting.
Conversely, when there are fewer scheduled data processing jobs, the system can scale down to conserve resources and reduce operational costs. This dynamic scaling based on scheduled jobs metrics ensures that your data processing pipeline remains responsive and efficient, regardless of workload fluctuations. This approach helps maintain optimal performance and cost-efficiency, ensuring critical tasks are completed on time and improving overall system reliability.
For this example, we configure the mentioned trigger to target an average of 8 jobs per pod among the jobs scheduled to run within the next 2 minutes. This configuration is shown below.
```yaml
# ... your other configs
spec:
  # ... your other specs
  triggers:
    # ... your other triggers
    - type: metrics-api
      metricType: AverageValue
      metadata:
        targetValue: "8" # 8 scheduled jobs per pod
        activationTargetValue: "0" # activate when scheduled jobs > 0
        format: "json"
        url: "http://jobrunr-service.default:8000/api/metrics/jobs/scheduled?scheduledIn=PT2M"
        valueLocation: "count"
```
It specifies that we are targeting 8 scheduled jobs per pod, considering jobs that will run in the next 2 minutes. Here we use `metricType: AverageValue` to specify that the `targetValue` is per pod (e.g., with 2 pods and 10 scheduled jobs, we get a value of 5 per pod, so we are below the `targetValue` of 8).
Using the scheduled jobs metrics can be useful when we want to quickly process a large batch of jobs that are scheduled for some time in the future. We look 2 minutes ahead and start creating pods so that they are ready when those jobs execute. Afterward, we scale down to save resources.
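You can also query this metric manually to see the value KEDA will act on; with, say, 10 jobs scheduled within the next 2 minutes, the response looks like this (format as documented in the JobRunr Pro metrics API section at the end of this guide):

```bash
# from inside the cluster, or replace the host with localhost:8000 after a kubectl port-forward
curl -s "http://jobrunr-service.default:8000/api/metrics/jobs/scheduled?scheduledIn=PT2M"
# example response: {"count":10}
```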
Enqueued jobs metrics
Consider a financial services platform that processes real-time transactions and updates user accounts. Ensuring quick task completion is critical for accurate balances and timely confirmations. By configuring an autoscaling trigger based on enqueued jobs latency, you can dynamically scale your Kubernetes cluster to reduce processing delays. If job latency exceeds a threshold (e.g., 5 minutes), the system scales up to handle the backlog efficiently.
Conversely, when job latency is low, the system scales down to conserve resources and cut costs. This dynamic scaling ensures your platform remains responsive and efficient, even during peak periods, preventing bottlenecks and optimizing resource use for critical financial operations.
In this case, our example includes a trigger that monitors when the latency of enqueued jobs exceeds 5 minutes in order to scale up. Let’s add this trigger to the `triggers` section. Its configuration is below.
```yaml
# ... your other configs
spec:
  # ... your other specs
  triggers:
    # ... your other triggers
    - type: metrics-api
      metricType: Value
      metadata:
        targetValue: "300" # max 5 min latency
        activationTargetValue: "0" # activate when jobs have > 0 latency
        format: "json"
        url: "http://jobrunr-service.default:8000/api/metrics/jobs/enqueued"
        valueLocation: "latency"
```
This trigger defines that we want to scale up if there are jobs that have been waiting in the queue for more than 5 minutes.
Beware of the artificial latency that might be introduced by using mutexes or priority queues. You can always specify the priority of the queue you are interested in by adding the `priority` parameter to the enqueued jobs metrics URL; see the JobRunr Pro metrics API for more info.
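For example, if only jobs on a specific priority queue should drive this trigger, the `url` in its `metadata` could be changed as follows (the priority value `0` is purely illustrative; use the priority you actually assign to your jobs):

```yaml
# same trigger as above, only the url changes
metadata:
  url: "http://jobrunr-service.default:8000/api/metrics/jobs/enqueued?priority=0"
```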
Applying the autoscaling configuration
Now, all that is left to do is to apply our autoscaling configuration to the Kubernetes cluster:
```bash
kubectl apply -f k8s/keda-scaling.yaml
```
Congratulations! Your app is now autoscaling based on the 3 metrics we specified! Let’s verify it in the next section.
To learn more about KEDA autoscaling and see more advanced options, visit the KEDA documentation.
Testing the autoscaling
It’s time to see the autoscaler in action! We have defined three triggers, and when any of these triggers meet the criteria, KEDA will start scaling. The Horizontal Pod Autoscaler (HPA), which drives the scaling, will process the proposed replica counts for each trigger and use the highest one to scale the workload to.
Before we start, let us forward our application ports to localhost by executing this command:
```bash
kubectl port-forward svc/jobrunr-service 8000 8080
```
Now the JobRunr dashboard and metrics API are available at http://localhost:8000/, and our API for scheduling sample jobs is located at http://localhost:8080/.
Let’s open the JobRunr dashboard in the web browser: http://localhost:8000/dashboard.
If it’s your first time running the application, you will be prompted to input your JobRunr Pro license key.
First, look at the servers list on the dashboard http://localhost:8000/dashboard/servers. You should see `no servers found`. This is because we specified `minReplicaCount` to be 0: there are no pods running `BackgroundJobServer`s.
Now let’s schedule 40 jobs to run in 2 minutes by calling http://localhost:8080/schedule-example-jobs?count=40&runIn=PT2M. With the configuration we specified, our target is 8 jobs per pod. KEDA will scale up our deployment to match it. We can see the current pods with the following command:
```bash
kubectl get pods
```
We will get an output similar to this:
```
NAME                       READY   STATUS              RESTARTS   AGE
postgres-f879f4c78-6xhpg   1/1     Running             0          121m
web-74db64f488-spv5g       1/1     Running             0          121m
workers-854f975c57-2pmds   0/1     ContainerCreating   0          1s
workers-854f975c57-fgrlz   0/1     ContainerCreating   0          1s
workers-854f975c57-n49bx   0/1     ContainerCreating   0          2s
workers-854f975c57-t9wvw   0/1     ContainerCreating   0          1s
workers-854f975c57-vbdjr   0/1     ContainerCreating   0          1s
```
Great! We see there are 5 new pods being created. After around 1 minute, the servers should announce themselves and will be visible in the JobRunr dashboard.

Once it’s time for our scheduled jobs to execute, they’ll be queued and then processed. Now the workers’ usage metric kicks in! As we have 40 jobs and only 20 total workers, we will see 100% workers’ usage throughout the processing of all the jobs. After 30 seconds (poll interval), we can observe another pod being created to match our specified 80% workers’ usage target.
If we wait 5 minutes and don’t schedule any new jobs, our deployment will scale back to 0.
Autoscaling your secured JobRunr dashboard
Securing your JobRunr dashboard is explained in detail in the authentication guides. KEDA inside your Kubernetes cluster can still access the secured metrics API, but it requires some additional configuration. This is described in detail in the KEDA documentation.
Limitations and considerations
Deploying a distributed application with autoscaling enabled can be tricky. There are some corner cases that may not be obvious at first. Moreover, Kubernetes comes with some limitations that we should be aware of.
Minimum replica count
Pay extra attention when setting the `minReplicaCount` property to 0. When no pods are running, the workers’ usage will always be 0, so to scale from 0 this metric has to be combined with at least one additional metric (e.g. latency of enqueued jobs + workers’ usage). Additionally, no recurring jobs can be executed if no `BackgroundJobServer`s are running. You should always have 1 `BackgroundJobServer` running in your cluster, as it is responsible for checking if any jobs need to be processed.
Long-running jobs
If you are planning to run jobs with long execution times, you might want to prevent Kubernetes from terminating replicas that are still processing a long-running job. To do that, you can modify the graceful termination period for your pods in Kubernetes. This can be done by setting the `terminationGracePeriodSeconds` property in your Pod `spec`. You’ll also have to configure the duration JobRunr waits before interrupting jobs.
The termination period of both Kubernetes and JobRunr should be determined by the maximum expected runtime of your jobs. Imagine your longest running job takes 1 hour to complete. Then your graceful termination period should be greater than the execution time of that job, for example 1 hour and 30 minutes.
You can configure your deployment as follows:
```yaml
apiVersion: apps/v1
kind: Deployment
# ... your other configs
spec:
  # ... your other specs
  template:
    # ... your other template configs
    spec:
      # ... your other template specs
      terminationGracePeriodSeconds: 5400
```
Similarly, a termination period also has to be configured for JobRunr. There are multiple options to configure JobRunr. Since we are using the `jobrunr-pro-spring-boot-3-starter` dependency, we can add the following property to `application.properties`:
```properties
org.jobrunr.background-job-server.interrupt-jobs-await-duration-on-stop=90m
```
An important thing to note here is that the Kubernetes termination period should match or be larger than the JobRunr one. This ensures a safe interruption of the jobs and a graceful termination of JobRunr. If you are curious about how Kubernetes terminates Pods, you can follow this link.
Poll interval
The workers’ usage is updated at every background job server poll interval. Therefore, setting a shorter polling interval in KEDA (and thus in the HPA) when using only the workers’ usage metric is not going to increase the frequency of workers’ usage updates.
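If you are using the Spring Boot starter, the JobRunr poll interval can be adjusted as sketched below; as a rule of thumb, keep KEDA’s `pollingInterval` at least as long as this value when the workers’ usage is your only trigger (the 15-second value shown is JobRunr’s default):

```properties
# how often each BackgroundJobServer polls for work and refreshes the workers' usage metric
org.jobrunr.background-job-server.poll-interval-in-seconds=15
```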
Conclusion
In this guide, we’ve learned how to set up autoscaling in Kubernetes using KEDA and use the JobRunr Pro metrics API to create the scaling triggers.
Finished example
The example app and Kubernetes deployment files that we created in this guide are available in our GitHub repository.
JobRunr Pro metrics API
JobRunr Pro allows us to access metrics that can be used for autoscaling. This is an overview of them:
Get Enqueued Jobs Metrics
```
GET /api/metrics/jobs/enqueued?priority=[integer]
```
Query parameters:
- `priority` - get metrics only for jobs with the specified priority.
Response content example
```json
{
  "latency": 500,
  "count": 10
}
```
- `latency` - time the oldest job has spent in the queue, in seconds.
- `count` - number of jobs in the queue.
Get Scheduled Jobs Metrics
```
GET /api/metrics/jobs/scheduled?scheduledIn=[string]&priority=[integer]
```
Query parameters:
- `scheduledIn` - ISO-8601 duration format; get metrics for jobs scheduled to run within the specified time at the latest. By default, we only count jobs that are scheduled to run no later than 1 minute from now.
- `priority` - get metrics only for jobs with the specified priority.
Response content example
```json
{
  "count": 10
}
```
- `count` - number of jobs scheduled to run within at most the `scheduledIn` time.
Get Workers Metrics
```
GET /api/metrics/workers
```
Response content example
```json
{
  "usage": 0.25,
  "totalWorkers": 8,
  "occupiedWorkers": 2
}
```
- `usage` - total usage of the workers, as a fraction (e.g. 0.25 means 25% of workers are occupied).
- `totalWorkers` - total number of workers across all `BackgroundJobServer`s.
- `occupiedWorkers` - number of occupied workers.