Database Fault Tolerance
Keep your jobs running - even in a volatile infrastructure landscape.
By default, JobRunr stops completely if your SQL / NoSQL database goes down, and there is a reason for this: JobRunr uses your database for a lot of things:
- Master node election for the BackgroundJobServer
- Fetching the details of a Job and updating its state when it’s done
- Monitoring whether there are zombie jobs (jobs that were being processed on a BackgroundJobServer node that crashed)
- Optimistic locking so that a job is executed only once
- …
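To make the optimistic-locking item concrete, here is a minimal sketch of the general version-check idea. It is not JobRunr's actual implementation: the class `VersionedJobStore`, its method names, and the in-memory map standing in for a database table are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a job table with a version column (not a JobRunr class).
final class VersionedJobStore {
    // Maps a job id to its current version number.
    private final ConcurrentMap<String, Integer> versions = new ConcurrentHashMap<>();

    void insert(String jobId) {
        versions.put(jobId, 0);
    }

    // Succeeds only if the stored version still equals the version the caller read.
    // Conceptually similar to: UPDATE jobs SET version = version + 1 WHERE id = ? AND version = ?
    boolean tryClaim(String jobId, int expectedVersion) {
        return versions.replace(jobId, expectedVersion, expectedVersion + 1);
    }
}
```

If two servers both read version 0 of the same job, only the first `tryClaim` call succeeds; the second sees a changed version and backs off. Every such check needs a round trip to the database, which is one reason processing cannot safely continue without it.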
The moment JobRunr loses its connection to the database (or the database goes down), a lot of threads will still try to read from and write updates to the database, and all of these attempts will of course fail. This results in a huge amount of logging, and if JobRunr tried to continue job processing, it would flood the disks fast because each attempt to process a job fails. I’ve seen it happen and it happens faster than you think.
That’s why I decided that if there are too many exceptions because of the StorageProvider, JobRunr stops all background job processing. This can of course be monitored via the dashboard and health endpoints.
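The "stop after too many exceptions" behaviour can be sketched as a simple failure counter. This is only an illustration under stated assumptions: `StorageFailureGuard` is a hypothetical class, and both the threshold and the choice to count consecutive failures are assumptions, not JobRunr's documented rules.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of the JobRunr API): trips once too many
// storage operations fail in a row.
final class StorageFailureGuard {
    private final int maxConsecutiveFailures; // assumed threshold
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private volatile boolean stopped;

    StorageFailureGuard(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // A successful storage call resets the failure streak.
    void onStorageSuccess() {
        consecutiveFailures.set(0);
    }

    // A failed storage call extends the streak; once the threshold is reached,
    // the guard stays tripped until the process is restarted.
    void onStorageFailure(Exception cause) {
        if (consecutiveFailures.incrementAndGet() >= maxConsecutiveFailures) {
            stopped = true;
        }
    }

    boolean isProcessingStopped() {
        return stopped;
    }
}
```

Worker threads would check `isProcessingStopped()` before picking up new work, so a database outage produces a bounded number of error log lines instead of an endless stream.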
JobRunr Pro has your back
JobRunr Pro increases resilience for this and pauses all job processing as soon as the StorageProvider goes down.
It keeps monitoring the database to see whether it comes up again and, if so, automatically restarts processing on all the different BackgroundJobServers.
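The pause-and-resume idea can be pictured with the following sketch. It is not the JobRunr Pro implementation: `PausableJobProcessing` and the `storageIsReachable` probe are hypothetical names, and how often the probe runs is left to the caller.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch (not JobRunr Pro code): pauses processing while a
// storage health probe fails and resumes once the probe succeeds again.
final class PausableJobProcessing {
    // Assumed probe, e.g. a cheap query against the database.
    private final BooleanSupplier storageIsReachable;
    private boolean paused;

    PausableJobProcessing(BooleanSupplier storageIsReachable) {
        this.storageIsReachable = storageIsReachable;
    }

    // Intended to be called periodically, for example from a scheduled task.
    synchronized void checkStorage() {
        boolean reachable;
        try {
            reachable = storageIsReachable.getAsBoolean();
        } catch (RuntimeException e) {
            // A probe that throws is treated the same as an unreachable database.
            reachable = false;
        }
        paused = !reachable;
    }

    synchronized boolean isPaused() {
        return paused;
    }
}
```

Unlike a guard that stops for good, this state flips back as soon as the probe succeeds, so no manual restart is needed once the database is available again.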