# Circuit Breaker
Stops hammering a failing downstream when a handler's consecutive failures cross a threshold. Jobs that would have run through an open circuit are rescheduled past the recovery window instead of executing.
## Setup
Circuit Breaker is an opt-in addon. Register it alongside `AddJoblyWorker`:
```csharp
builder.Services.AddJoblyWorker<AppDbContext>();

builder.Services.AddJoblyCircuitBreaker<AppDbContext>(o =>
{
    o.Threshold = 5;                          // open after 5 consecutive failures
    o.Duration = TimeSpan.FromMinutes(1);     // stay open for 1 minute
    o.ResetJitter = TimeSpan.FromSeconds(10); // up to 10s of reschedule jitter
});
```
The addon adds a new `CircuitBreakerState` entity to your `DbContext` via `JoblyConfiguration.EntityConfigurators`, so an EF Core migration is required after enabling it.
## Usage
By default, each request type gets its own circuit keyed on `typeof(TRequest).Name`. Override the key to share a circuit across multiple handlers:
```csharp
[CircuitBreaker(Group = "payments-gateway")]
public class ChargeCard : IJob { }

[CircuitBreaker(Group = "payments-gateway")]
public class RefundCard : IJob { }
```
Per-job overrides on the attribute take precedence over the global options:
```csharp
[CircuitBreaker(Group = "flaky-api", Threshold = 10, DurationSeconds = 300)]
public class CallFlakyApi : IJob { }
```
## States
The circuit is a three-state machine:
- Closed — normal operation. The handler runs. On success, `FailureCount` is reset to 0. On failure, `FailureCount` is incremented; when it reaches `Threshold`, the circuit transitions to Open.
- Open — `OpenUntil > now`. Jobs are rescheduled for `OpenUntil + rand(ResetJitter)` without executing the handler. A JobLog entry `"Rescheduled due to circuit breaker '<key>' (open)"` is written.
- HalfOpen — `OpenUntil` has lapsed. Exactly one worker wins a probe slot via an atomic CAS and executes the handler. Other workers observe `HalfOpen` and reschedule (`"... (probe-in-progress)"`). If the probe succeeds, the circuit transitions back to Closed and `FailureCount` is reset. If the probe fails, the circuit transitions back to Open with a fresh `OpenUntil`.
Without the HalfOpen gate, every worker polling when OpenUntil lapses would fire a concurrent probe — a thundering herd against the recovering downstream. The CAS guarantees exactly one probe fires per recovery window.
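The probe gate can be sketched in-process as a compare-and-swap on the circuit's state. This is illustrative only, not Jobly's actual code: the real addon persists state in the `CircuitBreakerState` row, so the CAS would happen against the database (e.g. via an optimistic-concurrency token) to hold across worker processes, not just threads.

```csharp
using System;
using System.Threading;

enum CircuitState { Closed = 0, Open = 1, HalfOpen = 2 }

sealed class Circuit
{
    private int _state = (int)CircuitState.Open;
    public DateTimeOffset OpenUntil { get; init; }

    // Returns true for exactly one caller after OpenUntil lapses;
    // that caller owns the probe slot and may run the handler.
    public bool TryAcquireProbe(DateTimeOffset now)
    {
        if (now < OpenUntil) return false; // circuit still open: reschedule

        // Atomic Open -> HalfOpen transition; only one winner.
        return Interlocked.CompareExchange(
                   ref _state,
                   (int)CircuitState.HalfOpen,
                   (int)CircuitState.Open)
               == (int)CircuitState.Open;
    }
}
```

Workers that lose the CAS observe `HalfOpen` and reschedule with fresh jitter, which is what the `(probe-in-progress)` log entry records.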
## Behavior During Open Circuit
When the circuit is open, a job goes through the pipeline as normal but the handler never runs. The pipeline behavior sets `IJobContext.Outcome = JobOutcome { State = Enqueued, ScheduleTime = OpenUntil + jitter }` and the worker reschedules the job. `FailureCount` is not incremented (the job never tried to run).
Jitter is applied to ScheduleTime so rescheduled jobs don't all hit the downstream at the exact moment the circuit expires. Two jobs rescheduled at the same instant against the same circuit still coordinate: only one probe wins the HalfOpen CAS, the other reschedules again with fresh jitter.
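The reschedule-time computation described above can be sketched as follows; the parameter names are illustrative, not Jobly's internals:

```csharp
using System;

// Sketch: reschedule time for a job hitting an open circuit.
// rand(ResetJitter) is taken as a uniform draw in [0, ResetJitter).
static DateTimeOffset NextScheduleTime(DateTimeOffset openUntil, TimeSpan resetJitter)
{
    long jitterTicks = (long)(Random.Shared.NextDouble() * resetJitter.Ticks);
    return openUntil + TimeSpan.FromTicks(jitterTicks);
}
```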
## Interaction with Retry
Circuit Breaker short-circuits before Retry. If the circuit is open when a job would have retried, the job is rescheduled — but Retry's `RetriedTimes` counter is NOT incremented (the handler didn't run, so there was nothing to retry). The retry budget is preserved for when the circuit closes and the downstream is reachable again.
## Interaction with Mutex
Circuit Breaker runs inside the handler pipeline after Mutex. A mutex conflict short-circuits the job to `Deleted` before the circuit is consulted — mutex-rejected jobs don't count toward the failure threshold.
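Taken together, the ordering described in the Retry and Mutex sections can be sketched as a middleware chain. The types here are hypothetical, not Jobly's API; the point is which step gets to short-circuit first.

```csharp
using System;
using System.Threading.Tasks;

delegate Task JobStep(Func<Task> next);

static class PipelineSketch
{
    // Wraps the handler so that steps[0] runs outermost.
    public static Func<Task> Compose(Func<Task> handler, params JobStep[] steps)
    {
        Func<Task> next = handler;
        for (int i = steps.Length - 1; i >= 0; i--)
        {
            JobStep step = steps[i];
            Func<Task> inner = next;
            next = () => step(inner);
        }
        return next;
    }
}
```

With the documented order — `Compose(handler, mutex, circuitBreaker, retry)` — Mutex gets the first chance to short-circuit (job deleted on conflict), then Circuit Breaker (job rescheduled while open, before Retry counts an attempt), then Retry, then the handler.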
## Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| `Threshold` | `int` | 3 | Consecutive failures before the circuit opens |
| `Duration` | `TimeSpan` | 1m | How long the circuit stays open before the probe window |
| `ResetJitter` | `TimeSpan` | 10s | Jitter added to each rescheduled `ScheduleTime` |
Per-handler overrides on `[CircuitBreaker]` use `Group`, `Threshold`, `DurationSeconds`, and `ResetJitterSeconds`.
## Dashboard
Rescheduled jobs appear in the Enqueued tab with a future `ScheduleTime`. The job's log shows `"Rescheduled due to circuit breaker '<key>' (open|probe-in-progress|probe-lost)"` — the reason disambiguates why a specific job was rescheduled.
## When To Use
- Calls to third-party APIs that can go down without warning (payment gateways, email providers, webhooks).
- Downstream microservices with a deploy window — circuit opens on failure, probes during deploy, closes when healthy.
- Database or cache backends that can be saturated — prevents a retry storm from piling on during recovery.
For idempotent work against reliable infrastructure, Retry alone is usually enough.