What is Mean Time to Recovery (MTTR)?
How long it takes to restore service after a production failure.
Definition
Mean time to recovery (MTTR) measures the average time to restore service after a production incident or failed change. It is one of the four DORA metrics and a core reliability indicator.
How it’s measured
Measure the time from when an incident is detected to when service is restored, then average across incidents. It is derived from incident-management data — PagerDuty, incident.io, or error-tracking tools like Sentry.
What good looks like
Elite teams recover in under an hour. High performers recover in under a day. Low MTTR depends on observability, on-call readiness, and the ability to roll back quickly.
Why it matters
Failures are inevitable; how fast you recover is what protects customers and trust. A low MTTR lets teams take more delivery risk safely, reinforcing the throughput side of DORA.
Related terms
Measure Mean Time to Recovery (MTTR) automatically
DXSignal computes this and every other delivery, quality, and experience metric from the tools you already use.
Get Started Free