Graceful retries in Python with backoff


Use this helper when a dependency is mostly reliable but occasionally flaky:

  • HTTP APIs under moderate load
  • internal services during deploys
  • third‑party integrations with rate limits

Failed HTTP calls are normal; silent failures are not. This pattern retries HTTP errors whose status codes you mark as retryable, using exponential backoff with jitter. It logs each retry and the final failure, and keeps the code compact.

Core helper

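import logging
import random
import time
from typing import Callable, Collection, TypeVar

import requests

T = TypeVar("T")
logger = logging.getLogger(__name__)


def with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base: float = 0.4,      # first delay, in seconds
    factor: float = 2.0,    # delay multiplier per retry
    jitter: float = 0.25,   # +/- fraction of random spread on each delay
    # 429 = rate limited; a Retry-After header is not honored here
    retry_on: Collection[int] = (429, 500, 502, 503, 504),
) -> T:
    """Call fn(), retrying retryable HTTP errors with exponential backoff and jitter."""
    for i in range(1, attempts + 1):
        try:
            return fn()
        except requests.HTTPError as exc:
            # exc.response may be None if the error was raised by hand
            status = exc.response.status_code if exc.response is not None else None
            if status not in retry_on or i == attempts:
                logger.error(
                    "giving up after attempt %d (status %s)", i, status,
                    extra={"status": status, "attempt": i},
                )
                raise
            # Exponential growth: base, base*factor, base*factor**2, ...
            delay = base * (factor ** (i - 1))
            # Jitter keeps many clients from retrying in lockstep
            delay *= 1 + random.uniform(-jitter, jitter)
            logger.warning(
                "retrying after status %s (attempt %d), sleeping %.3fs", status, i, delay,
                extra={"status": status, "attempt": i, "sleep": round(delay, 3)},
            )
            time.sleep(delay)
    # Only reached when attempts < 1
    raise RuntimeError("exhausted retries")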
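
To see what the defaults mean in practice, here is a standalone sketch of the same delay formula the helper uses. The name backoff_delays and its rng parameter are illustrative only, not part of with_backoff. With attempts=4 there are at most three sleeps: nominally 0.4, 0.8 and 1.6 seconds (2.8 s total), which ±25% jitter stretches to anywhere between 2.1 and 3.5 s of sleeping before the helper gives up.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int = 4,
    base: float = 0.4,
    factor: float = 2.0,
    jitter: float = 0.25,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Sleeps taken before each retry: attempts - 1 values, same formula as with_backoff."""
    rng = rng or random.Random()
    return [
        base * factor ** (i - 1) * (1 + rng.uniform(-jitter, jitter))
        for i in range(1, attempts)
    ]


# Without jitter the schedule is just base * factor**(i - 1).
print([round(d, 3) for d in backoff_delays(jitter=0.0)])  # [0.4, 0.8, 1.6]
# With a seeded generator each delay lands within +/-25% of that value.
print([round(d, 3) for d in backoff_delays(rng=random.Random(42))])
```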

Using it

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

API = "https://api.example.com/health"

def fetch_health() -> dict:
    resp = requests.get(API, timeout=3)
    resp.raise_for_status()
    return resp.json()

result = with_backoff(fetch_health)
print("service status:", result["status"])
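
The format string above prints only the level and message, so the status, attempt and sleep values passed through extra are not emitted as separate fields. A minimal stdlib-only sketch of a formatter that writes each record as one JSON line, including those fields, follows. The class name JsonFormatter is hypothetical, and treating every non-standard LogRecord attribute as an extra field is an assumption that suits the helper's logs.

```python
import json
import logging

# Attribute names every LogRecord carries; anything else arrived via `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", logging.INFO, "", 0, "", None, None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line, including `extra` fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy caller-supplied fields such as status, attempt and sleep.
        payload.update(
            {k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS}
        )
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON values (e.g. exceptions) from crashing logging
        return json.dumps(payload, default=str)


# Replace the basicConfig call in the usage example with this setup.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
```

Note that basicConfig does nothing if the root logger already has handlers, so use this setup instead of the earlier call rather than in addition to it.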

Why this shape works

  • Keep it small: one plain function, no decorators, and no state beyond the module logger.
  • Control backoff: jitter spreads retries out to avoid a thundering herd; factor controls how fast the delay grows between attempts.
  • Log with structure: fields passed via extra are JSON-friendly and ready for log pipelines, given a formatter that emits them.
  • Client-agnostic: swap requests for any other client by changing the caught exception and the retry_on check.
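
To make the client-agnostic point concrete, here is a minimal sketch that moves the retry decision into a predicate and uses the standard library's urllib instead of requests. The names retry_call and is_transient, the injectable sleep parameter, and the choice to also retry connection errors and timeouts (common during deploys) are assumptions for illustration, not part of the helper above.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# Same idea as retry_on; 429 = rate limited (Retry-After is not honored)
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


def is_transient(exc: BaseException) -> bool:
    """Decide whether an exception raised by urllib is worth retrying."""
    # HTTPError is a subclass of URLError, so check it first.
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code in RETRYABLE_STATUS
    # Refused connections and DNS failures surface as URLError;
    # TimeoutError also covers socket timeouts on Python 3.10+.
    return isinstance(exc, (urllib.error.URLError, TimeoutError))


def retry_call(
    fn: Callable[[], T],
    should_retry: Callable[[BaseException], bool] = is_transient,
    attempts: int = 4,
    base: float = 0.4,
    factor: float = 2.0,
    jitter: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for i in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Re-raise non-retryable errors and the final failure unchanged.
            if i == attempts or not should_retry(exc):
                raise
            delay = base * factor ** (i - 1) * (1 + random.uniform(-jitter, jitter))
            sleep(delay)
    raise RuntimeError("exhausted retries")  # only reached when attempts < 1
```

Injecting sleep lets tests exercise the retry path without real delays, for example retry_call(flaky, sleep=recorded.append). Against a live endpoint, fn would wrap urllib.request.urlopen(url, timeout=3) in a with block and return the parsed body.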

Extension ideas

  • Add circuit-breaking after repeated failures.
  • Expose metrics for attempts and durations.
  • Move retry policy to config so CI can run with fewer retries.
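
For the first idea, a minimal single-threaded circuit breaker sketch: after threshold consecutive failures it rejects calls for reset_after seconds, then lets a trial call through. The class names, parameters and injectable clock are hypothetical, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        threshold: int = 5,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold      # consecutive failures before opening
        self.reset_after = reset_after  # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; not calling dependency")
            # Half-open: allow a trial call. The failure count stays at the
            # threshold, so a failed trial reopens the circuit immediately.
            self.opened_at = None
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the count.
        self.failures = 0
        return result
```

Wrapping the retry helper, as in breaker.call(lambda: with_backoff(fetch_health)), counts one exhausted retry sequence as a single failure, so the breaker only opens after several full rounds of backoff have failed.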