Health Check Monitor

A health check monitor fetches a JSON endpoint that reports on multiple sub-checks within your application. This is ideal for monitoring internal services like database connections, cache availability, queue health, and disk space from a single endpoint.

How It Works

Beepr fetches your health check URL every minute and parses the JSON response. Based on the sub-check statuses, Beepr determines the overall monitor status:

Condition	Status	Creates Incident
All checks `ok` (skipped checks are ignored)	Up (green)	No
Any check `warning`, none `failed` or `crashed`	Degraded (yellow)	No
Any check `failed` or `crashed`	Down (red)	Yes
`finishedAt` older than the staleness threshold	Stale (yellow)	No

When an incident is created, the failing sub-checks are listed in the incident timeline with their notification messages.
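
The rules in the table above can be sketched as a small function. This is an illustration only, not Beepr's actual implementation: the names `overallStatus` and `failingChecks` are hypothetical, the sketch assumes "down" takes precedence over "degraded", and it leaves staleness out.

```javascript
// Derive an overall status from sub-check statuses, following the table above.
// Assumptions: "down" wins over "degraded", and a payload with no counted
// checks is treated as "up". Staleness is not handled here.
function overallStatus(checkResults) {
  // Skipped checks are ignored in the status calculation.
  const statuses = checkResults
    .map((check) => check.status)
    .filter((status) => status !== 'skipped');

  if (statuses.some((s) => s === 'failed' || s === 'crashed')) {
    return { status: 'down', createsIncident: true };
  }
  if (statuses.includes('warning')) {
    return { status: 'degraded', createsIncident: false };
  }
  return { status: 'up', createsIncident: false };
}

// Sub-checks that would be listed in the incident timeline.
function failingChecks(checkResults) {
  return checkResults.filter(
    (check) => check.status === 'failed' || check.status === 'crashed'
  );
}
```

Running your own endpoint's output through a function like this in a test suite can help confirm that a given set of sub-check results maps to the monitor status you expect.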

Response Format

Your endpoint must return JSON with this structure:

{
  "finishedAt": "1638879833",
  "checkResults": [
    {
      "name": "Database",
      "label": "Database Connection",
      "status": "ok",
      "notificationMessage": "",
      "shortSummary": "connected"
    },
    {
      "name": "Redis",
      "label": "Redis Cache",
      "status": "ok",
      "notificationMessage": "",
      "shortSummary": "available"
    },
    {
      "name": "DiskSpace",
      "label": "Disk Space",
      "status": "warning",
      "notificationMessage": "Disk usage at 85%",
      "shortSummary": "85%"
    }
  ]
}

Top-Level Fields

Field	Type	Required	Description
`finishedAt`	string or integer	No	Unix timestamp (seconds) when the checks were last run. Used for staleness detection.
`checkResults`	array	Yes	Array of check result objects

Check Result Fields

Field	Type	Required	Description
`name`	string	Yes	Unique identifier for the check
`status`	string	Yes	One of: `ok`, `warning`, `failed`, `crashed`, `skipped`
`label`	string	No	Human-readable name displayed in the UI
`notificationMessage`	string	No	Detailed message shown in incidents when the check fails
`shortSummary`	string	No	Brief status text for the dashboard (e.g., "45%", "connected")
`meta`	object	No	Additional metadata (not displayed, but stored)
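
Before pointing a monitor at a new endpoint, it can help to check the payload against the required fields listed above. The following is a minimal sketch of such a self-check. `validateHealthResponse` and `VALID_STATUSES` are hypothetical names, and how Beepr itself treats a malformed payload is not covered here.

```javascript
const VALID_STATUSES = ['ok', 'warning', 'failed', 'crashed', 'skipped'];

// Returns a list of human-readable problems; an empty list means the
// payload satisfies the required fields described in the tables above.
function validateHealthResponse(payload) {
  if (payload === null || typeof payload !== 'object') {
    return ['payload must be a JSON object'];
  }
  if (!Array.isArray(payload.checkResults)) {
    return ['checkResults is required and must be an array'];
  }

  const problems = [];
  const seenNames = new Set();
  payload.checkResults.forEach((check, i) => {
    if (check === null || typeof check !== 'object') {
      problems.push(`checkResults[${i}] must be an object`);
      return;
    }
    if (typeof check.name !== 'string' || check.name === '') {
      problems.push(`checkResults[${i}].name is required and must be a string`);
    } else if (seenNames.has(check.name)) {
      problems.push(`checkResults[${i}].name "${check.name}" is not unique`);
    } else {
      seenNames.add(check.name);
    }
    if (!VALID_STATUSES.includes(check.status)) {
      problems.push(
        `checkResults[${i}].status must be one of: ${VALID_STATUSES.join(', ')}`
      );
    }
  });
  return problems;
}
```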

Status Values

Status	Meaning	Effect
`ok`	Check passed	Contributes to "up" status
`warning`	Check has a non-critical issue	Sets monitor to "degraded" (no incident)
`failed`	Check failed	Sets monitor to "down" and creates incident
`crashed`	Check crashed unexpectedly	Sets monitor to "down" and creates incident
`skipped`	Check was skipped	Ignored in status calculation

Staleness Detection

If your health check endpoint caches results or runs checks periodically (not on every request), the `finishedAt` timestamp helps detect stale data.

Configure the staleness threshold (1-60 minutes) when creating the monitor. If the `finishedAt` timestamp is older than the threshold, the monitor status changes to "stale".

This is useful for:

Endpoints that cache health check results
Background jobs that update health status periodically
Detecting when your health check system itself has stopped running
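
The comparison behind staleness detection can be sketched as follows. This is an illustration of the rule described above, not Beepr's code: `isStale` is a hypothetical helper, and it assumes `finishedAt` is present and expressed in seconds.

```javascript
// Returns true when finishedAt is older than the threshold.
// finishedAt may be a string or an integer Unix timestamp in seconds.
// nowMs defaults to the current time and can be overridden in tests.
function isStale(finishedAt, thresholdMinutes, nowMs = Date.now()) {
  const finishedSeconds = Number(finishedAt);
  if (!Number.isFinite(finishedSeconds)) {
    throw new TypeError('finishedAt must be a Unix timestamp in seconds');
  }
  const ageSeconds = nowMs / 1000 - finishedSeconds;
  return ageSeconds > thresholdMinutes * 60;
}
```

A common mistake this helps catch is sending `finishedAt` in milliseconds (for example, `Date.now()` without dividing by 1000), which puts the timestamp far in the future so that it never appears stale.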

Example Implementations

Elixir/Phoenix

defmodule MyAppWeb.HealthController do
  use MyAppWeb, :controller

  def index(conn, _params) do
    checks = [
      check_database(),
      check_redis(),
      check_disk_space()
    ]

    json(conn, %{
      finishedAt: DateTime.utc_now() |> DateTime.to_unix() |> to_string(),
      checkResults: checks
    })
  end

  defp check_database do
    case MyApp.Repo.query("SELECT 1") do
      {:ok, _} ->
        %{name: "Database", label: "Database Connection", status: "ok", shortSummary: "connected"}
      {:error, reason} ->
        %{name: "Database", label: "Database Connection", status: "failed",
          notificationMessage: "Database connection failed: #{inspect(reason)}"}
    end
  end

  defp check_redis do
    case Redix.command(:redix, ["PING"]) do
      {:ok, "PONG"} ->
        %{name: "Redis", label: "Redis Cache", status: "ok", shortSummary: "available"}
      _ ->
        %{name: "Redis", label: "Redis Cache", status: "failed",
          notificationMessage: "Redis not responding"}
    end
  end

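  defp check_disk_space do
    # `df -P` prints POSIX-format output: a header line, then one line per filesystem.
    # The match on exit status 0 raises if `df` fails.
    {output, 0} = System.cmd("df", ["-P", "/"])
    used_percent = parse_disk_usage(output)

    cond do
      used_percent >= 90 ->
        %{name: "DiskSpace", label: "Disk Space", status: "failed",
          notificationMessage: "Disk usage critical at #{used_percent}%",
          shortSummary: "#{used_percent}%"}
      used_percent >= 80 ->
        %{name: "DiskSpace", label: "Disk Space", status: "warning",
          notificationMessage: "Disk usage high at #{used_percent}%",
          shortSummary: "#{used_percent}%"}
      true ->
        %{name: "DiskSpace", label: "Disk Space", status: "ok",
          shortSummary: "#{used_percent}%"}
    end
  end

  # Extracts the capacity column (for example "85%") from the last line of
  # `df -P` output and returns it as an integer. The capacity is the
  # second-to-last column, just before the mount point.
  defp parse_disk_usage(output) do
    output
    |> String.split("\n", trim: true)
    |> List.last()
    |> String.split()
    |> Enum.at(-2)
    |> String.trim_trailing("%")
    |> String.to_integer()
  end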
end

Node.js/Express

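const express = require('express');
const app = express();

// Assumptions: `db` is an already-configured database client, and checkRedis() and
// checkDiskSpace() are defined elsewhere following the same pattern as checkDatabase().
app.get('/health', async (req, res) => {
  // Each check catches its own errors, so Promise.all does not reject.
  const checks = await Promise.all([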
    checkDatabase(),
    checkRedis(),
    checkDiskSpace()
  ]);

  res.json({
    finishedAt: Math.floor(Date.now() / 1000).toString(),
    checkResults: checks
  });
});

async function checkDatabase() {
  try {
    await db.query('SELECT 1');
    return { name: 'Database', label: 'Database Connection', status: 'ok', shortSummary: 'connected' };
  } catch (error) {
    return { name: 'Database', label: 'Database Connection', status: 'failed',
             notificationMessage: `Database error: ${error.message}` };
  }
}
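
The route above calls `checkDiskSpace()` without showing it. One possible sketch, reusing the 80% and 90% thresholds from the Elixir example, is given below. It assumes a Node.js version that provides `fs.promises.statfs` (added in the 18.x line); `classifyDiskUsage` is a hypothetical helper name.

```javascript
const { statfs } = require('node:fs/promises');

// Maps a usage percentage to a check result, using the same 80%/90%
// thresholds as the Elixir example.
function classifyDiskUsage(usedPercent) {
  const base = { name: 'DiskSpace', label: 'Disk Space', shortSummary: `${usedPercent}%` };
  if (usedPercent >= 90) {
    return { ...base, status: 'failed', notificationMessage: `Disk usage critical at ${usedPercent}%` };
  }
  if (usedPercent >= 80) {
    return { ...base, status: 'warning', notificationMessage: `Disk usage high at ${usedPercent}%` };
  }
  return { ...base, status: 'ok' };
}

async function checkDiskSpace(path = '/') {
  try {
    const stats = await statfs(path);
    // Treats every block not available to unprivileged users as used, so the
    // figure can be slightly higher than the percentage `df` reports.
    const usedPercent = Math.round((1 - stats.bavail / stats.blocks) * 100);
    return classifyDiskUsage(usedPercent);
  } catch (error) {
    // The check itself could not run, which is distinct from a full disk.
    return { name: 'DiskSpace', label: 'Disk Space', status: 'crashed',
             notificationMessage: `Disk check crashed: ${error.message}` };
  }
}
```

Separating the threshold logic from the filesystem call keeps the classification easy to unit test without touching a real disk.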

PHP/Laravel

Route::get('/health', function () {
    $checks = [
        checkDatabase(),
        checkRedis(),
        checkDiskSpace(),
    ];

    return response()->json([
        'finishedAt' => (string) time(),
        'checkResults' => $checks,
    ]);
});

function checkDatabase(): array {
    try {
        DB::select('SELECT 1');
        return ['name' => 'Database', 'label' => 'Database Connection',
                'status' => 'ok', 'shortSummary' => 'connected'];
    } catch (Exception $e) {
        return ['name' => 'Database', 'label' => 'Database Connection',
                'status' => 'failed', 'notificationMessage' => $e->getMessage()];
    }
}

Libraries

Several open-source libraries can generate compatible health check responses:

PHP: spatie/laravel-health
Ruby: health_check
Python: py-healthcheck

For other languages, implement an endpoint that returns the JSON format described above.

Best Practices

Keep checks fast: Beepr fetches the endpoint every minute, so it should respond quickly. Avoid expensive operations on each request; if a check is slow, run it periodically, cache the result, and report its age through `finishedAt`.
Use meaningful names: The `name` field should be unique and descriptive for easy identification.
Include actionable messages: When a check fails, the `notificationMessage` should help diagnose the issue.
Set appropriate thresholds: Use the `warning` status to flag problems early, before they become critical.
Secure the endpoint: Consider adding authentication or IP restrictions if the health check reveals sensitive information.
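
One way to keep a slow or hanging dependency from stalling the whole endpoint is to put a time limit on each check. The sketch below is an assumption-laden illustration: `runWithTimeout` is a hypothetical helper, the 2000 ms default is arbitrary, and reporting a timeout as `crashed` rather than `failed` is a design choice, not a Beepr requirement.

```javascript
// Runs a check function and returns its result, or a `crashed` result if
// the check throws or does not finish within timeoutMs.
async function runWithTimeout(name, checkFn, timeoutMs = 2000) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve({
      name,
      status: 'crashed',
      notificationMessage: `${name} check did not finish within ${timeoutMs} ms`,
    }), timeoutMs);
  });

  try {
    // Whichever settles first wins: the check itself or the timeout result.
    return await Promise.race([checkFn(), timeout]);
  } catch (error) {
    return {
      name,
      status: 'crashed',
      notificationMessage: `${name} check threw: ${error.message}`,
    };
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

In a handler like the Express example, each entry passed to `Promise.all` could be wrapped this way, for example `runWithTimeout('Database', checkDatabase)`. Note that the underlying operation is not cancelled when the limit is hit; only its result is discarded.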
