
Health Check Monitor

A health check monitor fetches a JSON endpoint that reports on multiple sub-checks within your application. This is ideal for monitoring internal services like database connections, cache availability, queue health, and disk space from a single endpoint.

How It Works

Beepr fetches your health check URL every minute and parses the JSON response. Based on the sub-check statuses, Beepr determines the overall monitor status:

| Condition | Status | Creates Incident |
| --- | --- | --- |
| All checks ok | Up (green) | No |
| Any check warning | Degraded (yellow) | No |
| Any check failed or crashed | Down (red) | Yes |
| finishedAt timestamp too old | Stale (yellow) | No |

When an incident is created, the failing sub-checks are listed in the incident timeline with their notification messages.
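The status rules above can be sketched as a small function. This is an illustration only, not Beepr's actual implementation: the function name `overallStatus` and its return shape are hypothetical, and it assumes that a failed or crashed check takes precedence over a warning when both are present. Staleness is not covered here.

```javascript
// Hypothetical helper: derives an overall monitor status from the
// sub-check statuses. Assumes "down" outranks "degraded".
function overallStatus(checkResults) {
  const statuses = checkResults.map((check) => check.status);

  // Any failed or crashed check takes the monitor down and opens an incident.
  if (statuses.some((s) => s === 'failed' || s === 'crashed')) {
    return { status: 'down', createsIncident: true };
  }

  // A warning degrades the monitor without opening an incident.
  if (statuses.includes('warning')) {
    return { status: 'degraded', createsIncident: false };
  }

  // Everything else (ok, or skipped checks that are ignored) counts as up.
  return { status: 'up', createsIncident: false };
}
```

For example, a response with one `ok` check and one `warning` check would be reported as degraded, and adding a single `failed` check would switch it to down.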

Response Format

Your endpoint must return JSON with this structure:

{
  "finishedAt": "1638879833",
  "checkResults": [
    {
      "name": "Database",
      "label": "Database Connection",
      "status": "ok",
      "notificationMessage": "",
      "shortSummary": "connected"
    },
    {
      "name": "Redis",
      "label": "Redis Cache",
      "status": "ok",
      "notificationMessage": "",
      "shortSummary": "available"
    },
    {
      "name": "DiskSpace",
      "label": "Disk Space",
      "status": "warning",
      "notificationMessage": "Disk usage at 85%",
      "shortSummary": "85%"
    }
  ]
}

Top-Level Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| finishedAt | string or integer | No | Unix timestamp (seconds) when the checks were last run. Used for staleness detection. |
| checkResults | array | Yes | Array of check result objects |

Check Result Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Unique identifier for the check |
| status | string | Yes | One of: ok, warning, failed, crashed, skipped |
| label | string | No | Human-readable name displayed in the UI |
| notificationMessage | string | No | Detailed message shown in incidents when the check fails |
| shortSummary | string | No | Brief status text for the dashboard (e.g., "45%", "connected") |
| meta | object | No | Additional metadata (not displayed, but stored) |

Status Values

| Status | Meaning | Effect |
| --- | --- | --- |
| ok | Check passed | Contributes to "up" status |
| warning | Check has a non-critical issue | Sets monitor to "degraded" (no incident) |
| failed | Check failed | Sets monitor to "down" and creates incident |
| crashed | Check crashed unexpectedly | Sets monitor to "down" and creates incident |
| skipped | Check was skipped | Ignored in status calculation |
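Before pointing a monitor at a new endpoint, it can help to check the response against the field and status rules above. The following is a minimal sketch of such a check. `validateHealthPayload` is a hypothetical helper, not part of Beepr, and how Beepr itself handles malformed responses is not described here.

```javascript
// Status values listed in the table above.
const ALLOWED_STATUSES = ['ok', 'warning', 'failed', 'crashed', 'skipped'];

// Hypothetical validator: returns a list of problems, empty when the
// payload matches the documented format.
function validateHealthPayload(payload) {
  if (payload === null || typeof payload !== 'object') {
    return ['payload must be a JSON object'];
  }
  const errors = [];

  // finishedAt is optional, but when present it must be a Unix timestamp
  // in seconds, given either as an integer or as a string of digits.
  const ts = payload.finishedAt;
  const tsValid = Number.isInteger(ts) || (typeof ts === 'string' && /^\d+$/.test(ts));
  if (ts !== undefined && !tsValid) {
    errors.push('finishedAt must be a Unix timestamp in seconds (string or integer)');
  }

  if (!Array.isArray(payload.checkResults)) {
    errors.push('checkResults must be an array');
    return errors;
  }

  const seenNames = new Set();
  payload.checkResults.forEach((check, index) => {
    if (check === null || typeof check !== 'object') {
      errors.push(`checkResults[${index}] must be an object`);
      return;
    }
    // name is required and must be unique across checks.
    if (typeof check.name !== 'string' || check.name === '') {
      errors.push(`checkResults[${index}].name must be a non-empty string`);
    } else if (seenNames.has(check.name)) {
      errors.push(`checkResults[${index}].name "${check.name}" is not unique`);
    } else {
      seenNames.add(check.name);
    }
    // status is required and must be one of the documented values.
    if (!ALLOWED_STATUSES.includes(check.status)) {
      errors.push(`checkResults[${index}].status must be one of: ${ALLOWED_STATUSES.join(', ')}`);
    }
  });
  return errors;
}
```

A function like this fits naturally into a unit test for the health endpoint, so that a renamed status or a duplicated check name is caught before deployment.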

Staleness Detection

If your health check endpoint caches results or runs checks periodically (not on every request), the finishedAt timestamp helps detect stale data.

Configure the staleness threshold (1-60 minutes) when creating the monitor. If the finishedAt timestamp is older than the threshold, the monitor status changes to "stale".

This is useful for:

  • Endpoints that cache health check results
  • Background jobs that update health status periodically
  • Detecting when your health check system itself has stopped running
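The threshold comparison can be sketched as follows. This is an illustration under assumptions rather than Beepr's implementation: `isStale` is a hypothetical name, "older than the threshold" is read as strictly greater, and a missing or unparseable `finishedAt` is treated as not stale because the field is optional.

```javascript
// Hypothetical helper: true when finishedAt is older than the threshold.
// nowSeconds can be injected to make the function easy to test.
function isStale(finishedAt, thresholdMinutes, nowSeconds = Math.floor(Date.now() / 1000)) {
  // finishedAt may be a string or an integer; normalize to a number.
  const finishedSeconds = Number.parseInt(String(finishedAt), 10);

  // Assumption: without a usable timestamp, staleness cannot be judged.
  if (Number.isNaN(finishedSeconds)) {
    return false;
  }

  const ageSeconds = nowSeconds - finishedSeconds;
  return ageSeconds > thresholdMinutes * 60;
}
```

With a 5-minute threshold, a result that finished 301 seconds ago would count as stale, while one that finished exactly 300 seconds ago would not.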

Example Implementations

Elixir/Phoenix

defmodule MyAppWeb.HealthController do
  use MyAppWeb, :controller

  def index(conn, _params) do
    checks = [
      check_database(),
      check_redis(),
      check_disk_space()
    ]

    json(conn, %{
      finishedAt: DateTime.utc_now() |> DateTime.to_unix() |> to_string(),
      checkResults: checks
    })
  end

  defp check_database do
    case MyApp.Repo.query("SELECT 1") do
      {:ok, _} ->
        %{name: "Database", label: "Database Connection", status: "ok", shortSummary: "connected"}

      {:error, reason} ->
        %{name: "Database", label: "Database Connection", status: "failed",
          notificationMessage: "Database connection failed: #{inspect(reason)}"}
    end
  end

  defp check_redis do
    case Redix.command(:redix, ["PING"]) do
      {:ok, "PONG"} ->
        %{name: "Redis", label: "Redis Cache", status: "ok", shortSummary: "available"}

      _ ->
        %{name: "Redis", label: "Redis Cache", status: "failed",
          notificationMessage: "Redis not responding"}
    end
  end

  defp check_disk_space do
    # Raises a MatchError if df exits with a non-zero status.
    {output, 0} = System.cmd("df", ["-h", "/"])
    # parse_disk_usage/1 is not shown here; it is expected to return the
    # used percentage as an integer.
    used_percent = parse_disk_usage(output)

    cond do
      used_percent >= 90 ->
        %{name: "DiskSpace", label: "Disk Space", status: "failed",
          notificationMessage: "Disk usage critical at #{used_percent}%",
          shortSummary: "#{used_percent}%"}

      used_percent >= 80 ->
        %{name: "DiskSpace", label: "Disk Space", status: "warning",
          notificationMessage: "Disk usage high at #{used_percent}%",
          shortSummary: "#{used_percent}%"}

      true ->
        %{name: "DiskSpace", label: "Disk Space", status: "ok",
          shortSummary: "#{used_percent}%"}
    end
  end
end

Node.js/Express

// `app` is an Express application and `db` a database client, both set up
// elsewhere. checkRedis() and checkDiskSpace() are not shown here.
app.get('/health', async (req, res) => {
  const checks = await Promise.all([
    checkDatabase(),
    checkRedis(),
    checkDiskSpace()
  ]);

  res.json({
    finishedAt: Math.floor(Date.now() / 1000).toString(),
    checkResults: checks
  });
});

async function checkDatabase() {
  try {
    await db.query('SELECT 1');
    return { name: 'Database', label: 'Database Connection', status: 'ok', shortSummary: 'connected' };
  } catch (error) {
    return { name: 'Database', label: 'Database Connection', status: 'failed',
      notificationMessage: `Database error: ${error.message}` };
  }
}
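The Node.js example calls `checkDiskSpace()` without showing it. One possible shape for the classification step, mirroring the 80% and 90% thresholds from the Elixir example, is sketched below. `diskUsageResult` is a hypothetical helper, and how the used percentage is obtained (for example by parsing `df` output) is deliberately left out.

```javascript
// Hypothetical helper: maps a used-disk percentage to a check result.
// Thresholds (80% warning, 90% failed) are taken from the Elixir example.
function diskUsageResult(usedPercent) {
  const base = {
    name: 'DiskSpace',
    label: 'Disk Space',
    shortSummary: `${usedPercent}%`
  };

  if (usedPercent >= 90) {
    return { ...base, status: 'failed',
      notificationMessage: `Disk usage critical at ${usedPercent}%` };
  }
  if (usedPercent >= 80) {
    return { ...base, status: 'warning',
      notificationMessage: `Disk usage high at ${usedPercent}%` };
  }
  return { ...base, status: 'ok' };
}
```

Keeping the classification separate from the measurement makes the thresholds easy to unit test without touching the file system.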

PHP/Laravel

// checkRedis() and checkDiskSpace() are not shown here.
Route::get('/health', function () {
    $checks = [
        checkDatabase(),
        checkRedis(),
        checkDiskSpace(),
    ];

    return response()->json([
        'finishedAt' => (string) time(),
        'checkResults' => $checks,
    ]);
});

function checkDatabase(): array {
    try {
        DB::select('SELECT 1');
        return ['name' => 'Database', 'label' => 'Database Connection',
            'status' => 'ok', 'shortSummary' => 'connected'];
    } catch (Exception $e) {
        return ['name' => 'Database', 'label' => 'Database Connection',
            'status' => 'failed', 'notificationMessage' => $e->getMessage()];
    }
}

Libraries

Several open-source libraries can generate compatible health check responses:

For other languages, implement an endpoint that returns the JSON format described above.

Best Practices

  1. Keep checks fast: Health check endpoints should respond quickly. Avoid expensive operations.
  2. Use meaningful names: The name field should be unique and descriptive for easy identification.
  3. Include actionable messages: When a check fails, the notificationMessage should help diagnose the issue.
  4. Set appropriate thresholds: Use warning status for early warnings before things become critical.
  5. Secure the endpoint: Consider adding authentication or IP restrictions if the health check reveals sensitive information.
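One way to keep a single slow dependency from stalling the whole endpoint is to cap each check with a timeout. The sketch below is an assumption-laden illustration: `withTimeout` is a hypothetical helper, the 2000 ms default is arbitrary, and reporting a timed-out or throwing check as `crashed` (rather than `failed`) is a design choice, not something Beepr requires.

```javascript
// Hypothetical wrapper: runs checkFn, but resolves with a "crashed" result
// if it throws or does not finish within timeoutMs.
function withTimeout(name, checkFn, timeoutMs = 2000) {
  let timer;

  // Resolves (never rejects) with a crashed result once the time is up.
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve({
      name,
      status: 'crashed',
      notificationMessage: `${name} check timed out after ${timeoutMs} ms`
    }), timeoutMs);
  });

  // Runs the check; synchronous throws and rejections become crashed results.
  const run = Promise.resolve()
    .then(() => checkFn())
    .catch((error) => ({
      name,
      status: 'crashed',
      notificationMessage: `${name} check threw: ${error.message}`
    }));

  // Whichever settles first wins; the timer is always cleaned up.
  return Promise.race([run, timeout]).finally(() => clearTimeout(timer));
}
```

In the Express example this could be used as `withTimeout('Database', checkDatabase)` inside the `Promise.all` call, so the endpoint responds within roughly the timeout even when a dependency hangs. Note that the underlying check is not cancelled; it keeps running in the background until it settles on its own.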