
Architecture overview

How Checkmate's monitoring engine, services, and integrations work together.

System architecture

Checkmate consists of four main components that work together:

┌─────────────────────────────────────────────────────────────────┐
│                        React frontend                           │
│         (Vite, MUI, Redux Toolkit, React Router)                │
└──────────────────────────┬──────────────────────────────────────┘
                           │ REST API
┌──────────────────────────▼──────────────────────────────────────┐
│                      Express backend                            │
│                                                                 │
│  ┌──────────┐  ┌──────────────┐  ┌────────────────────────┐     │
│  │ Auth &   │  │  Monitoring  │  │    Notifications       │     │
│  │ Users    │  │  Engine      │  │    (8 channels)        │     │
│  └──────────┘  └──────┬───────┘  └────────────────────────┘     │
│                       │                                         │
│  ┌──────────┐  ┌──────▼───────┐  ┌────────────────────────┐     │
│  │ Status   │  │  Job Queue   │  │    Incident &          │     │
│  │ Pages    │  │  (Scheduler) │  │    Maintenance         │     │
│  └──────────┘  └──────────────┘  └────────────────────────┘     │
└──────────┬──────────────────────────────────────────────────────┘
           │
    ┌──────▼──────┐  ┌───────────┐
    │  MongoDB    │  │   Redis   │
    │  (primary)  │  │  (queue)  │
    └─────────────┘  └───────────┘

External connections:

Monitored endpoints  ◄──── HTTP, Ping, Port, gRPC, WebSocket checks
Capture agents       ◄──── Hardware metrics (CPU, RAM, disk, network)
Docker daemon        ◄──── Container health via docker.sock
Google PageSpeed API ◄──── Performance scores
GameDig servers      ◄──── Game server status

Monitoring engine

The monitoring engine is the core of Checkmate. It schedules checks, executes them via specialized providers, and processes the results.

Check execution flow

┌──────────────┐     ┌─────────────────┐     ┌──────────────────┐
│  Job Queue   │────▶│  NetworkService │────▶│  Status Provider │
│  (scheduler) │     │  (router)       │     │  (Http, Ping...) │
└──────────────┘     └─────────────────┘     └────────┬─────────┘
                                                      │
                                                      ▼
┌──────────────┐     ┌─────────────────┐     ┌──────────────────┐
│ Notification │◄────│  StatusService  │◄────│  Check result    │
│   Service    │     │  (processor)    │     │  (up/down/error) │
└──────────────┘     └─────────────────┘     └──────────────────┘
  1. SuperSimpleQueue triggers jobs based on each monitor's interval
  2. NetworkService routes the check to the correct provider based on monitor type
  3. The provider executes the actual check (HTTP request, ICMP ping, etc.) and returns a status response
  4. StatusService processes the result:
    • Stores the check in MongoDB
    • Updates the monitor's rolling status window
    • Calculates uptime percentage
    • Detects status changes (up → down or down → up)
  5. On status change, NotificationsService sends alerts through configured channels
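
The routing step can be sketched in TypeScript. This is a minimal illustration of the router-plus-provider pattern described above, not the actual Checkmate source; the interface shapes and method names are assumptions.

```typescript
// Sketch of steps 1-3: a NetworkService routes each check to a
// specialized provider by monitor type. Illustrative names only.

type MonitorType = "http" | "ping" | "port";

interface Monitor {
  id: string;
  type: MonitorType;
  url: string;
}

interface CheckResult {
  monitorId: string;
  status: boolean; // true = up
  responseTime: number; // ms
}

interface StatusProvider {
  check(monitor: Monitor): Promise<CheckResult>;
}

// One provider per monitor type; an HTTP provider might look like this.
class HttpProvider implements StatusProvider {
  async check(monitor: Monitor): Promise<CheckResult> {
    const start = Date.now();
    try {
      const res = await fetch(monitor.url);
      return { monitorId: monitor.id, status: res.ok, responseTime: Date.now() - start };
    } catch {
      return { monitorId: monitor.id, status: false, responseTime: Date.now() - start };
    }
  }
}

// NetworkService picks the provider based on the monitor type (step 2).
class NetworkService {
  constructor(private providers: Map<MonitorType, StatusProvider>) {}

  async runCheck(monitor: Monitor): Promise<CheckResult> {
    const provider = this.providers.get(monitor.type);
    if (!provider) throw new Error(`No provider for type: ${monitor.type}`);
    return provider.check(monitor);
  }
}
```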

Monitor types and providers

Monitor type   Provider            What it checks
http           HttpProvider        HTTP/HTTPS endpoints with response validation
ping           PingProvider        ICMP ping for network reachability
port           PortProvider        TCP port availability
pagespeed      PageSpeedProvider   Google PageSpeed scores and Web Vitals
hardware       HardwareProvider    CPU, RAM, disk via Capture agent
docker         DockerProvider      Container health via Docker API
game           GameProvider        Game server status via GameDig
grpc           GrpcProvider        gRPC service health checks
websocket      WebSocketProvider   WebSocket connection testing

Status determination

Checkmate uses a sliding window approach to determine monitor status:

statusWindow: [true, true, false, true, true]  ← last 5 checks
                             ↑
                        one failure

uptime = 4/5 = 80%

If uptime < statusWindowThreshold (default 60%) → status = "down"
  • statusWindow — rolling array of boolean results (true = success)
  • statusWindowSize — how many recent checks to consider (default: 5)
  • statusWindowThreshold — percentage below which the monitor is "down" (default: 60%)

This prevents flapping from a single failed check.
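
A minimal TypeScript sketch of this sliding-window logic (the field names statusWindow, statusWindowSize, and statusWindowThreshold follow the text above; the exact implementation in Checkmate may differ):

```typescript
// Sliding-window status determination, as described above.

function pushCheck(statusWindow: boolean[], result: boolean, statusWindowSize = 5): boolean[] {
  // Append the newest result and keep only the most recent N checks.
  return [...statusWindow, result].slice(-statusWindowSize);
}

function uptime(statusWindow: boolean[]): number {
  if (statusWindow.length === 0) return 1; // no data yet
  return statusWindow.filter(Boolean).length / statusWindow.length;
}

function deriveStatus(statusWindow: boolean[], statusWindowThreshold = 0.6): "up" | "down" {
  // Below the threshold the monitor is "down"; this absorbs a single
  // failed check instead of flapping immediately.
  return uptime(statusWindow) < statusWindowThreshold ? "down" : "up";
}

// The example from the text: one failure in five checks → 80% uptime → "up".
deriveStatus([true, true, false, true, true]); // "up" (0.8 ≥ 0.6)
```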

Possible statuses:

  • up — checks passing above threshold
  • down — checks failing below threshold
  • paused — monitoring disabled by user
  • maintenance — during a scheduled maintenance window
  • exceeded — infrastructure metric above threshold (CPU, memory, etc.)
  • initializing — first check hasn't completed yet

Service architecture

The backend uses a three-tier architecture with dependency injection.

┌─────────────────────────────────────────────────┐
│                 Controllers                     │
│        (HTTP handling, input validation)        │
└────────────────────┬────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│                  Services                       │
│                                                 │
│   ┌──────────────┐  ┌───────────────────────┐   │
│   │  Business    │  │  Infrastructure       │   │
│   │  services    │  │  services             │   │
│   │              │  │                       │   │
│   │  • Monitor   │  │  • Network (routing)  │   │
│   │  • Check     │  │  • Status (processing)│   │
│   │  • User      │  │  • Notification       │   │
│   │  • Incident  │  │  • Email              │   │
│   │  • StatusPage│  │  • Buffer             │   │
│   └──────────────┘  └───────────────────────┘   │
└────────────────────┬────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│               Repositories                      │
│     (data access — interface + MongoDB impl)    │
└─────────────────────────────────────────────────┘

How dependency injection works:

  1. config/services.ts — instantiates all repositories, providers, and services
  2. config/controllers.ts — creates controllers with injected services
  3. config/routes.ts — registers routes with controllers

This makes services testable and allows swapping implementations (e.g., replacing MongoDB with another database).
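
The three-step wiring above can be sketched as hand-rolled constructor injection, collapsed into one file. Class and method names here are illustrative, not the actual source.

```typescript
// Repository: data access behind an interface, so the MongoDB
// implementation can be swapped (e.g. for tests or another database).
interface MonitorRepository {
  findAllNames(): Promise<string[]>;
}

class MongoMonitorRepository implements MonitorRepository {
  async findAllNames(): Promise<string[]> {
    return ["monitor-1"]; // a real impl would query MongoDB here
  }
}

// Service: business logic, depends only on the interface.
class MonitorService {
  constructor(private repo: MonitorRepository) {}
  listMonitors() {
    return this.repo.findAllNames();
  }
}

// Controller: HTTP layer; in Express this method would back a route handler.
class MonitorController {
  constructor(private service: MonitorService) {}
  handleList() {
    return this.service.listMonitors();
  }
}

// config/services.ts equivalent: instantiate repositories, then services.
const monitorRepo: MonitorRepository = new MongoMonitorRepository();
const monitorService = new MonitorService(monitorRepo);

// config/controllers.ts equivalent: controllers receive injected services.
const monitorController = new MonitorController(monitorService);
```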


Notification system

When a monitor's status changes, the notification pipeline activates:

Status change detected
           │
           ▼
┌────────────────────────────┐
│ NotificationMessageBuilder │  ← Builds message with monitor details
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│    NotificationsService    │  ← Routes to enabled channels
└─────────────┬──────────────┘
              │
   ┌─────┬────┴─────┬────────┬─────────┬────────┬──────┐
   ▼     ▼          ▼        ▼         ▼        ▼      ▼
 Email Slack     Discord   Teams   PagerDuty Matrix Webhook

Each notification channel implements INotificationProvider with a standard sendNotification() method.
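
A sketch of that channel abstraction (the INotificationProvider name and sendNotification() method come from the text above; the payload shape and fan-out logic are assumptions):

```typescript
// Every channel implements the same interface; the service fans out
// a status-change payload to whichever channels are enabled.

interface NotificationPayload {
  monitorName: string;
  status: "up" | "down";
  message: string;
}

interface INotificationProvider {
  sendNotification(payload: NotificationPayload): Promise<boolean>;
}

class WebhookProvider implements INotificationProvider {
  constructor(private url: string) {}
  async sendNotification(payload: NotificationPayload): Promise<boolean> {
    // A real implementation would POST JSON to this.url.
    console.log(`POST ${this.url}:`, JSON.stringify(payload));
    return true;
  }
}

class NotificationsService {
  constructor(private channels: INotificationProvider[]) {}

  async notifyAll(payload: NotificationPayload): Promise<boolean[]> {
    // Send in parallel; one failing channel must not block the others.
    return Promise.all(
      this.channels.map((c) => c.sendNotification(payload).catch(() => false))
    );
  }
}
```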

Email notifications use MJML templates compiled with Handlebars for dynamic content, supporting both SMTP (Nodemailer) and MailerSend as transport providers.


Infrastructure monitoring

Infrastructure monitoring requires the Capture agent running on monitored servers.

┌──────────────────────┐          ┌──────────────────────┐
│   Monitored server   │          │   Checkmate server   │
│                      │   HTTP   │                      │
│  ┌────────────────┐  │◄─────────│  ┌─────────────────┐ │
│  │ Capture agent  │  │─────────▶│  │ HardwareProvider│ │
│  │ (Go binary)    │  │   JSON   │  └─────────────────┘ │
│  └────────────────┘  │          │                      │
│                      │          │  Stores metrics in   │
│  Reads: CPU, RAM,    │          │  Check documents     │
│  disk, network,      │          │                      │
│  Docker, S.M.A.R.T.  │          │  Alerts if threshold │
│                      │          │  exceeded            │
└──────────────────────┘          └──────────────────────┘

Capture exposes a REST API on port 59232. Checkmate's HardwareProvider polls it at the configured interval, stores the metrics, and triggers alerts when thresholds are exceeded.
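
The poll-and-alert cycle might look roughly like this. Only the port (59232) and the overall flow come from the text; the metrics path, field names, and threshold shape are invented for illustration.

```typescript
// Sketch of one hardware-monitoring cycle: fetch metrics from the
// Capture agent's REST API, then compare them against thresholds.

interface HardwareMetrics {
  cpuUsage: number;  // 0..1
  memUsage: number;  // 0..1
  diskUsage: number; // 0..1
}

interface Thresholds {
  cpu: number;
  mem: number;
  disk: number;
}

async function fetchMetrics(host: string): Promise<HardwareMetrics> {
  // /api/v1/metrics is an assumed path; only the port comes from the docs.
  const res = await fetch(`http://${host}:59232/api/v1/metrics`);
  if (!res.ok) throw new Error(`Capture agent returned ${res.status}`);
  return res.json() as Promise<HardwareMetrics>;
}

// Returns the names of metrics that exceeded their thresholds, so the
// caller can mark the monitor "exceeded" and trigger alerts.
function exceededMetrics(m: HardwareMetrics, t: Thresholds): string[] {
  const breaches: string[] = [];
  if (m.cpuUsage > t.cpu) breaches.push("cpu");
  if (m.memUsage > t.mem) breaches.push("mem");
  if (m.diskUsage > t.disk) breaches.push("disk");
  return breaches;
}
```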


Data flow

Check data lifecycle

Check executed → Stored in MongoDB (time-series) → Aggregated in MonitorStats
                                │                              │
                                ▼                              ▼
                        TTL index cleanup               Displayed in UI
                    (configurable retention)           (charts, tables)
  • Checks are stored as time-series documents optimized for range queries
  • MonitorStats hold pre-aggregated data (uptime %, avg response time) to avoid expensive aggregations
  • BufferService batches writes for performance
  • TTL indexes automatically remove old check data based on the configured retention period
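
The pre-aggregation idea can be sketched as an incremental update applied on every check, so reads never need to scan the raw check collection (field names are illustrative, not the actual schema):

```typescript
// Incrementally maintained per-monitor stats, updated once per check.

interface MonitorStats {
  totalChecks: number;
  upChecks: number;
  avgResponseTime: number; // running mean, ms
}

function applyCheck(stats: MonitorStats, up: boolean, responseTime: number): MonitorStats {
  const totalChecks = stats.totalChecks + 1;
  return {
    totalChecks,
    upChecks: stats.upChecks + (up ? 1 : 0),
    // Incremental mean: newMean = oldMean + (x - oldMean) / n
    avgResponseTime: stats.avgResponseTime + (responseTime - stats.avgResponseTime) / totalChecks,
  };
}

function uptimePercent(stats: MonitorStats): number {
  return stats.totalChecks === 0 ? 100 : (stats.upChecks / stats.totalChecks) * 100;
}
```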

Authentication flow

Login request → bcrypt password verify → JWT token issued
                                                │
                                                ▼
                                        Token sent with
                                        each API request
                                                │
                                                ▼
                                        verifyJWT middleware
                                        → isAllowed (RBAC)
                                        → Controller

Three roles control access:

  • superadmin — full system access, user management
  • admin — monitor CRUD, notification config, team management
  • user — read-only access to monitors and dashboards
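
Assuming the roles are strictly hierarchical (superadmin over admin over user), as the descriptions imply, the check behind the isAllowed middleware reduces to a rank comparison. This is a sketch, not the actual middleware code.

```typescript
// Minimal RBAC check: a route guarded with requiredRole "admin"
// admits admins and superadmins, but not plain users.

type Role = "superadmin" | "admin" | "user";

// Higher rank = more privilege (assumed hierarchy).
const roleRank: Record<Role, number> = { user: 1, admin: 2, superadmin: 3 };

function isAllowed(userRole: Role, requiredRole: Role): boolean {
  return roleRank[userRole] >= roleRank[requiredRole];
}
```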

Technology stack summary

Backend

Technology           Purpose
Node.js 20+          Runtime
Express              Web framework
TypeScript           Type safety
MongoDB (Mongoose)   Primary database
Redis (ioredis)      Job queue support
Zod                  Input validation
JWT (jsonwebtoken)   Authentication
Winston              Logging
MJML + Handlebars    Email templates

Frontend

Technology        Purpose
React 18          UI library
Vite              Build tool and dev server
TypeScript        Type safety
Redux Toolkit     State management
Material-UI 7     Component library
SWR               Data fetching and caching
React Router v6   Client-side routing
react-hook-form   Form handling
i18next           Internationalization (18 languages)
Recharts          Charts and visualizations
MapLibre GL       Geographic map visualization

Infrastructure

Technology     Purpose
Docker         Containerization
Helm           Kubernetes deployment
Nginx          Reverse proxy (production)
Capture (Go)   Hardware monitoring agent
