    How to Build a Monitoring Application Using Golang

    By Samuel Alejandro | January 16, 2026

    Monitoring is a critical part of running reliable software, yet many teams only discover outages after user complaints start rolling in. Imagine getting a Slack message at 2 AM telling you that your APIs have been down for over an hour and nobody noticed until customers complained. A monitoring service addresses this by enabling proactive incident response and preventing problems from escalating.

    This tutorial details building a status monitoring application from scratch. Upon completion, the system will:

    1. Probe services on a schedule (HTTP, TCP, DNS, and more)
    2. Detect outages and send alerts to various communication channels (Teams, Slack, etc.)
    3. Track incidents with automatic open/close functionality
    4. Expose metrics for Prometheus and Grafana dashboards
    5. Run within Docker containers

    Go is a good fit for this application: it is fast, compiles to a single binary for easy cross-platform deployment, and handles concurrency well, which is essential for monitoring many endpoints at once.

    What We’re Building

    This article details building a Go application ‘StatusD’. It reads a configuration file containing a list of services to monitor, probes them, creates incidents, and dispatches notifications when issues arise.

    Tech Stack Used:

    • Golang
    • PostgreSQL
    • Grafana (with Prometheus for metrics)
    • Docker
    • Nginx

    The high-level architecture is shown below:

    ┌─────────────────────────────────────────────────────────────────┐
    │                        Docker Compose                           │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │
    │  │ Postgres │  │Prometheus│  │  Grafana │  │      Nginx       │ │
    │  │    DB    │  │ (metrics)│  │(dashboard)│  │ (reverse proxy) │ │
    │  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────────┬─────────┘ │
    │       │             │             │                  │          │
    │       └─────────────┴─────────────┴──────────────────┘          │
    │                              │                                  │
    │                    ┌─────────┴─────────┐                        │
    │                    │      StatusD      │                        │
    │                    │   (our Go app)    │                        │
    │                    └─────────┬─────────┘                        │
    │                              │                                  │
    └──────────────────────────────┼──────────────────────────────────┘
                                   │
                  ┌────────────────┼────────────────┐
                  ▼                ▼                ▼
             ┌────────┐       ┌────────┐       ┌────────┐
             │Service │       │Service │       │Service │
             │   A    │       │   B    │       │   C    │
             └────────┘       └────────┘       └────────┘
    

    Project Structure

    Understanding the project structure is essential before beginning to code. The structure is as follows:

    status-monitor/
    ├── cmd/statusd/
    │   └── main.go              # Application entry point
    ├── internal/
    │   ├── models/
    │   │   └── models.go        # Data structures (Asset, Incident, etc.)
    │   ├── probe/
    │   │   ├── probe.go         # Probe registry
    │   │   └── http.go          # HTTP probe implementation
    │   ├── scheduler/
    │   │   └── scheduler.go     # Worker pool and scheduling
    │   ├── alert/
    │   │   └── engine.go        # State machine and notifications
    │   ├── notifier/
    │   │   └── teams.go         # Teams/Slack integration
    │   ├── store/
    │   │   └── postgres.go      # Database layer
    │   ├── api/
    │   │   └── handlers.go      # REST API
    │   └── config/
    │       └── manifest.go      # Config loading
    ├── config/
    │   ├── manifest.json        # Services to monitor
    │   └── notifiers.json       # Notification channels
    ├── migrations/
    │   └── 001_init_schema.up.sql
    ├── docker-compose.yml
    ├── Dockerfile
    └── entrypoint.sh
    

    The Core Data Models

    This section defines the core data models, or ‘types,’ that represent a monitored service.

    Four primary ‘types’ are defined:

    1. Asset: This represents a service to be monitored.

    2. ProbeResult: This captures the outcome of an Asset check, including response, latency, etc.

    3. Incident: This tracks an outage; it is opened when a ProbeResult reports an unexpected response and closed when the service recovers.

    4. Notification: This refers to an alert or message sent to designated communication channels, such as Teams, Slack, or email.

    The types are defined in code as:

    // internal/models/models.go
    package models
    
    import "time"
    
    // Asset represents a monitored service
    type Asset struct {
        ID                  string            `json:"id"`
        AssetType           string            `json:"assetType"` // http, tcp, dns, etc.
        Name                string            `json:"name"`
        Address             string            `json:"address"`
        IntervalSeconds     int               `json:"intervalSeconds"`
        TimeoutSeconds      int               `json:"timeoutSeconds"`
        ExpectedStatusCodes []int             `json:"expectedStatusCodes,omitempty"`
        Metadata            map[string]string `json:"metadata,omitempty"`
    }
    
    // ProbeResult contains the outcome of a single health check
    type ProbeResult struct {
        AssetID   string
        Timestamp time.Time
        Success   bool
        LatencyMs int64
        Code      int    // HTTP status code
        Message   string // Error message if failed
    }
    
    // Incident tracks a service outage
    type Incident struct {
        ID        string
        AssetID   string
        StartedAt time.Time
        EndedAt   *time.Time // nil if still open
        Severity  string
        Summary   string
    }
    
    // Notification is what we send to Slack/Teams
    type Notification struct {
        AssetID   string
        AssetName string
        Event     string    // "DOWN", "RECOVERY", "UP"
        Timestamp time.Time
        Details   string
    }
    

    The ExpectedStatusCodes field in the Asset type is important: it lets each service define what ‘healthy’ means, since not every endpoint returns a 200; some return 204 or redirects.

    Database Schema

    PostgreSQL is used to store probe results and incidents. The database schema is presented below:

    -- migrations/001_init_schema.up.sql
    
    CREATE TABLE IF NOT EXISTS assets (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        address TEXT NOT NULL,
        asset_type TEXT NOT NULL DEFAULT 'http',
        interval_seconds INTEGER DEFAULT 300,
        timeout_seconds INTEGER DEFAULT 5,
        expected_status_codes TEXT,
        metadata JSONB,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE TABLE IF NOT EXISTS probe_events (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        asset_id TEXT NOT NULL REFERENCES assets(id),
        timestamp TIMESTAMP WITH TIME ZONE NOT NULL,
        success BOOLEAN NOT NULL,
        latency_ms BIGINT NOT NULL,
        code INTEGER,
        message TEXT
    );
    
    CREATE TABLE IF NOT EXISTS incidents (
        id SERIAL PRIMARY KEY,
        asset_id TEXT NOT NULL REFERENCES assets(id),
        severity TEXT DEFAULT 'INITIAL',
        summary TEXT,
        started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        ended_at TIMESTAMP
    );
    
    -- Indexes for common queries
    CREATE INDEX IF NOT EXISTS idx_probe_events_asset_id_timestamp
        ON probe_events(asset_id, timestamp DESC);
    CREATE INDEX IF NOT EXISTS idx_incidents_asset_id
        ON incidents(asset_id);
    CREATE INDEX IF NOT EXISTS idx_incidents_ended_at
        ON incidents(ended_at);
    

    A key aspect of the probe_events table is the composite index on asset_id and timestamp DESC, which makes fetching the most recent probe results for a given service efficient.
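
    For example, a query like the following (a representative query, not code from the article) is served almost entirely by that index:

    -- Fetch the 100 most recent probe results for one asset; the
    -- (asset_id, timestamp DESC) index matches this access pattern directly.
    SELECT timestamp, success, latency_ms, code, message
    FROM probe_events
    WHERE asset_id = 'api-prod'
    ORDER BY timestamp DESC
    LIMIT 100;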

    Building the Probe System

    To support probes for multiple protocols such as HTTP, TCP, and DNS without a growing switch statement, a registry pattern is employed.

    First, the structure of a probe is defined:

    // internal/probe/probe.go
    package probe
    
    import (
        "context"
        "fmt"
        "github.com/yourname/status/internal/models"
    )
    
    // Probe defines the interface for checking service health
    type Probe interface {
        Probe(ctx context.Context, asset models.Asset) (models.ProbeResult, error)
    }
    
    // registry holds all probe types
    var registry = make(map[string]func() Probe)
    
    // Register adds a probe type to the registry
    func Register(assetType string, factory func() Probe) {
        registry[assetType] = factory
    }
    
    // GetProbe returns a probe for the given asset type
    func GetProbe(assetType string) (Probe, error) {
        factory, ok := registry[assetType]
        if !ok {
            return nil, fmt.Errorf("unknown asset type: %s", assetType)
        }
        return factory(), nil
    }
    

    The HTTP probe implementation is as follows:

    // internal/probe/http.go
    package probe
    
    import (
        "context"
        "io"
        "net/http"
        "time"
        "github.com/yourname/status/internal/models"
    )
    
    func init() {
        Register("http", func() Probe { return &httpProbe{} })
    }
    
    type httpProbe struct{}
    
    func (p *httpProbe) Probe(ctx context.Context, asset models.Asset) (models.ProbeResult, error) {
        result := models.ProbeResult{
            AssetID:   asset.ID,
            Timestamp: time.Now(),
        }
    
        client := &http.Client{
            Timeout: time.Duration(asset.TimeoutSeconds) * time.Second,
        }
    
        req, err := http.NewRequestWithContext(ctx, http.MethodGet, asset.Address, nil)
        if err != nil {
            result.Success = false
            result.Message = err.Error()
            return result, err
        }
    
        start := time.Now()
        resp, err := client.Do(req)
        result.LatencyMs = time.Since(start).Milliseconds()
    
        if err != nil {
            result.Success = false
            result.Message = err.Error()
            return result, err
        }
        defer resp.Body.Close()
    
        // Drain up to 1 MB of the body so the underlying connection can be reused
        _, _ = io.ReadAll(io.LimitReader(resp.Body, 1024*1024))
    
        result.Code = resp.StatusCode
    
        // Check if status code is expected
        if len(asset.ExpectedStatusCodes) > 0 {
            for _, code := range asset.ExpectedStatusCodes {
                if code == resp.StatusCode {
                    result.Success = true
                    return result, nil
                }
            }
            result.Success = false
            result.Message = "unexpected status code"
        } else {
            result.Success = resp.StatusCode < 400
        }
    
        return result, nil
    }
    

    The init() function registers the HTTP probe automatically when the package is loaded, so no central registration code needs to change. To add a TCP probe, create a tcp.go file, implement the Probe interface, and register it in its own init() function, as sketched below.
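
    Here is what that could look like (a sketch under the same Probe interface, not code from the original article; it only checks that a TCP connection can be opened within the timeout):

    // internal/probe/tcp.go (sketch)
    package probe

    import (
        "context"
        "net"
        "time"

        "github.com/yourname/status/internal/models"
    )

    func init() {
        Register("tcp", func() Probe { return &tcpProbe{} })
    }

    type tcpProbe struct{}

    func (p *tcpProbe) Probe(ctx context.Context, asset models.Asset) (models.ProbeResult, error) {
        result := models.ProbeResult{
            AssetID:   asset.ID,
            Timestamp: time.Now(),
        }

        dialer := &net.Dialer{Timeout: time.Duration(asset.TimeoutSeconds) * time.Second}

        start := time.Now()
        conn, err := dialer.DialContext(ctx, "tcp", asset.Address) // Address given as "host:port"
        result.LatencyMs = time.Since(start).Milliseconds()

        if err != nil {
            result.Success = false
            result.Message = err.Error()
            return result, err
        }
        conn.Close()

        result.Success = true
        return result, nil
    }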

    Scheduling and Concurrency

    To probe all assets on a schedule, a worker pool is utilized. This approach enables concurrent execution of multiple probes without creating a separate goroutine for each service.

    // internal/scheduler/scheduler.go
    package scheduler
    
    import (
        "context"
        "sync"
        "time"
        "github.com/yourname/status/internal/models"
        "github.com/yourname/status/internal/probe"
    )
    
    type JobHandler func(result models.ProbeResult)
    
    type Scheduler struct {
        workers int
        jobs    chan models.Asset
        tickers map[string]*time.Ticker
        handler JobHandler
        mu      sync.Mutex
        done    chan struct{}
        wg      sync.WaitGroup
    }
    
    func NewScheduler(workerCount int, handler JobHandler) *Scheduler {
        return &Scheduler{
            workers: workerCount,
            jobs:    make(chan models.Asset, 100),
            tickers: make(map[string]*time.Ticker),
            handler: handler,
            done:    make(chan struct{}),
        }
    }
    
    func (s *Scheduler) Start(ctx context.Context) {
        for i := 0; i < s.workers; i++ {
            s.wg.Add(1)
            go s.worker(ctx)
        }
    }
    
    func (s *Scheduler) ScheduleAssets(assets []models.Asset) error {
        s.mu.Lock()
        defer s.mu.Unlock()
    
        for _, asset := range assets {
            interval := time.Duration(asset.IntervalSeconds) * time.Second
            ticker := time.NewTicker(interval)
            s.tickers[asset.ID] = ticker
    
            s.wg.Add(1)
            go s.scheduleAsset(asset, ticker)
        }
        return nil
    }
    
    func (s *Scheduler) scheduleAsset(asset models.Asset, ticker *time.Ticker) {
        defer s.wg.Done()
        defer ticker.Stop()
        for {
            select {
            case <-s.done:
                return
            case <-ticker.C:
                // Guard the send with done so shutdown never panics or blocks
                // on a full jobs channel.
                select {
                case s.jobs <- asset:
                case <-s.done:
                    return
                }
            }
        }
    }
    
    func (s *Scheduler) worker(ctx context.Context) {
        defer s.wg.Done()
        for {
            select {
            case <-s.done:
                return
            case asset := <-s.jobs:
                p, err := probe.GetProbe(asset.AssetType)
                if err != nil {
                    continue
                }
                result, _ := p.Probe(ctx, asset)
                s.handler(result)
            }
        }
    }
    
    func (s *Scheduler) Stop() {
        // Closing done stops both the ticker goroutines and the workers. The jobs
        // channel is intentionally left open so an in-flight send can never panic.
        close(s.done)
        s.wg.Wait()
    }
    

    Each asset is assigned a dedicated ticker goroutine responsible solely for scheduling. When an asset needs checking, its ticker dispatches a probe job to a channel. A fixed number of worker goroutines monitor this channel and perform the actual probing tasks.

    Probes are not executed directly inside the ticker goroutines because a probe can block while waiting on a slow response or a timeout. The worker pool caps concurrency: with 4 workers and 100 assets, at most 4 probes run at the same time, even if many tickers fire at once. The buffered channel queues pending jobs, and a sync.WaitGroup ensures a clean shutdown of all goroutines.
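
    To make the flow concrete, here is a minimal wiring sketch (a trimmed-down stand-in for the real main.go, which also sets up the store, alert engine, and API server). The asset values are illustrative; the handler just logs each result:

    // Wiring sketch only, not the article's cmd/statusd/main.go.
    package main

    import (
        "context"
        "log"
        "time"

        "github.com/yourname/status/internal/models"
        _ "github.com/yourname/status/internal/probe" // ensure probes register via init()
        "github.com/yourname/status/internal/scheduler"
    )

    func main() {
        ctx := context.Background()

        // The handler receives every ProbeResult; the full application forwards
        // it to the alert engine and persists it through the store.
        handler := func(result models.ProbeResult) {
            log.Printf("asset=%s success=%v latency=%dms", result.AssetID, result.Success, result.LatencyMs)
        }

        s := scheduler.NewScheduler(4, handler)
        s.Start(ctx)

        assets := []models.Asset{
            {
                ID:              "api-prod",
                AssetType:       "http",
                Name:            "Production API",
                Address:         "https://api.example.com/health",
                IntervalSeconds: 60,
                TimeoutSeconds:  5,
            },
        }
        if err := s.ScheduleAssets(assets); err != nil {
            log.Fatal(err)
        }

        // Let the probes run for a while, then shut down cleanly.
        time.Sleep(10 * time.Minute)
        s.Stop()
    }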

    Incident Detection: The State Machine

    The alert engine is driven by state changes rather than by individual probe results: when an asset that was up starts failing, an incident is created and a ‘down’ notification is sent; further failures while it is already down do not open new incidents. Upon recovery, the incident is closed and a recovery notification goes out.

    This process functions as a state machine: UP → DOWN → UP.

    The engine is constructed as follows:

    // internal/alert/engine.go
    package alert
    
    import (
        "context"
        "fmt"
        "sync"
        "time"
        "github.com/yourname/status/internal/models"
        "github.com/yourname/status/internal/store"
    )
    
    type NotifierFunc func(ctx context.Context, notification models.Notification) error
    
    type AssetState struct {
        IsUp           bool
        LastProbeTime  time.Time
        OpenIncidentID string
    }
    
    type Engine struct {
        store      store.Store
        notifiers  map[string]NotifierFunc
        mu         sync.RWMutex
        assetState map[string]AssetState
    }
    
    func NewEngine(store store.Store) *Engine {
        return &Engine{
            store:      store,
            notifiers:  make(map[string]NotifierFunc),
            assetState: make(map[string]AssetState),
        }
    }
    
    func (e *Engine) RegisterNotifier(name string, fn NotifierFunc) {
        e.mu.Lock()
        defer e.mu.Unlock()
        e.notifiers[name] = fn
    }
    
    func (e *Engine) Process(ctx context.Context, result models.ProbeResult, asset models.Asset) error {
        e.mu.Lock()
        defer e.mu.Unlock()
    
        state := e.assetState[result.AssetID]
        state.LastProbeTime = result.Timestamp
    
        // State hasn't changed? Nothing to do.
        if state.IsUp == result.Success {
            e.assetState[result.AssetID] = state
            return nil
        }
    
        // Persist the probe event; only state-changing results reach this point
        if err := e.store.SaveProbeEvent(ctx, result); err != nil {
            return err
        }
    
        if result.Success && !state.IsUp {
            // Recovery!
            return e.handleRecovery(ctx, asset, state)
        } else if !result.Success && state.IsUp {
            // Outage!
            return e.handleOutage(ctx, asset, state, result)
        }
    
        return nil
    }
    
    func (e *Engine) handleOutage(ctx context.Context, asset models.Asset, state AssetState, result models.ProbeResult) error {
        incidentID, err := e.store.CreateIncident(ctx, asset.ID, fmt.Sprintf("Service %s is down", asset.Name))
        if err != nil {
            return err
        }
    
        state.IsUp = false
        state.OpenIncidentID = incidentID
        e.assetState[asset.ID] = state
    
        notification := models.Notification{
            AssetID:   asset.ID,
            AssetName: asset.Name,
            Event:     "DOWN",
            Timestamp: result.Timestamp,
            Details:   result.Message,
        }
    
        return e.sendNotifications(ctx, notification)
    }
    
    func (e *Engine) handleRecovery(ctx context.Context, asset models.Asset, state AssetState) error {
        if state.OpenIncidentID != "" {
            e.store.CloseIncident(ctx, state.OpenIncidentID)
        }
    
        state.IsUp = true
        state.OpenIncidentID = ""
        e.assetState[asset.ID] = state
    
        notification := models.Notification{
            AssetID:   asset.ID,
            AssetName: asset.Name,
            Event:     "RECOVERY",
            Timestamp: time.Now(),
            Details:   "Service has recovered",
        }
    
        return e.sendNotifications(ctx, notification)
    }
    
    func (e *Engine) sendNotifications(ctx context.Context, notification models.Notification) error {
        for name, notifier := range e.notifiers {
            if err := notifier(ctx, notification); err != nil {
                fmt.Printf("notifier %s failed: %v\n", name, err)
            }
        }
        return nil
    }
    

    A key design choice involves tracking asset state in memory (assetState) for rapid lookups, while incidents are persisted to the database for durability. This allows state to be rebuilt from open incidents if the process restarts.
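
    As an illustration of that restart path, a method along these lines could rebuild the in-memory map from still-open incidents (a sketch, not part of the article's code; it assumes store.GetOpenIncidents returns ([]models.Incident, error), as the REST API handlers later suggest):

    // RestoreState rebuilds the in-memory asset state from open incidents.
    func (e *Engine) RestoreState(ctx context.Context) error {
        incidents, err := e.store.GetOpenIncidents(ctx)
        if err != nil {
            return err
        }

        e.mu.Lock()
        defer e.mu.Unlock()
        for _, inc := range incidents {
            // Any asset with an open incident starts out marked as down.
            e.assetState[inc.AssetID] = AssetState{
                IsUp:           false,
                OpenIncidentID: inc.ID,
            }
        }
        return nil
    }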

    Sending Notifications

    When something goes down, the right people need to know, so alerts are dispatched to the configured communication channels.

    The Teams notifier is defined as:

    // internal/notifier/teams.go
    package notifier
    
    import (
        "bytes"
        "context"
        "encoding/json"
        "fmt"
        "net/http"
        "time"
        "github.com/yourname/status/internal/models"
    )
    
    type TeamsNotifier struct {
        webhookURL string
        client     *http.Client
    }
    
    func NewTeamsNotifier(webhookURL string) *TeamsNotifier {
        return &TeamsNotifier{
            webhookURL: webhookURL,
            client:     &http.Client{Timeout: 10 * time.Second},
        }
    }
    
    func (t *TeamsNotifier) Notify(ctx context.Context, n models.Notification) error {
        emoji := "🟢"
        if n.Event == "DOWN" {
            emoji = "🔴"
        }
    
        card := map[string]interface{}{
            "type": "message",
            "attachments": []map[string]interface{}{
                {
                    "contentType": "application/vnd.microsoft.card.adaptive",
                    "content": map[string]interface{}{
                        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
                        "type":    "AdaptiveCard",
                        "version": "1.4",
                        "body": []map[string]interface{}{
                            {
                                "type":   "TextBlock",
                                "text":   fmt.Sprintf("%s %s - %s", emoji, n.AssetName, n.Event),
                                "weight": "Bolder",
                                "size":   "Large",
                            },
                            {
                                "type": "FactSet",
                                "facts": []map[string]interface{}{
                                    {"title": "Service", "value": n.AssetName},
                                    {"title": "Status", "value": n.Event},
                                    {"title": "Time", "value": n.Timestamp.Format(time.RFC1123)},
                                    {"title": "Details", "value": n.Details},
                                },
                            },
                        },
                    },
                },
            },
        }
    
        body, err := json.Marshal(card)
        if err != nil {
            return err
        }

        req, err := http.NewRequestWithContext(ctx, http.MethodPost, t.webhookURL, bytes.NewReader(body))
        if err != nil {
            return err
        }
        req.Header.Set("Content-Type", "application/json")

        resp, err := t.client.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        if resp.StatusCode >= 300 {
            return fmt.Errorf("Teams webhook returned %d", resp.StatusCode)
        }
        return nil
    }
    

    Teams utilizes Adaptive Cards for enhanced formatting. Similar notifiers can be defined for other communication platforms, such as Slack or Discord.
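
    A Slack version is nearly identical, since Slack incoming webhooks accept a plain JSON payload with a text field. The following is a sketch (not part of the article's code) that mirrors the shape of the Teams notifier:

    // internal/notifier/slack.go (sketch)
    package notifier

    import (
        "bytes"
        "context"
        "encoding/json"
        "fmt"
        "net/http"
        "time"

        "github.com/yourname/status/internal/models"
    )

    type SlackNotifier struct {
        webhookURL string
        client     *http.Client
    }

    func NewSlackNotifier(webhookURL string) *SlackNotifier {
        return &SlackNotifier{
            webhookURL: webhookURL,
            client:     &http.Client{Timeout: 10 * time.Second},
        }
    }

    func (s *SlackNotifier) Notify(ctx context.Context, n models.Notification) error {
        emoji := "🟢"
        if n.Event == "DOWN" {
            emoji = "🔴"
        }

        // Slack incoming webhooks accept a simple JSON payload with a "text" field.
        payload := map[string]string{
            "text": fmt.Sprintf("%s *%s* - %s\n%s (%s)",
                emoji, n.AssetName, n.Event, n.Details, n.Timestamp.Format(time.RFC1123)),
        }

        body, err := json.Marshal(payload)
        if err != nil {
            return err
        }

        req, err := http.NewRequestWithContext(ctx, http.MethodPost, s.webhookURL, bytes.NewReader(body))
        if err != nil {
            return err
        }
        req.Header.Set("Content-Type", "application/json")

        resp, err := s.client.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        if resp.StatusCode >= 300 {
            return fmt.Errorf("Slack webhook returned %d", resp.StatusCode)
        }
        return nil
    }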

    The REST API

    Endpoints are required to query the status of monitored services. Chi, a lightweight router supporting route parameters like /assets/{id}, is employed for this purpose.

    The APIs are defined as:

    // internal/api/handlers.go
    package api
    
    import (
        "encoding/json"
        "net/http"
        "github.com/go-chi/chi/v5"
        "github.com/go-chi/chi/v5/middleware"
        "github.com/yourname/status/internal/store"
    )
    
    type Server struct {
        store store.Store
        mux   *chi.Mux
    }
    
    func NewServer(s store.Store) *Server {
        srv := &Server{store: s, mux: chi.NewRouter()}
    
        srv.mux.Use(middleware.Logger)
        srv.mux.Use(middleware.Recoverer)
    
        srv.mux.Route("/api", func(r chi.Router) {
            r.Get("/health", srv.health)
            r.Get("/assets", srv.listAssets)
            r.Get("/assets/{id}/events", srv.getAssetEvents)
            r.Get("/incidents", srv.listIncidents)
        })
    
        return srv
    }
    
    func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        s.mux.ServeHTTP(w, r)
    }
    
    func (s *Server) health(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(map[string]string{"status": "healthy"})
    }
    
    func (s *Server) listAssets(w http.ResponseWriter, r *http.Request) {
        assets, err := s.store.GetAssets(r.Context())
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(assets)
    }
    
    func (s *Server) getAssetEvents(w http.ResponseWriter, r *http.Request) {
        id := chi.URLParam(r, "id")
        events, err := s.store.GetProbeEvents(r.Context(), id, 100)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(events)
    }

    func (s *Server) listIncidents(w http.ResponseWriter, r *http.Request) {
        incidents, err := s.store.GetOpenIncidents(r.Context())
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(incidents)
    }
    

    The provided code defines a small HTTP API server, exposing four read-only endpoints:

    GET /api/health – Health check (to confirm service operation)

    GET /api/assets – Lists all monitored services

    GET /api/assets/{id}/events – Retrieves probe history for a specific service

    GET /api/incidents – Lists open incidents
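
    With the server listening on port 8080 (the port used in the Docker setup below), the endpoints can be exercised with curl; when running behind the Nginx proxy, adjust the host and port to match your routing:

    curl http://localhost:8080/api/health
    curl http://localhost:8080/api/assets
    curl http://localhost:8080/api/assets/api-prod/events
    curl http://localhost:8080/api/incidents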

    Dockerizing the Application

    Dockerizing the application is straightforward due to Go’s compilation into a single binary. A multi-stage build is used to minimize the final image size:

    
    # Dockerfile
    FROM golang:1.24-alpine AS builder
    WORKDIR /app
    
    RUN apk add --no-cache git
    COPY go.mod go.sum ./
    RUN go mod download
    COPY . .
    RUN CGO_ENABLED=0 GOOS=linux go build -o statusd ./cmd/statusd/
    
    FROM alpine:latest
    WORKDIR /app
    RUN apk --no-cache add ca-certificates
    COPY --from=builder /app/statusd .
    COPY entrypoint.sh .
    RUN chmod +x /app/entrypoint.sh
    
    EXPOSE 8080
    ENTRYPOINT ["/app/entrypoint.sh"]
    

    The builder stage handles code compilation. The final stage consists of Alpine Linux combined with the compiled binary, typically resulting in an image under 20MB.

    The entrypoint script constructs the database connection string using environment variables:

    #!/bin/sh
    # entrypoint.sh
    
    DB_HOST=${DB_HOST:-localhost}
    DB_PORT=${DB_PORT:-5432}
    DB_USER=${DB_USER:-status}
    DB_PASSWORD=${DB_PASSWORD:-status}
    DB_NAME=${DB_NAME:-status_db}
    
    DB_CONN_STRING="postgres://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}"
    
    exec ./statusd \
      -manifest /app/config/manifest.json \
      -notifiers /app/config/notifiers.json \
      -db "$DB_CONN_STRING" \
      -workers 4 \
      -api-port 8080
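
    These flags imply a matching flag-parsing step in cmd/statusd/main.go, which the article does not show. A minimal sketch of that part could look like this (loading the config, connecting to the store, and starting the scheduler and API server are omitted):

    // cmd/statusd/main.go (flag parsing only; a sketch, not the full main)
    package main

    import (
        "flag"
        "log"
    )

    func main() {
        manifestPath := flag.String("manifest", "config/manifest.json", "path to the assets manifest")
        notifiersPath := flag.String("notifiers", "config/notifiers.json", "path to the notifiers config")
        dbConn := flag.String("db", "", "PostgreSQL connection string")
        workers := flag.Int("workers", 4, "number of probe workers")
        apiPort := flag.Int("api-port", 8080, "port for the REST API")
        flag.Parse()

        log.Printf("starting statusd: manifest=%s notifiers=%s workers=%d api-port=%d",
            *manifestPath, *notifiersPath, *workers, *apiPort)
        _ = dbConn // handed to the Postgres store in the full application

        // ...load the config, connect to the store, start the scheduler and API server...
    }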
    

    Docker Compose: Putting It All Together

    A single docker-compose.yml file orchestrates the entire setup:

    
    # docker-compose.yml
    version: "3.8"
    
    services:
      postgres:
        image: postgres:15-alpine
        container_name: status_postgres
        environment:
          POSTGRES_USER: status
          POSTGRES_PASSWORD: changeme
          POSTGRES_DB: status_db
        volumes:
          - postgres_data:/var/lib/postgresql/data
          - ./migrations:/docker-entrypoint-initdb.d
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U status"]
          interval: 10s
          timeout: 5s
          retries: 5
        networks:
          - status_network
    
      statusd:
        build: .
        container_name: status_app
        environment:
          - DB_HOST=postgres
          - DB_PORT=5432
          - DB_USER=status
          - DB_PASSWORD=changeme
          - DB_NAME=status_db
        volumes:
          - ./config:/app/config:ro
        depends_on:
          postgres:
            condition: service_healthy
        networks:
          - status_network
    
      prometheus:
        image: prom/prometheus:latest
        container_name: status_prometheus
        volumes:
          - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml
          - prometheus_data:/prometheus
        networks:
          - status_network
        depends_on:
          - statusd
    
      grafana:
        image: grafana/grafana:latest
        container_name: status_grafana
        environment:
          GF_SECURITY_ADMIN_USER: admin
          GF_SECURITY_ADMIN_PASSWORD: admin
        volumes:
          - grafana_data:/var/lib/grafana
        networks:
          - status_network
        depends_on:
          - prometheus
    
      nginx:
        image: nginx:alpine
        container_name: status_nginx
        volumes:
          - ./docker/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
          - ./docker/nginx/conf.d:/etc/nginx/conf.d:ro
        ports:
          - "80:80"
        depends_on:
          - statusd
          - grafana
          - prometheus
        networks:
          - status_network
    
    networks:
      status_network:
        driver: bridge
    
    volumes:
      postgres_data:
      prometheus_data:
      grafana_data:
    

    Key points to observe include:

    • PostgreSQL healthcheck: The statusd service waits for Postgres to be fully operational, not just started, preventing ‘connection refused’ errors during initial boot.
    • Config mount: The ./config directory is mounted read-only, so local edits to the manifest are visible inside the container (the app reads the config at startup, so a restart is needed to pick up changes).
    • Nginx: Handles routing external traffic to the Grafana and Prometheus dashboards.
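
    The Prometheus container mounts ./docker/prometheus.yml, which is not shown in the article. A minimal scrape configuration, assuming statusd exposes Prometheus metrics at /metrics on port 8080 (that Go-side wiring is also not shown here), might look like:

    # docker/prometheus.yml (sketch)
    global:
      scrape_interval: 15s

    scrape_configs:
      - job_name: "statusd"
        metrics_path: /metrics
        static_configs:
          - targets: ["statusd:8080"]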

    Configuration Files

    The application utilizes two configuration files: manifest.json and notifiers.json.

    1. The manifest.json file enumerates the assets designated for monitoring. Each asset requires an ID, a probe type, and an address. intervalSeconds dictates the checking frequency (e.g., 60 for once per minute). expectedStatusCodes allows defining ‘healthy’ states, accommodating endpoints that might return 301 redirects or 204 No Content.
    // config/manifest.json
    {
      "assets": [
        {
          "id": "api-prod",
          "assetType": "http",
          "name": "Production API",
          "address": "https://api.example.com/health",
          "intervalSeconds": 60,
          "timeoutSeconds": 5,
          "expectedStatusCodes": [200],
          "metadata": {
            "env": "prod",
            "owner": "platform-team"
          }
        },
        {
          "id": "web-prod",
          "assetType": "http",
          "name": "Production Website",
          "address": "https://www.example.com",
          "intervalSeconds": 120,
          "timeoutSeconds": 10,
          "expectedStatusCodes": [200, 301]
        }
      ]
    }
    
    2. The notifiers.json file governs alert distribution. It defines notification channels (e.g., Teams, Slack) and establishes policies for which channels activate on specific events. A throttleSeconds value of 300, for example, prevents excessive notifications for the same issue, limiting them to once every 5 minutes.
    // config/notifiers.json
    {
      "notifiers": {
        "teams": {
          "type": "teams",
          "webhookUrl": "https://outlook.office.com/webhook/your-webhook-url"
        }
      },
      "notificationPolicy": {
        "onDown": ["teams"],
        "onRecovery": ["teams"],
        "throttleSeconds": 300,
        "repeatAlerts": false
      }
    }
    

    Running It

    docker-compose up -d
    

    With these configurations, five services are launched:

    1. PostgreSQL stores data
    2. StatusD probes services
    3. Prometheus collects metrics
    4. Grafana displays dashboards (http://localhost:80)
    5. Nginx routes all traffic

    To inspect the logs:

    docker logs -f status_app
    

    The expected output is:

    Loading assets manifest...
    Loaded 2 assets
    Loading notifiers config...
    Loaded 1 notifiers
    Connecting to database...
    Starting scheduler...
    [✓] Production API (api-prod): 45ms
    [✓] Production Website (web-prod): 120ms
    

    Summary

    This tutorial walked through building a monitoring system that can:

    1. Reading services from a JSON config
    2. Probing them on a schedule using a worker pool
    3. Detecting outages and creating incidents
    4. Sending notifications to Teams/Slack
    5. Exposing metrics for Prometheus
    6. Running in Docker with one command

    This tutorial provides the foundation for deploying a functional monitoring system. However, several advanced topics were not covered and could be explored in a subsequent part, including:

    • Circuit breakers to prevent cascading failures when a service is flapping
    • Multi-tier escalation to alert managers if the engineer on-call does not respond
    • Alert deduplication to prevent notification storms
    • Adaptive probe intervals to check more frequently during incidents
    • Hot-reload configuration without restarting the service
    • SLA calculations and compliance tracking