
v1.2 Release Notes

Overview

v1.2 adds connection pooling, Prometheus metrics, job cancellation, audit events, and Kubernetes manifests. These features improve performance, observability, and operational control in production deployments.


Migration Guide

Breaking changes

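None. All v1.1 API calls continue to work unchanged. GET /healthcheck responses do gain new fields and a new "no_workers" status (see Response changes), so monitors that match on specific status values may need updating.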

New endpoints

  • DELETE /v1/jobs/{job_id} — Cancel a running job
  • GET /metrics — Prometheus metrics endpoint

Response changes

GET /healthcheck — now includes worker count and active job count:

{
  "status": "healthy",
  "version": "1.2.0",
  "uptime_seconds": 3600,
  "components": {
    "redis": { "status": "healthy" },
    "queue": { "status": "healthy", "depth": 0 },
    "workers": { "count": 2, "active_jobs": 1 }
  }
}

When no workers are running, status is "no_workers" (not "degraded").
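Monitors written against v1.1 may treat any status other than "healthy" as an outage. A minimal Python sketch of a v1.2-aware check, assuming the response shape shown above. The helper names and the ok/warn/critical policy are hypothetical; they only illustrate handling no_workers as its own case:

```python
import json
from typing import NamedTuple


class HealthSummary(NamedTuple):
    status: str
    worker_count: int
    active_jobs: int


def summarize_health(body: str) -> HealthSummary:
    """Pull the fields a v1.2-aware monitor needs from a /healthcheck body."""
    data = json.loads(body)
    # Defaults cover responses where the workers component is absent.
    workers = data.get("components", {}).get("workers", {})
    return HealthSummary(
        status=data.get("status", "unknown"),
        worker_count=workers.get("count", 0),
        active_jobs=workers.get("active_jobs", 0),
    )


def alert_level(summary: HealthSummary) -> str:
    """Hypothetical alerting policy; the level names are illustrative only."""
    if summary.status == "healthy":
        return "ok"
    if summary.status == "no_workers":
        # The API answers, but queued jobs will not be processed.
        return "warn"
    return "critical"
```

Whether no_workers should page someone or only warn depends on your deployment; the point is to match it explicitly rather than lump it in with every other non-healthy value.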


What's New

Connection pooling

NAAS now maintains persistent SSH connections to frequently accessed devices. Subsequent commands to the same device execute 3-5x faster by reusing existing connections instead of establishing new SSH sessions.

Connections are automatically managed with configurable timeouts and pool sizes:

Environment variable      Default  Description
CONNECTION_POOL_ENABLED   true     Enable connection pooling
CONNECTION_POOL_MAX_SIZE  100      Maximum connections per worker
CONNECTION_POOL_TTL       300      Seconds before idle connection closes
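A sketch of how tooling around a deployment might read these settings with their documented defaults. PoolSettings and from_env are hypothetical names, and the accepted spellings of "true" are an assumption, not NAAS's documented parsing:

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PoolSettings:
    enabled: bool = True   # CONNECTION_POOL_ENABLED
    max_size: int = 100    # CONNECTION_POOL_MAX_SIZE, connections per worker
    ttl: int = 300         # CONNECTION_POOL_TTL, idle seconds before close

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "PoolSettings":
        # Assumption: these truthy spellings are illustrative, not NAAS's parser.
        raw_enabled = env.get("CONNECTION_POOL_ENABLED", "true").strip().lower()
        return cls(
            enabled=raw_enabled in {"1", "true", "yes", "on"},
            max_size=int(env.get("CONNECTION_POOL_MAX_SIZE", "100")),
            ttl=int(env.get("CONNECTION_POOL_TTL", "300")),
        )
```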

Pool keys include the username and a hash of the password, so a connection opened with one user's credentials is never reused for another user's request.
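To show how the key, the size limit, and the idle timeout interact, here is a minimal self-contained pool in Python. It is not NAAS's implementation: ConnectionPool, pool_key, SHA-256 as the password hash, least-recently-used eviction at max_size, and the connect callback are all assumptions made for this sketch.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Any, Callable, Tuple


def pool_key(host: str, username: str, password: str) -> Tuple[str, str, str]:
    # Hash the password so the key never stores it in clear text, while
    # different users (or a changed password) still map to different entries.
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return (host, username, digest)


class ConnectionPool:
    """Per-worker pool: at most max_size entries, idle entries expire after ttl."""

    def __init__(self, max_size: int = 100, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self.hits = 0
        self.misses = 0
        # key -> (connection, last_used); insertion order doubles as LRU order.
        self._entries: "OrderedDict[Tuple[str, str, str], Tuple[Any, float]]" = OrderedDict()

    def get(self, host: str, username: str, password: str,
            connect: Callable[[str, str, str], Any]) -> Any:
        now = self.clock()
        self._expire(now)
        key = pool_key(host, username, password)
        if key in self._entries:
            conn, _ = self._entries.pop(key)  # re-inserted below as most recent
            self.hits += 1
        else:
            conn = connect(host, username, password)
            self.misses += 1
            # Evict least recently used entries until there is room.
            while self._entries and len(self._entries) >= self.max_size:
                _, (oldest, _) = self._entries.popitem(last=False)
                self._close(oldest)
        self._entries[key] = (conn, now)
        return conn

    def _expire(self, now: float) -> None:
        stale = [k for k, (_, used) in self._entries.items() if now - used > self.ttl]
        for key in stale:
            conn, _ = self._entries.pop(key)
            self._close(conn)

    @staticmethod
    def _close(conn: Any) -> None:
        close = getattr(conn, "close", None)
        if callable(close):
            close()
```

Because the password digest is part of the key, a request from a different user, or from the same user with a different password, is a miss and opens a new session rather than borrowing an existing one.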

Prometheus metrics

GET /metrics exposes operational metrics in Prometheus format:

  • naas_http_requests_total — Request count by endpoint and status
  • naas_http_request_duration_seconds — Request latency histogram
  • naas_queue_depth — Current job queue depth
  • naas_queue_processing_time_seconds — Job processing time histogram
  • naas_workers_total — Active worker count
  • naas_jobs_total — Job count by status (queued, started, finished, failed)
  • naas_connection_pool_size — Current pool size per worker
  • naas_connection_pool_hits_total — Pool hit count
  • naas_connection_pool_misses_total — Pool miss count

Integrate with Grafana, Datadog, or any Prometheus-compatible monitoring system.
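As an example of consuming the endpoint, the sketch below sums the pool hit and miss counters across all label sets and derives a hit ratio. The parser is deliberately minimal (it assumes label values contain no "}" and ignores optional timestamps), and the function names are hypothetical; for real use, prometheus_client's text parser or a PromQL query is the usual route.

```python
from collections import defaultdict
from typing import Dict


def sum_samples(exposition: str) -> Dict[str, float]:
    """Sum sample values per metric name from Prometheus text output.

    Deliberately minimal: assumes label values contain no '}' and
    ignores the optional trailing timestamp.
    """
    totals: Dict[str, float] = defaultdict(float)
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            rest = rest.split("}", 1)[1]  # drop the label set
        else:
            name, rest = line.split(None, 1)
        totals[name.strip()] += float(rest.split()[0])
    return dict(totals)


def pool_hit_ratio(exposition: str) -> float:
    """Fraction of pool lookups served from an existing connection."""
    totals = sum_samples(exposition)
    hits = totals.get("naas_connection_pool_hits_total", 0.0)
    misses = totals.get("naas_connection_pool_misses_total", 0.0)
    lookups = hits + misses
    return hits / lookups if lookups else 0.0
```

Counters are cumulative since each worker started, so this is a lifetime ratio; dashboards usually show the ratio of rates over a recent window instead.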

Job cancellation

Cancel long-running or stuck jobs:

curl -X DELETE https://naas.example.com/v1/jobs/{job_id} -u "user:pass"

The worker gracefully terminates the SSH session and marks the job as cancelled. The endpoint returns 200 on success, 404 if the job is not found, and 400 if the job has already finished.
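A minimal Python client for the cancellation endpoint using only the standard library, assuming HTTP basic auth as in the curl example. The function names and outcome strings are hypothetical; the status codes are the ones documented above:

```python
import base64
import urllib.error
import urllib.parse
import urllib.request

# Documented outcomes of DELETE /v1/jobs/{job_id}.
CANCEL_OUTCOMES = {
    200: "cancelled",
    404: "job not found",
    400: "job already finished",
}


def build_cancel_request(base_url: str, job_id: str,
                         user: str, password: str) -> urllib.request.Request:
    """Build the DELETE request with an HTTP basic auth header (like curl -u)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    url = f"{base_url.rstrip('/')}/v1/jobs/{urllib.parse.quote(job_id, safe='')}"
    return urllib.request.Request(
        url, method="DELETE", headers={"Authorization": f"Basic {token}"}
    )


def cancel_job(base_url: str, job_id: str, user: str, password: str,
               timeout: float = 10.0) -> str:
    """Send the request and translate the status code into an outcome string."""
    request = build_cancel_request(base_url, job_id, user, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the status is on the exception.
        status = err.code
    return CANCEL_OUTCOMES.get(status, f"unexpected HTTP {status}")
```

Treating 400 as a distinct outcome lets scripts that cancel in bulk ignore jobs that finished on their own in the meantime.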

Audit events

Structured audit logging for compliance and troubleshooting. Events are written to stdout as JSON, and each event carries an event_type field.

Event types:

  • job.created — Job enqueued
  • job.started — Worker began processing
  • job.completed — Job finished successfully
  • job.failed — Job failed with error
  • job.cancelled — Job cancelled by user
  • auth.failed — Authentication failure
  • device.connection_failed — Device connection error
  • device.locked_out — Device lockout triggered

Each event includes timestamp, job_id, username, device_ip, and event-specific fields.
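Because every event carries event_type, audit output can be filtered with ordinary log tooling. A sketch that counts auth.failed events per user, assuming one JSON object per line and skipping ordinary non-JSON log lines; the helper names are hypothetical:

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def audit_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield audit events from a stream of stdout lines.

    Assumption: one JSON object per line; anything else is skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # plain log line, not an audit event
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and "event_type" in event:
            yield event


def failed_logins_by_user(lines: Iterable[str]) -> Counter:
    """Count auth.failed events per username."""
    return Counter(
        event.get("username", "<unknown>")
        for event in audit_events(lines)
        if event["event_type"] == "auth.failed"
    )
```

Since the functions accept any iterable of lines, the same code works on a saved log file or on sys.stdin when container logs are piped into a script.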

Kubernetes deployment

v1.2 ships production-ready Kubernetes manifests in the k8s/ directory. Deploy them with:

kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/redis/
kubectl apply -f k8s/api/
kubectl apply -f k8s/worker/

Includes:

  • ConfigMap for environment variables
  • Secret for credentials
  • Redis StatefulSet
  • API Deployment with HPA
  • Worker Deployment with liveness probe
  • Services and optional Ingress

Full guide: docs/kubernetes.md
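The worker liveness probe relies on the file-based heartbeat listed under Performance improvements. The sketch below shows the general technique: the worker periodically touches a file, and the probe fails when the file is missing or its modification time is too old. The path, the 60-second threshold, and all function names are hypothetical; check k8s/worker/ and docs/kubernetes.md for the values NAAS actually uses.

```python
import time
from pathlib import Path
from typing import Optional

# Hypothetical values for illustration only.
HEARTBEAT_PATH = Path("/tmp/naas-worker-heartbeat")
MAX_AGE_SECONDS = 60.0


def beat(path: Path = HEARTBEAT_PATH) -> None:
    """Worker side: refresh the heartbeat file's modification time."""
    path.touch()


def is_alive(path: Path = HEARTBEAT_PATH, max_age: float = MAX_AGE_SECONDS,
             now: Optional[float] = None) -> bool:
    """Probe side: alive if the file exists and was touched within max_age seconds."""
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        return False
    current = time.time() if now is None else now
    return current - mtime <= max_age


def probe_exit_code(path: Path = HEARTBEAT_PATH) -> int:
    """Exit status for an exec-style liveness probe: 0 = alive, 1 = stale."""
    return 0 if is_alive(path) else 1
```

Checking a local file keeps the probe independent of Redis, which is the overhead the heartbeat change is meant to avoid.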


Performance improvements

  • Worker cache for Worker.all() with 10s TTL (reduces Redis scans)
  • File-based heartbeat for worker liveness probe (no Redis overhead)
  • Connection pool reduces SSH handshake overhead by 70-80%
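The Worker.all() cache follows a simple time-to-live pattern: reuse the last result until it is 10 seconds old, then reload. A generic sketch of that pattern; TTLCache is a hypothetical name, not NAAS's class:

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Cache the result of a zero-argument loader for ttl seconds."""

    def __init__(self, loader: Callable[[], T], ttl: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[T] = None
        self._expires_at = float("-inf")  # forces a load on first use

    def get(self) -> T:
        now = self._clock()
        if now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value  # type: ignore[return-value]

    def invalidate(self) -> None:
        """Force the next get() to reload."""
        self._expires_at = float("-inf")
```

Assuming Worker is RQ's worker class, the loader could be something like lambda: Worker.all(connection=redis_conn), where redis_conn is your Redis connection. The trade-off is a worker list that may be up to 10 seconds stale in exchange for fewer Redis scans.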

Upgrade Steps

  1. Pull the latest image or update your deployment
  2. Review connection pool settings — defaults are suitable for most deployments
  3. Configure Prometheus scraping for the /metrics endpoint
  4. Update healthcheck monitors to handle the no_workers status
  5. Optionally forward audit events to your log aggregation system
  6. For Kubernetes deployments, apply manifests from k8s/ directory
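After upgrading, a quick smoke test can confirm steps 3 and 4. The sketch below takes the bodies of GET /healthcheck and GET /metrics (fetched however you like) and reports problems. The function names, the 1.2.x version check, and the choice of required metrics are assumptions for illustration:

```python
import json
from typing import List, Set

# A subset of the metric names listed in these release notes.
EXPECTED_METRICS = (
    "naas_http_requests_total",
    "naas_queue_depth",
    "naas_workers_total",
    "naas_jobs_total",
)


def exposed_metric_names(metrics_text: str) -> Set[str]:
    """Collect names from sample lines and '# TYPE' lines.

    TYPE lines are included because a labelled metric may have no samples yet.
    """
    names: Set[str] = set()
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if line.startswith("# TYPE "):
            parts = line.split()
            if len(parts) >= 3:
                names.add(parts[2])
        elif line and not line.startswith("#"):
            head = line.split("{", 1)[0].split()
            if head:
                names.add(head[0])
    return names


def upgrade_problems(healthcheck_body: str, metrics_text: str) -> List[str]:
    """Return problems found after the upgrade; an empty list means all checks passed."""
    problems: List[str] = []
    health = json.loads(healthcheck_body)
    version = str(health.get("version", ""))
    if not version.startswith("1.2."):
        problems.append(f"unexpected version {version!r}")
    if health.get("status") == "no_workers":
        problems.append("API is up but no workers are running")
    if "workers" not in health.get("components", {}):
        problems.append("healthcheck response has no components.workers entry")
    exposed = exposed_metric_names(metrics_text)
    problems.extend(f"metric {name} not exposed"
                    for name in EXPECTED_METRICS if name not in exposed)
    return problems
```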