v1.2 Release Notes

Overview

v1.2 adds connection pooling, Prometheus metrics, job cancellation, and audit events. These features improve performance, observability, and operational control for production deployments.

Migration Guide

Breaking changes

None. All v1.1 API calls continue to work unchanged.

New endpoints

DELETE /v1/jobs/{job_id} — Cancel a running job
GET /metrics — Prometheus metrics endpoint

Response changes

GET /healthcheck — now includes worker count and active job count:

{
  "status": "healthy",
  "version": "1.2.0",
  "uptime_seconds": 3600,
  "components": {
    "redis": { "status": "healthy" },
    "queue": { "status": "healthy", "depth": 0 },
    "workers": { "count": 2, "active_jobs": 1 }
  }
}

When no workers are running, status is "no_workers" (not "degraded").
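For monitors that parse this response, the status handling can be sketched as follows. This is a minimal illustration, not NAAS code: it assumes the "no_workers" value appears in the top-level status field (the notes above do not say which field carries it), and the function name and the ok/warn/critical mapping are hypothetical monitoring policy.

```python
def summarize_health(payload: dict) -> str:
    """Map a /healthcheck response body to a monitor state.

    Assumption: "no_workers" is reported in the top-level "status" field.
    The ok/warn/critical states are a hypothetical policy, not part of NAAS.
    """
    status = payload.get("status")
    if status == "healthy":
        return "ok"
    if status == "no_workers":
        # v1.2 reports this value instead of "degraded" when no workers run.
        return "warn"
    # Anything else (including a missing status) is treated as a failure.
    return "critical"


example = {
    "status": "healthy",
    "version": "1.2.0",
    "components": {"workers": {"count": 2, "active_jobs": 1}},
}
state = summarize_health(example)
```

Monitors that previously matched only "healthy" and "degraded" would need an explicit branch like the one above to avoid classifying "no_workers" as an unknown state.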

What's New

Connection pooling

NAAS now maintains persistent SSH connections to frequently accessed devices. Subsequent commands to the same device execute 3-5x faster because they reuse an existing connection instead of establishing a new SSH session.

Connections are automatically managed with configurable timeouts and pool sizes:

| Environment variable | Default | Description |
|---|---|---|
| `CONNECTION_POOL_ENABLED` | `true` | Enable connection pooling |
| `CONNECTION_POOL_MAX_SIZE` | `100` | Maximum connections per worker |
| `CONNECTION_POOL_TTL` | `300` | Seconds before idle connection closes |
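To show how these settings and their defaults fit together, here is a small sketch that reads them from the environment. The variable names and defaults come from the table above; the PoolConfig class, the load_pool_config function, and the rule for parsing the boolean are assumptions for illustration and may differ from how NAAS itself reads them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolConfig:
    """Hypothetical container for the three documented pool settings."""
    enabled: bool
    max_size: int
    ttl_seconds: int


def load_pool_config(env=None) -> PoolConfig:
    """Read pool settings from a mapping, falling back to documented defaults."""
    env = os.environ if env is None else env
    return PoolConfig(
        # Assumption: any value other than "true" (case-insensitive) disables pooling.
        enabled=env.get("CONNECTION_POOL_ENABLED", "true").strip().lower() == "true",
        max_size=int(env.get("CONNECTION_POOL_MAX_SIZE", "100")),
        ttl_seconds=int(env.get("CONNECTION_POOL_TTL", "300")),
    )


defaults = load_pool_config({})
tuned = load_pool_config({"CONNECTION_POOL_MAX_SIZE": "25", "CONNECTION_POOL_TTL": "60"})
```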

Pool keys include the username and a hash of the password, so a pooled connection is never reused across different users or credentials.
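The idea behind credential-scoped pool keys can be sketched like this. The key layout, the pool_key function, and the choice of SHA-256 are assumptions for illustration; these notes do not document the hash or key format NAAS actually uses.

```python
import hashlib


def pool_key(host: str, port: int, username: str, password: str) -> tuple:
    """Build a hypothetical pool key that is unique per device and credential.

    Assumption: SHA-256 stands in for whatever hash NAAS uses internally.
    Only the digest is stored in the key, never the password itself.
    """
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return (host, port, username, digest)


key_a = pool_key("192.0.2.10", 22, "alice", "secret-1")
key_b = pool_key("192.0.2.10", 22, "alice", "secret-2")
key_c = pool_key("192.0.2.10", 22, "bob", "secret-1")
```

Because the three keys differ, a lookup with one set of credentials cannot return a connection that was opened with another.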

Prometheus metrics

GET /metrics exposes operational metrics in Prometheus format:

naas_http_requests_total — Request count by endpoint and status
naas_http_request_duration_seconds — Request latency histogram
naas_queue_depth — Current job queue depth
naas_queue_processing_time_seconds — Job processing time histogram
naas_workers_total — Active worker count
naas_jobs_total — Job count by status (queued, started, finished, failed)
naas_connection_pool_size — Current pool size per worker
naas_connection_pool_hits_total — Pool hit count
naas_connection_pool_misses_total — Pool miss count

Integrate with Grafana, Datadog, or any Prometheus-compatible monitoring system.
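As one example of what these metrics make possible, the sketch below computes a connection pool hit ratio from scraped text. It is a simplified parser for illustration only: it assumes samples carry no trailing timestamp, it sums all label sets per metric name, and the worker label in the sample text is hypothetical. A real deployment would normally compute this in Prometheus itself.

```python
def parse_samples(text: str) -> dict:
    """Sum sample values per metric name from Prometheus text output.

    Assumption: each sample line is "name{labels} value" with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        samples[name] = samples.get(name, 0.0) + float(value)
    return samples


def pool_hit_ratio(samples: dict):
    """Return hits / (hits + misses), or None when there is no traffic yet."""
    hits = samples.get("naas_connection_pool_hits_total", 0.0)
    misses = samples.get("naas_connection_pool_misses_total", 0.0)
    total = hits + misses
    return hits / total if total else None


# Illustrative scrape; the "worker" label is hypothetical.
scrape = """\
# TYPE naas_connection_pool_hits_total counter
naas_connection_pool_hits_total{worker="w1"} 30
naas_connection_pool_hits_total{worker="w2"} 50
naas_connection_pool_misses_total{worker="w1"} 20
"""
ratio = pool_hit_ratio(parse_samples(scrape))
```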

Job cancellation

Cancel long-running or stuck jobs:

curl -X DELETE https://naas.example.com/v1/jobs/{job_id} -u "user:pass"

The worker gracefully terminates the SSH session and marks the job as cancelled. The endpoint returns 200 on success, 404 if the job is not found, and 400 if the job has already finished.
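The same call can be made from a script. The sketch below uses only the Python standard library and maps the three documented status codes to outcomes; the function names, outcome strings, and timeout are hypothetical, and HTTP Basic authentication is assumed because the curl example uses -u.

```python
import base64
import urllib.error
import urllib.request

# Status codes documented for DELETE /v1/jobs/{job_id}; outcome names are illustrative.
CANCEL_OUTCOMES = {200: "cancelled", 404: "not_found", 400: "already_finished"}


def interpret_status(code: int) -> str:
    """Translate an HTTP status code into a cancellation outcome."""
    return CANCEL_OUTCOMES.get(code, f"unexpected_{code}")


def cancel_job(base_url: str, job_id: str, user: str, password: str) -> str:
    """Send the DELETE request and return the interpreted outcome."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{base_url}/v1/jobs/{job_id}",
        method="DELETE",
        headers={"Authorization": f"Basic {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            code = response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still available.
        code = err.code
    return interpret_status(code)
```

A caller would invoke it as cancel_job("https://naas.example.com", job_id, "user", "pass") and branch on the returned string.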

Audit events

Structured audit logging supports compliance and troubleshooting. Events are written to stdout as JSON, and each event carries an event_type field.

Event types:

job.created — Job enqueued
job.started — Worker began processing
job.completed — Job finished successfully
job.failed — Job failed with error
job.cancelled — Job cancelled by user
auth.failed — Authentication failure
device.connection_failed — Device connection error
device.locked_out — Device lockout triggered

Each event includes timestamp, job_id, username, device_ip, and event-specific fields.
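Because events are JSON on stdout, they can be filtered with a few lines of code before forwarding. The sketch below assumes one JSON object per line and that non-audit log lines may be mixed into the same stream; the function name and the sample values are illustrative, and only the field names come from the description above.

```python
import json


def iter_audit_events(lines, event_types=None):
    """Yield audit events from log lines, optionally filtered by event_type.

    Assumption: each event is a single-line JSON object. Lines that are not
    JSON objects with an "event_type" field are skipped.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict) or "event_type" not in record:
            continue
        if event_types is None or record["event_type"] in event_types:
            yield record


# Illustrative log lines; the values are made up for the example.
log_lines = [
    '{"event_type": "job.created", "timestamp": "2024-01-01T00:00:00Z", "job_id": "j1", "username": "alice", "device_ip": "192.0.2.10"}',
    "plain text line from another logger",
    '{"event_type": "job.failed", "timestamp": "2024-01-01T00:00:05Z", "job_id": "j1", "username": "alice", "device_ip": "192.0.2.10"}',
]
failures = list(iter_audit_events(log_lines, {"job.failed", "auth.failed"}))
```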

Kubernetes deployment

Production-ready Kubernetes manifests are included in the k8s/ directory:

kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/redis/
kubectl apply -f k8s/api/
kubectl apply -f k8s/worker/

Includes:

ConfigMap for environment variables
Secret for credentials
Redis StatefulSet
API Deployment with HPA
Worker Deployment with liveness probe
Services and optional Ingress

Full guide: docs/kubernetes.md

Performance improvements

Worker cache for Worker.all() with 10s TTL (reduces Redis scans)
File-based heartbeat for worker liveness probe (no Redis overhead)
Connection pool reduces SSH handshake overhead by 70-80%
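The worker cache in the first item follows a common time-to-live pattern, sketched below. This is a generic illustration rather than the NAAS implementation: the TTLCache class and the loader are hypothetical, with the loader standing in for the Worker.all() call.

```python
import time


class TTLCache:
    """Cache a single loaded value and refresh it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=10.0, clock=time.monotonic):
        self._loader = loader        # stands in for Worker.all()
        self._ttl = ttl_seconds      # 10s matches the TTL in the notes above
        self._clock = clock          # injectable so the cache can be tested
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: call the expensive loader once.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


# Demonstration with a fake clock and a loader that counts its calls.
calls = []
fake_now = [0.0]


def load_workers():
    calls.append(1)
    return ["worker-1", "worker-2"]


cache = TTLCache(load_workers, ttl_seconds=10.0, clock=lambda: fake_now[0])
cache.get()
fake_now[0] = 5.0
cache.get()            # still fresh, loader not called again
fake_now[0] = 10.0
cache.get()            # TTL elapsed, loader called a second time
```

With this pattern, repeated reads within the TTL window cost nothing beyond the first load, at the price of data that may be up to 10 seconds stale.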

Upgrade Steps

Pull the latest image or update your deployment
Review connection pool settings — defaults are suitable for most deployments
Configure Prometheus scraping for the /metrics endpoint
Update healthcheck monitors to handle the no_workers status
Optionally configure audit event forwarding to a log aggregation system
For Kubernetes deployments, apply the manifests from the k8s/ directory