v1.2 Release Notes
Overview
v1.2 adds connection pooling, Prometheus metrics, job cancellation, and audit events. These features improve performance, observability, and operational control for production deployments.
Migration Guide
Breaking changes
None. All v1.1 API calls continue to work unchanged.
New endpoints
- DELETE /v1/jobs/{job_id}: Cancel a running job
- GET /metrics: Prometheus metrics endpoint
Response changes
GET /healthcheck — now includes worker count and active job count:
{
  "status": "healthy",
  "version": "1.2.0",
  "uptime_seconds": 3600,
  "components": {
    "redis": { "status": "healthy" },
    "queue": { "status": "healthy", "depth": 0 },
    "workers": { "count": 2, "active_jobs": 1 }
  }
}
When no workers are running, status is "no_workers" (not "degraded").
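A monitor that consumes this response could map it to an alert state roughly as follows. This is a minimal sketch, not part of NAAS: the function name `classify_health` and the "ok" / "warn" / "critical" states are hypothetical, and because the notes do not say whether "no_workers" appears at the top level or under the workers component, the sketch checks both.

```python
import json


def classify_health(payload: dict) -> str:
    """Map a /healthcheck body to a monitor state (hypothetical states)."""
    top = payload.get("status")
    workers = payload.get("components", {}).get("workers", {})
    # Assumption: "no_workers" may appear at the top level or on the
    # workers component; a worker count of 0 is treated the same way.
    if "no_workers" in (top, workers.get("status")) or workers.get("count") == 0:
        return "warn"
    if top == "healthy":
        return "ok"
    return "critical"


sample = json.loads("""
{
  "status": "healthy",
  "version": "1.2.0",
  "uptime_seconds": 3600,
  "components": {
    "redis": { "status": "healthy" },
    "queue": { "status": "healthy", "depth": 0 },
    "workers": { "count": 2, "active_jobs": 1 }
  }
}
""")
print(classify_health(sample))  # ok
```

Treating "no_workers" as a warning rather than a critical state is a design choice for this example; a deployment that expects workers at all times may prefer to page on it.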
What's New
Connection pooling
NAAS now maintains persistent SSH connections to frequently-accessed devices. Subsequent commands to the same device execute 3-5x faster by reusing existing connections instead of establishing new SSH sessions.
Connections are automatically managed with configurable timeouts and pool sizes:
| Environment variable | Default | Description |
|---|---|---|
| CONNECTION_POOL_ENABLED | true | Enable connection pooling |
| CONNECTION_POOL_MAX_SIZE | 100 | Maximum connections per worker |
| CONNECTION_POOL_TTL | 300 | Seconds before idle connection closes |
Pool keys include the username and a hash of the password, so a pooled connection is never reused by a user with different credentials.
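To illustrate how a credential-scoped key and an idle TTL interact, here is a minimal sketch. It is not the NAAS implementation: `pool_key` and `IdlePool` are hypothetical names, the plain SHA-256 digest stands in for whatever hash NAAS actually uses, and a real pool would also close connections when evicting them.

```python
import hashlib
import time


def pool_key(host: str, username: str, password: str) -> tuple:
    """Build a pool key that differs whenever the credentials differ."""
    # Assumption: unsalted SHA-256; the real hash scheme is not documented here.
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return (host, username, digest)


class IdlePool:
    """Tiny pool with a size cap and an idle timeout (illustrative only)."""

    def __init__(self, max_size=100, ttl=300.0, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (connection, last_used)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None  # miss: caller opens a new connection
        conn, last_used = entry
        if self.clock() - last_used > self.ttl:
            del self._entries[key]  # idle too long: drop it
            return None
        self._entries[key] = (conn, self.clock())  # refresh idle timer
        return conn

    def put(self, key, conn):
        if key not in self._entries and len(self._entries) >= self.max_size:
            # Pool is full: evict the least recently used entry.
            oldest = min(self._entries, key=lambda k: self._entries[k][1])
            del self._entries[oldest]
        self._entries[key] = (conn, self.clock())
```

Because the password digest is part of the key, two users targeting the same device with different credentials never receive each other's connection, even if the usernames match.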
Prometheus metrics
GET /metrics exposes operational metrics in Prometheus format:
- naas_http_requests_total: Request count by endpoint and status
- naas_http_request_duration_seconds: Request latency histogram
- naas_queue_depth: Current job queue depth
- naas_queue_processing_time_seconds: Job processing time histogram
- naas_workers_total: Active worker count
- naas_jobs_total: Job count by status (queued, started, finished, failed)
- naas_connection_pool_size: Current pool size per worker
- naas_connection_pool_hits_total: Pool hit count
- naas_connection_pool_misses_total: Pool miss count
Integrate with Grafana, Datadog, or any Prometheus-compatible monitoring system.
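As one example of using these metrics, the pool hit and miss counters can be combined into a hit ratio. The sketch below parses the Prometheus text format by hand so it has no dependencies; it assumes sample lines carry no trailing timestamp, and the helper names and sample values are illustrative rather than real NAAS output.

```python
def sum_metric(text: str, name: str) -> float:
    """Sum every sample of one metric across all of its label sets."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: "<name>{labels} <value>" with no timestamp suffix.
        series, _, value = line.rpartition(" ")
        if series.split("{", 1)[0] == name:
            total += float(value)
    return total


def pool_hit_ratio(text: str):
    """Return hits / (hits + misses), or None when there were no lookups."""
    hits = sum_metric(text, "naas_connection_pool_hits_total")
    misses = sum_metric(text, "naas_connection_pool_misses_total")
    lookups = hits + misses
    return hits / lookups if lookups else None


# Illustrative scrape body, not captured from a real deployment.
sample = """\
# TYPE naas_connection_pool_hits_total counter
naas_connection_pool_hits_total 75
# TYPE naas_connection_pool_misses_total counter
naas_connection_pool_misses_total 25
naas_queue_depth 3
"""
print(pool_hit_ratio(sample))  # 0.75
```

In a Prometheus-backed setup the same ratio would normally be computed in a query over a time window rather than from a single scrape.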
Job cancellation
Cancel long-running or stuck jobs with DELETE /v1/jobs/{job_id}.
The worker gracefully terminates the SSH session and marks the job as cancelled. The endpoint returns 200 on success, 404 if the job is not found, and 400 if the job has already finished.
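A client call to the cancellation endpoint might look like the sketch below, which uses only the Python standard library. The base URL is a placeholder, the function names and outcome strings are hypothetical, and authentication is omitted because these notes do not describe it; add whatever credentials your v1.1 calls already send.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://naas.example.com"  # hypothetical host

# Status codes as documented in these release notes.
OUTCOMES = {200: "cancelled", 404: "not_found", 400: "already_finished"}


def build_cancel_request(job_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the DELETE /v1/jobs/{job_id} request without sending it."""
    quoted = urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(f"{base_url}/v1/jobs/{quoted}", method="DELETE")


def cancel_job(job_id: str, base_url: str = BASE_URL) -> str:
    """Send the cancellation and translate the status code to an outcome."""
    request = build_cancel_request(job_id, base_url)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            code = response.status
    except urllib.error.HTTPError as exc:
        code = exc.code  # 4xx/5xx responses arrive as exceptions
    return OUTCOMES.get(code, f"unexpected_{code}")
```

Callers can treat "already_finished" as a benign result, since the job is no longer running either way.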
Audit events
Structured audit logging for compliance and troubleshooting. Events are written to stdout as JSON, each with an event_type field.
Event types:
- job.created: Job enqueued
- job.started: Worker began processing
- job.completed: Job finished successfully
- job.failed: Job failed with error
- job.cancelled: Job cancelled by user
- auth.failed: Authentication failure
- device.connection_failed: Device connection error
- device.locked_out: Device lockout triggered
Each event includes timestamp, job_id, username, device_ip, and event-specific fields.
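Since the events share stdout with other output, a consumer has to pick the JSON lines out of the stream. The following sketch shows one way to do that and count events by type. The sample lines are invented for illustration: the timestamp format and the "error" field are assumptions, and only the field names listed above come from these notes.

```python
import json
from collections import Counter


def parse_audit_events(lines):
    """Yield dicts for lines that are JSON objects with an event_type."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # ordinary log output, not an audit event
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        if isinstance(event, dict) and "event_type" in event:
            yield event


# Invented sample output; field values are placeholders.
sample_log = [
    '{"event_type": "job.created", "timestamp": "2024-01-01T00:00:00Z", '
    '"job_id": "j1", "username": "alice", "device_ip": "192.0.2.10"}',
    "worker started",
    '{"event_type": "job.failed", "timestamp": "2024-01-01T00:00:05Z", '
    '"job_id": "j1", "username": "alice", "device_ip": "192.0.2.10", '
    '"error": "timeout"}',
]

counts = Counter(e["event_type"] for e in parse_audit_events(sample_log))
print(counts["job.failed"])  # 1
```

The same filter can run in front of a log shipper so that only audit events reach the aggregation system.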
Kubernetes deployment
Production-ready Kubernetes manifests with examples:
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/redis/
kubectl apply -f k8s/api/
kubectl apply -f k8s/worker/
Includes:
- ConfigMap for environment variables
- Secret for credentials
- Redis StatefulSet
- API Deployment with HPA
- Worker Deployment with liveness probe
- Services and optional Ingress
Full guide: docs/kubernetes.md
Performance improvements
- Worker cache for Worker.all() with 10s TTL (reduces Redis scans)
- File-based heartbeat for worker liveness probe (no Redis overhead)
- Connection pool reduces SSH handshake overhead by 70-80%
Upgrade Steps
- Pull the latest image or update your deployment
- Review connection pool settings — defaults are suitable for most deployments
- Configure Prometheus scraping for the /metrics endpoint
- Update healthcheck monitors to handle the no_workers status
- Optionally configure audit event forwarding to a log aggregation system
- For Kubernetes deployments, apply manifests from the k8s/ directory