v1.4 Release Notes
Overview
v1.4 is a major feature release focused on operational reliability, observability, and developer experience. It adds job deduplication, webhooks, a dead letter queue, context-aware routing, queue backpressure, and significant CI improvements.
Migration Guide
No breaking changes
v1.4 is fully backward-compatible with v1.3. All new fields are optional with sensible defaults.
Deprecations
ip field: deprecated in favor of host (accepts IP addresses, IPv6 addresses, and hostnames). The ip field still works but will be removed in v2.0.
What's New
Hostname support (host field)
The host field now accepts IPv4 addresses, IPv6 addresses, and RFC 1123 hostnames.
The ip field is deprecated; migrate to host at your convenience.
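To make the three accepted forms concrete, the sketch below classifies a value as an IPv4 address, an IPv6 address, or a hostname. It is not the validator NAAS uses; classify_host is a hypothetical helper, and the hostname pattern is one common reading of RFC 1123 (labels of 1 to 63 letters, digits, or hyphens that do not start or end with a hyphen, at most 253 characters in total).

```python
import ipaddress
import re

# One common reading of an RFC 1123 label: 1-63 chars, alphanumeric at both ends.
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_host(value: str) -> str:
    """Return 'ipv4', 'ipv6', or 'hostname'; raise ValueError otherwise.

    Hypothetical helper for illustration; not part of NAAS.
    """
    try:
        return f"ipv{ipaddress.ip_address(value).version}"
    except ValueError:
        pass  # not an IP literal, so try it as a hostname
    name = value[:-1] if value.endswith(".") else value  # allow one trailing dot
    if 0 < len(name) <= 253 and all(_LABEL.fullmatch(part) for part in name.split(".")):
        return "hostname"
    raise ValueError(f"not a valid host: {value!r}")
```

Under these assumptions, 192.168.1.1 is classified as ipv4, 2001:db8::1 as ipv6, and core-sw1.example.com as hostname.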
Context-aware job routing
Route jobs to specific worker pools using the context field. Useful for multi-VRF, multi-segment, or geographically distributed environments:
{
"host": "192.168.1.1",
"platform": "cisco_ios",
"commands": ["show version"],
"context": "oob-dc1"
}
Workers declare their context via WORKER_CONTEXTS=oob-dc1. Jobs are routed to the matching worker pool. Use GET /v1/contexts to list available contexts.
Full guide: docs/contexts.md
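The matching rule can be pictured with a small sketch. It assumes that WORKER_CONTEXTS is comma-separated like NAAS_CONTEXTS and that a job without a context field falls into the default context; both are assumptions drawn from the defaults listed in these notes, and parse_contexts and worker_accepts are hypothetical names rather than NAAS functions.

```python
def parse_contexts(raw: str) -> set[str]:
    """Split a comma-separated context list, ignoring blanks and whitespace."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def worker_accepts(job: dict, worker_contexts: str = "default") -> bool:
    """True if a worker with this WORKER_CONTEXTS value would handle the job.

    Assumption: a job without a "context" field uses the "default" context.
    """
    return job.get("context", "default") in parse_contexts(worker_contexts)
```

With this rule, a job tagged oob-dc1 is only picked up by workers whose WORKER_CONTEXTS includes oob-dc1.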
Job deduplication
Submitting a duplicate of an in-flight job (same host + platform + commands + user) returns the existing job_id instead of enqueuing a new job.
Enabled by default. Disable with JOB_DEDUP_ENABLED=false.
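One way to derive a dedup identity from those four fields is to hash a canonical encoding of them. The sketch below is illustrative only: the actual key format and hash used by NAAS are not described in these notes, and dedup_fingerprint is a hypothetical name.

```python
import hashlib
import json


def dedup_fingerprint(host: str, platform: str, commands: list[str], user: str) -> str:
    """Hash the four fields the release notes name as the dedup identity.

    Hypothetical: the real key format is not documented here.
    Command order is preserved, since order can change what a job does.
    """
    canonical = json.dumps(
        {"host": host, "platform": platform, "commands": commands, "user": user},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two submissions with identical fields produce the same fingerprint, while changing any one field (for example the user) produces a different one.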
Idempotency keys
Client-controlled deduplication for safe retries after a network failure.
Repeat requests with the same key within 24 hours (the default IDEMPOTENCY_TTL of 86400 seconds) return the original job_id with idempotent: true.
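The behavior can be modeled with a small in-memory store. This is a sketch, not the NAAS implementation: IdempotencyStore is a hypothetical class, the real store presumably lives in Redis, and how the key is sent by the client is not shown in these notes. The clock is injectable so that expiry can be exercised without waiting.

```python
import time


class IdempotencyStore:
    """In-memory stand-in for a TTL-based idempotency key store (illustrative only)."""

    def __init__(self, ttl_seconds: int = 86400, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # key -> (job_id, expires_at)

    def submit(self, key: str, new_job_id: str) -> dict:
        """Return the original job for a live key, else record the new one."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return {"job_id": entry[0], "idempotent": True}
        self._entries[key] = (new_job_id, now + self._ttl)
        return {"job_id": new_job_id}
```

A retry with the same key inside the TTL gets the first job's ID back; once the TTL has elapsed, the key is treated as new.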
Webhooks
Receive a notification when a job completes instead of polling:
{
"host": "192.168.1.1",
"platform": "cisco_ios",
"commands": ["show version"],
"webhook_url": "https://my-app.example.com/naas-callback"
}
NAAS POSTs {"job_id": "...", "status": "finished", "enqueued_at": "...", "completed_at": "..."} to your URL. Results and credentials are never included. Webhook URLs must use HTTPS unless WEBHOOK_ALLOW_HTTP is enabled (testing only).
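On the receiving side, a handler might start by decoding the body and checking for the four documented fields. The sketch below shows only that step; parse_job_notification is a hypothetical helper, and wiring it into a web framework is left out.

```python
import json

REQUIRED_FIELDS = ("job_id", "status", "enqueued_at", "completed_at")


def parse_job_notification(body: bytes) -> dict:
    """Decode a webhook body and check that the four documented fields are present."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return payload
```

Because the notification carries no results, the receiver would then fetch the job output through the regular API using its own credentials.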
Dead letter queue
Inspect and replay failed jobs:
# List failed jobs
GET /v1/jobs/failed
# Replay a failed job with your current credentials
POST /v1/jobs/{job_id}/replay
Credentials are never exposed in the response. Error messages have credential values redacted. Cap registry size with FAILED_JOB_MAX_RETAIN (default 500).
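Redacting credential values from an error message can be as simple as replacing every known secret string with a mask. The sketch below illustrates the idea under that assumption; redact is a hypothetical helper and the mask text is arbitrary, so it should not be read as the redaction NAAS performs.

```python
def redact(message: str, secrets: list[str], mask: str = "***") -> str:
    """Replace each non-empty secret value found in message with mask."""
    # Longest first, so a secret that contains another secret is masked whole.
    for secret in sorted(filter(None, secrets), key=len, reverse=True):
        message = message.replace(secret, mask)
    return message
```

Empty strings are skipped because replacing an empty string would insert the mask between every character.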
Job tags
Attach metadata to jobs for filtering and auditing:
{
"host": "192.168.1.1",
"commands": ["show version"],
"tags": {"team": "network-ops", "env": "prod", "ticket": "CHG-12345"}
}
Filter jobs by tag: GET /v1/jobs?tag=team:network-ops
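The key:value form of the filter can be read as an exact match on one tag. A minimal sketch of that reading follows; matches_tag is a hypothetical helper, and whether the server supports anything beyond exact matching is not stated in these notes.

```python
def matches_tag(job_tags: dict[str, str], tag_filter: str) -> bool:
    """True if job_tags contains the key:value pair named by tag_filter."""
    # Split on the first colon only, so values may themselves contain colons.
    key, sep, value = tag_filter.partition(":")
    if not sep or not key:
        raise ValueError("tag filter must look like key:value")
    return job_tags.get(key) == value
```

For the job above, team:network-ops matches and env:dev does not.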
Queue backpressure
Prevent queue overload with MAX_QUEUE_DEPTH. Enqueue requests return 503 Service Unavailable when the queue is full.
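The admission check behind this setting can be sketched in a few lines. The sketch assumes that an unset MAX_QUEUE_DEPTH means no limit (matching the documented default) and that the queue counts as full once its depth reaches the limit; the exact comparison NAAS uses is not stated here, and both function names are hypothetical.

```python
from typing import Mapping, Optional


def parse_max_queue_depth(env: Mapping[str, str]) -> Optional[int]:
    """Read MAX_QUEUE_DEPTH from an environment mapping; None means unlimited."""
    raw = env.get("MAX_QUEUE_DEPTH", "").strip()
    return int(raw) if raw else None


def queue_is_full(current_depth: int, max_queue_depth: Optional[int]) -> bool:
    """True when a new job should be rejected with 503.

    Assumption: the queue is full once depth reaches the configured limit.
    """
    if max_queue_depth is None:
        return False
    return current_depth >= max_queue_depth
```

In a real deployment the mapping passed to parse_max_queue_depth would be os.environ.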
Enqueue response metadata
All enqueue responses now include:
{
"job_id": "abc-123",
"queue_position": 3,
"enqueued_at": "2026-03-20T19:00:00+00:00",
"timeout": 60
}
TTP structured output
/v1/send_command_structured now supports TTP templates in addition to TextFSM:
{
"host": "192.168.1.1",
"platform": "cisco_ios",
"commands": ["show version"],
"ttp_template": "hostname {{ hostname }}"
}
ttp_template and textfsm_template are mutually exclusive.
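A request validator enforcing that rule might look like the sketch below. select_template is a hypothetical name, and returning None when neither template is supplied is an assumption; these notes do not say what the endpoint does in that case.

```python
from typing import Optional, Tuple


def select_template(request: dict) -> Optional[Tuple[str, str]]:
    """Return (engine, template) for the template in the request, if any.

    Raises ValueError when both ttp_template and textfsm_template are set.
    """
    ttp = request.get("ttp_template")
    textfsm = request.get("textfsm_template")
    if ttp and textfsm:
        raise ValueError("ttp_template and textfsm_template are mutually exclusive")
    if ttp:
        return ("ttp", ttp)
    if textfsm:
        return ("textfsm", textfsm)
    return None  # assumption: no template supplied
```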
Connection timeout control
Control the TCP connection timeout with conn_timeout (default 10s).
This is useful for failing fast on unreachable hosts or for allowing more time on high-latency links.
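To show what a TCP connection timeout bounds, the sketch below attempts a plain TCP connect and gives up after conn_timeout seconds. It uses only the Python standard library and is a conceptual illustration; tcp_reachable is a hypothetical helper, not how NAAS opens device sessions.

```python
import socket


def tcp_reachable(host: str, port: int, conn_timeout: float = 10.0) -> bool:
    """Try a TCP connect, giving up after conn_timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=conn_timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

A small value makes an unreachable host fail quickly, while a larger one tolerates slow links at the cost of slower failure detection.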
Reliability improvements
Redis graceful degradation
Redis errors now return 503 Service Unavailable with Retry-After: 10 instead of unhandled 500 errors.
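A client can use the Retry-After header to decide how long to wait before retrying. The sketch below handles only the seconds form of the header and falls back to a fixed delay otherwise; retry_delay and its fallback value are illustrative choices, not part of any NAAS client.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], fallback: float = 10.0) -> Optional[float]:
    """Seconds to wait before retrying a 503, or None if no retry is called for."""
    if status != 503:
        return None
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after", "").strip()
    # Only the delta-seconds form is handled; an HTTP-date value uses the fallback.
    if raw.isascii() and raw.isdigit():
        return float(raw)
    return fallback
```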
Job reaper
A background thread in each worker detects orphaned jobs left by dead workers (OOM kills, node failures) and moves them to the failed registry. It also clears their dedup keys so the jobs can be resubmitted, and it takes a distributed Redis lock so that only one reaper runs per cycle.
Configure with JOB_REAPER_ENABLED, JOB_REAPER_INTERVAL, WORKER_STALE_THRESHOLD.
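The detection step can be sketched as a comparison between each running job's worker and that worker's last sign of life. The heartbeat model below is an assumption made for illustration (these notes do not describe how staleness is measured), and find_orphaned_jobs is a hypothetical function; the default threshold mirrors WORKER_STALE_THRESHOLD.

```python
def find_orphaned_jobs(
    running_jobs: dict[str, str],
    last_heartbeat: dict[str, float],
    now: float,
    stale_threshold: float = 120.0,
) -> list[str]:
    """Return IDs of jobs whose worker has not been seen within the threshold.

    running_jobs maps job_id -> worker_id; last_heartbeat maps worker_id -> timestamp.
    A worker with no recorded heartbeat is treated as dead.
    """
    orphans = []
    for job_id, worker_id in running_jobs.items():
        seen = last_heartbeat.get(worker_id)
        if seen is None or now - seen > stale_threshold:
            orphans.append(job_id)
    return sorted(orphans)
```

A real reaper would run this every JOB_REAPER_INTERVAL seconds while holding the lock, then move each returned job to the failed registry.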
Connection pool exclusion list
Exclude specific devices or platforms from connection pooling with CONNECTION_POOL_EXCLUDE, a comma-separated list of IPs and platforms.
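A lookup against CONNECTION_POOL_EXCLUDE could work as in the sketch below. It assumes entries are matched exactly against either the host or the platform of a job; pooling_allowed is a hypothetical helper, and the matching rules NAAS applies may differ.

```python
def pooling_allowed(host: str, platform: str, exclude: str = "") -> bool:
    """True unless the host or the platform appears in the exclusion list.

    exclude is a comma-separated string; entries are compared exactly (assumption).
    """
    excluded = {item.strip() for item in exclude.split(",") if item.strip()}
    return host not in excluded and platform not in excluded
```

With an exclusion list of 192.168.1.1,cisco_asa, connections to that address and to any cisco_asa device would bypass the pool.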
Observability
- naas_failed_jobs_total Prometheus gauge: tracks dead letter queue depth
- failed_jobs count in /healthcheck response
- job.orphaned audit event when the reaper moves a job to the failed registry
Developer experience
- ADR process: docs/adr/ with MADR format for architectural decisions
- Postman/OpenAPI artifacts: OpenAPI spec published as release artifact
- Python 3.12/3.13/3.14 support: all tested in CI matrix
- Docker default bumped to Python 3.14
CI improvements
- Docker BuildKit GHA layer caching for integration tests (~30-40s savings per run)
- Integration tests reduced from 4 parallel Python-version jobs to 1 (75% compute reduction — container always runs 3.14)
- GUNICORN_WORKERS=2 in CI environments for faster startup
- docker compose up --wait for reliable service readiness
New configuration variables
| Variable | Default | Description |
|---|---|---|
| NAAS_CONTEXTS | default | Comma-separated list of valid routing contexts |
| WORKER_CONTEXTS | default | Contexts this worker handles |
| JOB_DEDUP_ENABLED | true | Enable server-side job deduplication |
| IDEMPOTENCY_TTL | 86400 | Idempotency key TTL in seconds |
| MAX_QUEUE_DEPTH | (unlimited) | Maximum queue depth before 503 |
| WEBHOOK_ALLOW_HTTP | false | Allow HTTP webhook URLs (testing only) |
| FAILED_JOB_MAX_RETAIN | 500 | Maximum failed jobs in dead letter queue |
| JOB_REAPER_ENABLED | true | Enable orphaned job detection |
| JOB_REAPER_INTERVAL | 60 | Seconds between reaper scans |
| WORKER_STALE_THRESHOLD | 120 | Seconds before worker considered dead |
| GUNICORN_WORKERS | 8 | Number of gunicorn worker processes |
| CONNECTION_POOL_EXCLUDE | (none) | Comma-separated IPs/platforms to exclude from pooling |
Upgrade Steps
- Pull the latest image or update your deployment
- Update the ip field to host in your requests (optional; ip still works)
- Configure NAAS_CONTEXTS and WORKER_CONTEXTS if using multi-segment routing
- Review MAX_QUEUE_DEPTH for your workload
- Configure webhook endpoints if you want push notifications instead of polling
- Review Prometheus dashboards: new naas_failed_jobs_total gauge available