Changelog
All notable changes to NAAS will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
NAAS 2.2.0 (2026-05-20)
Security
- Bump authlib from 1.6.11 to 1.6.12 to address GHSA-9ggr-2464-2j32: open redirect on InvalidScopeError in OpenIDImplicitGrant and OpenIDHybridGrant.
- Bump idna from 3.11 to 3.15 to address GHSA-jjg7-2v4v-x38h: specially-crafted inputs to idna.encode() could bypass the CVE-2024-3651 fix.
- Bump python-multipart from 0.0.26 to 0.0.27 to address GHSA-9mvj-f7w8-pvh2: denial of service via unbounded multipart part headers.
- Bump urllib3 from 2.6.3 to 2.7.0 to address two high-severity CVEs (GHSA-mf9v-mfxr-j63j and GHSA-pq67-6m6q-mj2v): decompression-bomb safeguard bypasses in the streaming API and sensitive headers being forwarded across origins in proxied low-level redirects.
Bug Fixes
- Fix release-bump invoke task k8s manifest pinning step failing when invoked from packages/naas/ (the documented invocation pattern). The step used a relative path that resolved incorrectly when cwd wasn't the repo root; now uses absolute paths. (#494)
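
The cwd-independence fix in #494 can be illustrated with a minimal sketch. It assumes a hypothetical layout in which the tasks file lives at packages/naas/tasks.py; the helper name and the manifest path are illustrative, not the project's actual code.

```python
from pathlib import Path


def resolve_manifest(tasks_file: str, relative_manifest: str) -> Path:
    """Resolve a manifest path against the repo root, not the current directory.

    Assumes (hypothetically) that tasks_file is <repo>/packages/naas/tasks.py,
    so the repo root is parents[2] of the resolved file path.
    """
    repo_root = Path(tasks_file).resolve().parents[2]
    return repo_root / relative_manifest


# Hypothetical paths, used only for illustration.
manifest = resolve_manifest("/repo/packages/naas/tasks.py", "k8s/deployment.yaml")
print(manifest)
```

Because the root is derived from the file's own location, the result is the same whether the task is invoked from the repo root or from packages/naas/.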
Documentation
- Backfill missing CHANGELOG.md entries for v1.3.0 and v1.4.0 from the corresponding GitHub Releases. Drops the orphaned reference to #322 (which never actually shipped in v1.4.0). (#477)
- Add ADR 0011 documenting the release process: release/X.Y is the source of truth during a release, CI never commits back to branches, and release ceremony reduces to one invoke task. (#479)
- Update README badge and user-facing documentation links from naas.readthedocs.io/en/latest/ to /en/stable/. The /en/stable/ URL points at the last released version's docs (canonical pattern for OSS Python projects); /en/latest/ tracks the default branch and will start serving in-development docs after the default-branch switch in #489. Internal contributor links (e.g. CONTRIBUTING.md → development guide) keep using /en/develop/. (#489)
- Rewrite the Release Process section in development.md to match the new release-branch-as-truth model (ADR 0011): single inv release-bump command, merge-commit for release PRs, no CI commit-back. Update CONTRIBUTING.md merge-strategy note. Update naas-dev agent prompt with the new ceremony, merge-strategy guidance, inv release-bump usage, and corrected note about internal fragment visibility.
Internal Changes
- Fix release workflow not stopping when the tag already exists. The 'Stop if tag exists' step was a no-op because exit 0 succeeds the step without affecting downstream jobs. Now the should_release job output correctly evaluates to false when the tag exists, preventing duplicate-tag failures and stale changelog rebuilds. (#468)
- Pin DavidAnson/markdownlint-cli2-action to v23.2.0 to prevent silent rule changes from breaking CI. The action's bundled markdownlint engine can introduce new lint rules in minor releases, as happened with MD060 in v23.1.0 which broke the v2.1.0 release PR. (#472)
- Render the docs site changelog page from the repo-root CHANGELOG.md via mkdocs-include-markdown-plugin transclusion, eliminating the duplicated copy at docs/changelog.md and the corresponding cp step in the release workflow. (#478)
- Add inv release-bump VERSION invoke task that performs the entire release ceremony in one command: bump pyproject.toml + uv lock + (final only) towncrier build + k8s manifest pinning + commit + annotated tag + atomic push. See ADR 0011 for design.
- Bump develop to v2.2.0a1 after v2.1.0 release.
- Migrate release pipeline to tag-driven model: release.yml triggers on tag push only and never commits back; finalize-release.yml deleted; sync-release.yml drops the sync-to-release-branch job (no longer needed with merge-commit on release PRs) and skips develop bump for patch releases. Implements ADR 0011.
- Sync v2.1.0 release commits from main into develop after final release.
NAAS 2.1.0 (2026-05-18)
Features
- Add per-caller and per-caller-per-device sliding window rate limits on submission endpoints. (#86)
- Add OpenTelemetry distributed tracing with trace context propagation through RQ (#284)
- Add SSE streaming endpoint for real-time job status updates. (#404)
- Add `error_code` and `error_retryable` fields to job results for machine-parseable error classification. Maps each netmiko/paramiko exception to a structured error code (e.g. `CONNECTION_TIMEOUT`, `AUTH_FAILURE`, `CONFIG_REJECTED`). Client library gains exception subclasses routed by error code. (#430)
- Add MCP server package (`packages/naas-mcp`) for AI-assisted network operations via FastMCP 3.0. Published as `mcp-server-naas` on PyPI. (#434)
- Add remaining MCP tools (send_command_structured, create_api_key, revoke_api_key), naas://jobs resource, and prompt templates (show_commands, config_push, troubleshoot_device).
Documentation
- Add ADR 0010: MCP Server as Thin Client over REST API. (#365)
- ADR 0009: command authorization deferred to AAA/TACACS+; NAAS does not filter commands at the proxy layer. (#425)
- Restructure documentation for v2.1: Helm-first Kubernetes page, add K8s/Helm equivalents across all operational docs, create deployment overview page, update v1 API references to v2 (#448)
- Fix MD060 table-column-style violations in deployment, kubernetes, security, and v2.1 release-notes pages. (#461)
- Document rate limiting, OpenTelemetry tracing, SSE streaming, remaining MCP components, and load testing.
- Editorial improvements to home page, quick start, installation, and deployment overview
- Reorganize documentation navigation: split User Guide into Usage and Operations, move Architecture to Reference, collapse ADRs to single index link
- Rewrite security page: remove generic advice, keep NAAS-specific actionable content
- Update architecture page to reflect v2.x features: RBAC, context routing, rate limiting, SSE, OTel, MCP server
Testing & CI/CD
- Add Locust-based load testing with smoke (PR) and full profile (RC tag) CI tiers. (#403)
- Add OTel integration tests with OTLP collector in docker-compose (#417)
- Add SSE streaming endpoint integration tests. (#423)
- Add retry/backoff to healthcheck integration tests to handle worker startup timing. (#437)
- Add RBAC, API key, and v2 field rejection integration tests.
- Add retry/backoff to MCP integration healthcheck test to handle worker startup timing.
Internal Changes
- Sync docs changelog from root CHANGELOG.md on every release (beta, RC, and final)
- Update naas-dev agent prompt with uv run requirements, monorepo structure, correct working directories, and current version info.
NAAS 2.0.0 (2026-04-10)
Breaking Changes
- Remove deprecated `device_type` field from all endpoints. Use `platform` instead. (#55)
- Remove deprecated `ip` field from all endpoints. Use `host` instead. (#272)
Security
- Bump cryptography from 46.0.6 to 46.0.7.
Deprecations
- Add deprecation headers to /v1/ and legacy unversioned route responses. (#358)
Features
- Add audit events for authentication, authorization, and API key management. (#100)
- Add JWT-based API key authentication with create, validate, and revoke support. (#101)
- Add role-based access control with admin, operator, and viewer roles. (#102)
- Add pluggable secrets backend with environment variable (default) and HashiCorp Vault providers. (#103)
- Add HMAC-SHA256 signing for webhook payloads via optional webhook_secret field. (#277)
- Add exponential backoff retry for failed webhook deliveries. (#278)
- Encrypt device credentials at rest in Redis using Fernet symmetric encryption. (#286)
- Add context-based authorization for API key authenticated requests. (#291)
- Add AWS Secrets Manager as an optional secrets backend. (#347)
- Add API key rotation endpoint (POST /v1/api-keys/{key_id}/rotate). (#348)
- Restore /v1/ backward compatibility (accept ip/device_type, skip RBAC/context auth). (#356)
- Add naas-client package scaffolding as uv workspace member. (#369)
- Add naas-client models and exception hierarchy. (#370)
- Add synchronous NaasClient with full v2 API coverage. (#371)
- Add Job object with polling and wait support. (#372)
- Add async client (AsyncNaasClient) and AsyncJob. (#373)
- Add spectree response annotations to all API endpoints for complete OpenAPI spec. (#377)
- Add CLI tool with Typer, config file support, and healthcheck command. (#387)
- Add send-command and send-config CLI commands with --wait support. (#388)
- Add jobs CLI subcommand group (list, get, cancel, replay, failed). (#389)
- Add contexts and api-keys CLI subcommands. (#390)
- Add Helm chart for Kubernetes deployment. (#401)
- Add Prometheus metrics for RQ worker processes. (#402)
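
For consumers of the webhook signing entry (#277) above, a receiver-side check might look like the following minimal sketch. It assumes the signature is a hex-encoded HMAC-SHA256 over the raw request body; the function names, the example secret, and the payload fields are illustrative, and the header that carries the signature is not specified in this changelog.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, body: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, received: str) -> bool:
    """Compare digests in constant time to avoid timing side channels."""
    return hmac.compare_digest(sign_payload(secret, body), received)


# Illustrative payload; real notification fields may differ.
body = json.dumps({"job_id": "abc123", "status": "finished"}).encode("utf-8")
signature = sign_payload("example-secret", body)
print(verify_signature("example-secret", body, signature))
```

Verification should run against the exact bytes received, before any JSON re-serialization, since re-encoding can change key order or whitespace and invalidate the digest.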
Bug Fixes
- Fix sync-release workflow failing to bump develop version due to branch protection. (#324)
- Fix release workflow and changelog paths broken by workspace restructure. (#398)
Documentation
- Add ADR for structured audit event logging design. (#100)
- Add ADR for JWT-based API key authentication design. (#101)
- Add ADR for role-based access control design. (#102)
- Add ADR for secrets backend abstraction design. (#103)
- Add ADR for credential encryption at rest in Redis. (#286)
- Add v1 → v2 API migration guide and ADR for API versioning strategy. (#359)
- Add naas-client README, mypy strict mode, and CI workflow. (#374)
- Add CLI shell completion, --version, and RTD documentation. (#391)
- Update documentation for v2.0 release. (#409)
Testing & CI/CD
- Skip build, test, and k8s CI jobs on docs-only PRs. (#330)
- Add integration tests for /v1/ and /v2/ route coexistence. (#360)
- Add naas-client integration tests against docker-compose NAAS. (#383)
- Use naas-client in NAAS integration and e2e tests. (#384)
- Use Helm chart in k8s integration tests. (#406)
- Add RBAC, API key, and v2 field rejection integration tests.
Internal Changes
- Sync emit_audit_event docstring with current event schemas. (#353)
- Restructure repository to uv workspace with packages/ layout. (#367)
- Bump develop to v2.0.0a1 development cycle.
NAAS 1.4.0 (2026-04-03)
Features
- Add CONNECTION_POOL_EXCLUDE environment variable to exclude specific device IPs or platforms from connection pooling. (#187)
- Add TTP structured output support to /v1/send_command_structured endpoint via ttp_template parameter. Mutually exclusive with textfsm_template. (#246)
- Add host field accepting IP addresses and hostnames. Deprecate ip field (still works, will be removed in v2.0). (#270)
- Enqueue response now includes queue_position, enqueued_at, and timeout fields. (#271)
- Add webhook_url field to all job submission endpoints. When set, NAAS POSTs a job completion notification (job_id, status, and timestamps; never results or credentials) to the URL when the job finishes. (#275)
- Add optional tags field (dict[str,str]) to all enqueue requests. Tags stored in job metadata, returned in job results and list_jobs. Filter jobs by tag with ?tag=key:value. (#276)
- Add MAX_QUEUE_DEPTH config to limit queue depth and return 503 when exceeded. (#279)
- Add X-Idempotency-Key header support. Repeat requests with the same key return the original job_id instead of enqueuing a new job. (#280)
- Add dead letter queue endpoints: GET /v1/jobs/failed lists failed jobs (credentials never exposed), POST /v1/jobs/{job_id}/replay re-enqueues with caller's credentials. Includes FAILED_JOB_MAX_RETAIN config and naas_failed_jobs_total Prometheus gauge. (#281)
- Add job reaper background thread that detects orphaned jobs from dead workers and moves them to FailedJobRegistry. Uses distributed Redis lock to ensure only one reaper runs per cycle. (#282)
- Add global Redis error handler returning 503 with Retry-After header instead of unhandled 500. (#283)
- Add server-side job deduplication. Duplicate in-flight jobs (same host+platform+commands+user) return the existing job_id with deduplicated=true. Dedup keys cleared automatically via RQ callbacks on job completion/failure. (#285)
- Add context-aware job routing for multi-VRF and multi-segment environments. Workers declare contexts via WORKER_CONTEXTS; callers specify context in requests. Includes GET /v1/contexts endpoint and comprehensive documentation. (#290)
- Add conn_timeout field to all job submission endpoints to control TCP connection timeout (default 10s). Useful for fast failure detection on unreachable hosts or tuning for high-latency links. (#304)
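
The server-side deduplication entry (#285) above can be pictured with a minimal sketch. It assumes the dedup identity is a SHA-256 hash over the four fields the entry names (host, platform, commands, user) and uses an in-memory dict as a stand-in for Redis; the function names and key format are hypothetical, not the actual implementation.

```python
import hashlib
import json


def dedup_key(host: str, platform: str, commands: list[str], user: str) -> str:
    """Derive a stable key from the fields that define a duplicate job."""
    canonical = json.dumps(
        {"host": host, "platform": platform, "commands": commands, "user": user},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


in_flight: dict[str, str] = {}  # stand-in for Redis: dedup key -> job_id


def submit(job_id: str, host: str, platform: str, commands: list[str], user: str) -> dict:
    """Return the existing job for a duplicate in-flight request, else register a new one."""
    key = dedup_key(host, platform, commands, user)
    existing = in_flight.get(key)
    if existing is not None:
        return {"job_id": existing, "deduplicated": True}
    in_flight[key] = job_id
    return {"job_id": job_id, "deduplicated": False}


first = submit("job-1", "10.0.0.1", "cisco_ios", ["show version"], "alice")
second = submit("job-2", "10.0.0.1", "cisco_ios", ["show version"], "alice")
print(first, second)
```

In this sketch the key would be deleted when the job completes or fails, mirroring the callback-based cleanup the entry describes.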
Bug Fixes
- Fix Dockerfile hardcoded python3.11 path to use ARG PYTHON_VERSION, enabling base image upgrades. (#261)
Documentation
- Add Postman collection and OpenAPI spec as release artifacts. Document API client import instructions for Postman, Insomnia, and Bruno. (#92)
- Establish ADR process using MADR format. Add docs/adr/ directory with template and first ADR for Python client library integration strategy. (#265)
- Add user documentation for job tags and queue backpressure. (#316)
- Fix MD060 table column style violations across documentation files.
Testing & CI/CD
- Add Python 3.12, 3.13, and 3.14 to CI test matrix. Bump Docker default base image to python:3.14-slim. (#263)
- Expand integration tests to cover v1.4 features: host field, deprecated ip field, enqueue response metadata, structured output (TextFSM/TTP), and context routing with multiple workers. (#293)
- Add Docker BuildKit GHA layer caching to integration test builds, reducing rebuild time on cache hits. (#302)
- Run integration tests on Python 3.14 only. The Docker container always uses 3.14 regardless of matrix version, so multi-version integration testing added no coverage value. (#303)
- Reduce k8s CI test time by patching GUNICORN_WORKERS=2 on the API deployment and reducing rollout timeouts from 180s to 90s. (#308)
Internal Changes
- Remove black dependency, superseded by ruff format. (#258)
- Bump dependencies via Dependabot: types-paramiko, GitHub Actions, requests 2.33.0, cryptography 46.0.6, pygments 2.20.0, and 12 minor/patch updates.
- Use RELEASE_TOKEN in finalize-release workflow to trigger CI checks on auto-created PRs
NAAS 1.3.1 (2026-03-06)
Documentation
- Standardize bash syntax highlighting in kubernetes.md code blocks. (#201)
- Clarify worker concurrency vs connection pooling in kubernetes.md. (#202)
- Add response schema documentation to HealthCheck.get() docstring. (#203)
- Add v1.2 and v1.3 release notes and fix v1.0.0 changelog entry.
NAAS 1.3.0 (2026-03-05)
Breaking Changes
- Replace delay_factor parameter with read_timeout (float, seconds) in send_command and send_config endpoints (#216)
Security
- Run API and worker containers as non-root user (UID 1000) with NET_BIND_SERVICE capability for port 443 (#198)
- Set readOnlyRootFilesystem on API and worker containers; pre-compile Python bytecode in Dockerfile (#206)
Features
- Add /v1/send_command_structured endpoint with TextFSM parsing via ntc-templates or custom templates (#219)
- Add optional expect_string parameter to send_command for custom prompt matching (#220)
- Add platform autodetect via SSHDetect for discovery workflows and heterogeneous environments (#222)
Bug Fixes
- Detect config errors via `error_pattern` in `send_config_set`, returning error string instead of succeeding silently (#217)
- Call find_prompt() after connection pool hit to verify clean CLI state before issuing commands (#218)
- Use `setnx` for `naas_cred_salt` so API restarts do not invalidate existing connection pool keys and in-flight job auth (#223)
- Pass Redis connection explicitly to `tacacs_auth_lockout` and `device_lockout`, eliminating per-request TCP connection overhead (#224)
- Set explicit `job_timeout` on enqueue to prevent hung workers holding for RQ's 180s default (#226)
- Call `redis.ping()` at startup to fail fast if Redis is unavailable (#227)
Documentation
- Add comprehensive documentation for v1.3.0 features: job cancellation, connection pooling, Prometheus metrics, audit events, expect_string parameter, and troubleshooting guides
Internal Changes
- Explicitly set fast_cli=True on ConnectHandler for consistent throughput across all platforms (#221)
- Use Job.fetch_many() in ListJobs to batch-fetch job details in a single Redis pipeline call (#225)
- Pass Job object directly to `job_locker` to avoid redundant Redis fetch after enqueue (#228)
- Add comment to `worker_cache` documenting per-process global state assumption (#229)
- Ignore codecov.io badge URLs in link-check to prevent flaky CI failures (#234)
- Share Docker image between build.yml and k8s-tests via artifact to eliminate duplicate builds (#240)
- Automate release workflow: auto-create PR from main to develop, auto-bump develop to next alpha version after release
- Improve code quality: add docstrings, avoid mutable dict mutations, replace bare except with specific exceptions, eliminate type ignore comments
- Replace shell scripts with Python for changelog cleanup, add comprehensive unit tests, fix bugs in cleanup_released_fragments and cleanup_changelog scripts
NAAS 1.2.0 (2026-03-02)
Features
- Add connection pooling for persistent SSH device connections, reducing VTY session overhead on network devices (#57)
- Add Prometheus metrics endpoint at `/metrics` exposing request counts, latency, queue depth, worker count, and job totals (#78)
- Add DELETE endpoint for job cancellation (#178)
- Healthcheck endpoint now reports worker count, active jobs, and a `no_workers` status when no RQ workers are available (#179)
- Add structured audit events for job lifecycle and device failures (#180)
Bug Fixes
- Fix connection pool key to include password hash, preventing credential sharing between users with the same username (#196)
Documentation
- Add Kubernetes deployment manifests and documentation for k3d/k8s deployment (#28)
- Document connection pool configuration variables in kubernetes.md and k8s/configmap.yaml (#191)
- Document TLS certificate encoding requirements for NAAS_CERT/NAAS_KEY/NAAS_CA_BUNDLE in kubernetes.md (#192)
- Add 400 status code to CancelJob.delete() docstring (#193)
- Document required fields per event type in emit_audit_event docstring (#194)
- Add exposed metrics list to Monitoring section in kubernetes.md (#195)
Testing & CI/CD
- Add end-to-end job test to k8s CI: deploys Cisshgo into k3d cluster and validates full API → Redis → worker → SSH device job flow (#189)
- Justify pragma: no cover on cancel_job.py auth guard with inline explanation (#205)
- Fix contract test to accept no_workers healthcheck status
Internal Changes
- Release workflow now pins k8s manifest image tags to the release version (#197)
- Cache Worker.all() result with 10s TTL to avoid repeated Redis scans on every request (#199)
- Add file-based heartbeat to worker process and liveness probe to k8s worker deployment (#200)
NAAS 1.1.0 (2026-02-26)
Features
- Device IP lockout: after 10 connection failures within 10 minutes, all access to that device is blocked for 10 minutes, preventing credential-spray abuse across multiple users. Also refactors the user lockout to use the same Redis sorted-set sliding-window implementation. (#2)
- All log output is now structured JSON, making logs parseable by ELK, Splunk, CloudWatch, and other log aggregation tools. Each log line includes `timestamp`, `level`, `logger`, and `message` fields. (#79)
- Correlation ID (request_id/job_id) now flows from API request through to worker log messages, enabling end-to-end log tracing across API and worker. (#80)
- Health check endpoint now returns detailed component status including Redis connectivity, queue depth, version, and uptime. Returns `status: degraded` when Redis is unreachable. (#81)
- OpenAPI spec now includes Basic Auth security scheme, enabling 'Try it out' authentication in Swagger UI at /apidoc. (#83)
- All API endpoints are now available under the `/v1/` prefix. Legacy unversioned routes (`/send_command`, `/send_config`) remain functional but return `X-API-Deprecated: true` and `X-API-Sunset: 2027-01-01` headers. All responses include `X-API-Version: v1`. (#84)
- Add structured request validation using Pydantic: IP address format, platform against all supported Netmiko device types, and non-empty command lists. Invalid requests now return 422 with a structured `errors` array instead of a generic 400. (#85)
- Add pagination for job history with GET /v1/jobs endpoint (#87)
- Workers now handle SIGTERM gracefully, finishing in-flight jobs before shutdown with configurable timeout (#94)
- Job result TTLs are now configurable via environment variables: `JOB_TTL_SUCCESS` (default 24h) and `JOB_TTL_FAILED` (default 7 days). Previously both were hardcoded at ~24h. (#95)
- Circuit breaker pattern prevents repeated connection attempts to failing devices with configurable thresholds and automatic recovery (#96)
- Add dynamic OpenAPI spec generation via spectree. The spec is served at `/apidoc/openapi.json` and Swagger UI at `/apidoc`, generated automatically from existing Pydantic request/response models. (#120)
- Add Pydantic validation for `/v1/jobs` query parameters (`page`, `per_page`, `status`). Invalid values now return 422 instead of being silently clamped. (#122)
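
The device lockout entry (#2) in this list describes a sliding window of 10 failures within 10 minutes. A minimal in-memory sketch of that idea follows; it substitutes a per-key deque for the Redis sorted set mentioned in the entry, and it simplifies the lockout to "locked while the window still holds the limit". The class and method names are hypothetical.

```python
from collections import deque


class SlidingWindowLockout:
    """Lock out a key after `limit` failures within `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 600.0) -> None:
        self.limit = limit
        self.window = window
        self.failures: dict[str, deque] = {}

    def record_failure(self, key: str, now: float) -> None:
        """Append a failure timestamp for the given device or user key."""
        self.failures.setdefault(key, deque()).append(now)

    def is_locked(self, key: str, now: float) -> bool:
        """Prune failures older than the window, then compare against the limit."""
        events = self.failures.get(key, deque())
        while events and events[0] <= now - self.window:
            events.popleft()
        return len(events) >= self.limit


lockout = SlidingWindowLockout()
for second in range(10):
    lockout.record_failure("192.0.2.1", now=float(second))
print(lockout.is_locked("192.0.2.1", now=10.0))
```

Timestamps are passed in explicitly so the behavior is deterministic; a Redis-backed version would typically prune by score and count the remaining members instead.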
Bug Fixes
- Fix auth guard in get_results.py to use explicit raise instead of assert (assert is stripped by python -O) (#159)
- Fix list_jobs unfiltered path to paginate per-registry instead of fetching all job IDs into memory; fix total_count to use registry.count rather than len of fetched IDs (#160)
- Fix bare except Exception in healthcheck.py to catch redis.exceptions.RedisError specifically (#163)
- Fix module-level Redis client in netmiko_lib.py initialised at import time; now lazily initialised on first use in circuit_breaker.py (#164)
- Failed jobs now include error detail in the API response. Previously `GET /v1/send_command/{job_id}` returned no error information when a job had status `failed`.
- Fix job.get_id() → job.id (rq removed get_id() in a recent release); was causing 500 errors on all job submissions.
Documentation
- Add MkDocs documentation site with Material theme and Read the Docs configuration (#76)
- Fix command examples in CONTRIBUTING.md, docs/testing.md, and docs/COVERAGE.md to use `uv run` prefix for plug-and-play usage without virtualenv activation (#124)
- Add v1.1 release notes with migration guide. Update API usage docs with GET /v1/jobs, X-Request-ID tracing, device_type deprecation notice, and 422 error shapes. Fix stale healthcheck response examples. Add Observability, Reliability, and Environment Variables reference pages. (#135)
- Fix security.md to remove phantom TLS_MIN_VERSION and TLS_CIPHERS environment variables. Document actual TLS behavior: hardcoded secure defaults (TLS 1.2+, HIGH cipher suite) configured in Gunicorn. (#145)
- Add architecture overview page with Mermaid diagrams covering system components, request lifecycle, async model rationale, and horizontal scaling. (#146)
- Split CONTRIBUTING.md into a short contributor entry point and a full Development Guide reference page. (#147)
- Configure Read the Docs to build the develop branch, making bleeding-edge documentation available at naas.readthedocs.io/en/develop/. (#150)
- Document requirement that all merges to main must go through pull requests
- Fix license section to reference existing MIT license file
- Streamline README for clarity and conciseness
- Update README with Read the Docs links
Testing & CI/CD
- Add cisshgo mock SSH device container to integration test suite. Tests now cover full API→worker→SSH→device path including happy path, auth failure, device lockout, circuit breaker, and error handling scenarios. (#74)
Internal Changes
- Automate cleanup of released changelog fragments from develop (#107)
- Add invoke export-spec task and CI check to keep docs/swagger/openapi.json in sync with code; remove stale docs/swagger/naas.yaml (#128)
- Audit and fix all lint/type-checking exemptions: add types-paramiko stubs, fix hset int->str args, remove dead auth guard in get_results, add inline justification for all remaining ignores. (#154)
- Extract circuit breaker infrastructure into naas/library/circuit_breaker.py; deduplicate circuit breaker wrapper in netmiko_lib.py; lazy-init Redis client to prevent import-time failure when Redis is unavailable (#161)
- Remove dead validation methods from Validate class (has_port, is_ip_addr, save_config, commit, is_command_set, has_platform, has_delay_factor). Pydantic models in models.py now handle all request validation. (#162)
- Move mypy into the enforced lint job so type errors block CI. Previously mypy ran with continue-on-error=true in a separate job and failures were silently ignored.
NAAS 1.0.1 (2026-02-24)
Features
- Add non-blocking Vale prose linting for changelog fragments. (#73)
Bug Fixes
- Fix changelog cleanup to remove old pre-releases. (#72)
Documentation
- Restructure README for user-first information architecture with deployment instructions prioritized over development content (#104)
- Restructure README for user-first information architecture (#106)
Internal Changes
NAAS 1.0.0 (2026-02-23)
Deprecations
- Rename `device_type` parameter to `platform` to match Netmiko naming convention (backward compatibility maintained in v1.x) (#25)
Features
- Migrate from Docker Swarm to Docker Compose for simpler deployment and better developer experience (#29)
Documentation
- Add comprehensive user documentation including quick start guide, API usage examples, troubleshooting guide, and security best practices (#3)
Testing & CI/CD
- Implement comprehensive CI/CD pipeline with GitHub Actions including automated testing, linting, and Docker builds (#34)
- Achieve 100% test coverage with comprehensive unit, integration, and contract tests (#47)