NAAS v2.1 Release Notes

NAAS v2.1 adds OpenTelemetry tracing, an MCP server for AI-assisted operations, SSE streaming for real-time job updates, structured error codes, and rate limiting.

Highlights

OpenTelemetry distributed tracing — End-to-end traces from API request through RQ queue to SSH device operation
MCP server — AI assistants can manage network devices through the Model Context Protocol (mcp-server-naas on PyPI)
SSE streaming — Real-time job status updates via Server-Sent Events
Structured error codes — Machine-parseable error classification (CONNECTION_TIMEOUT, AUTH_FAILURE, CONFIG_REJECTED, etc.)
Rate limiting — Per-caller and per-caller-per-device sliding window limits on submission endpoints

New Features

OpenTelemetry Distributed Tracing

Traces follow the full request lifecycle: API → RQ queue → worker → SSH device. Trace context propagates through the job queue via W3C traceparent, linking API and worker spans into a single trace.

# Enable on both API and worker
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317

Install the optional dependency with pip install "naas[otel]" (the quotes keep shells such as zsh from interpreting the brackets). When disabled (default), tracing adds zero overhead.
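To make the propagation step concrete, the sketch below builds and parses a W3C traceparent value using only the standard library. It is an illustration of the header format, not NAAS's implementation (which presumably relies on the OpenTelemetry SDK); the helper names and the job_meta dictionary are hypothetical.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id=None, sampled=True):
    """Build a traceparent value for a new span (hypothetical helper)."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span_id = secrets.token_hex(8)                # 8 bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value):
    """Return (trace_id, parent_span_id, sampled), or None if the value is invalid."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid per the specification.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


# API side: attach the context to the job's metadata before enqueueing
# (job_meta is an assumed stand-in for whatever the queue carries).
job_meta = {"traceparent": make_traceparent()}

# Worker side: read it back and create a child span in the same trace.
trace_id, parent_span_id, sampled = parse_traceparent(job_meta["traceparent"])
worker_traceparent = make_traceparent(trace_id=trace_id, sampled=sampled)
```

Because the worker reuses the trace ID and only generates a new span ID, a tracing backend can join the API and worker spans into one trace.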

See Observability and ADR 0008.

MCP Server

The mcp-server-naas package exposes NAAS operations to AI assistants (Claude, Copilot, etc.) via the Model Context Protocol. Tools include send_command, send_config, send_command_structured, create_api_key, and revoke_api_key, plus tools for job management. The package also includes prompt templates for common workflows.

pip install mcp-server-naas

See MCP Server and ADR 0010.

SSE Streaming

Subscribe to real-time job status updates instead of polling:

curl -N https://naas.example.com/v2/send-command/<job_id>/stream \
  -H "Authorization: Bearer eyJ..."

Events are delivered as the job transitions through queued → started → finished/failed.
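For clients that do not use curl, the following is a minimal sketch of parsing the text/event-stream wire format in Python. The parser follows the standard SSE field rules; the sample payloads (JSON objects with a "status" key) are an assumption for illustration, since the exact event shape is not specified here.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data) tuples from an iterable of decoded SSE lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "event":
            event_name = value
        elif field == "data":
            data_lines.append(value)
        # Other fields (id, retry) are ignored in this sketch.


# Hypothetical stream contents; the real payload shape may differ.
sample = [
    ": keep-alive\n",
    'data: {"status": "queued"}\n',
    "\n",
    'data: {"status": "started"}\n',
    "\n",
    'data: {"status": "finished"}\n',
    "\n",
]

statuses = [json.loads(data)["status"] for _, data in iter_sse_events(sample)]
print(statuses)  # ['queued', 'started', 'finished']
```

With a live connection, the same generator can be fed the decoded lines of the streaming response, and the caller can stop reading once a terminal status (finished or failed) arrives.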

Structured Error Codes

Job results now include error_code and error_retryable fields when a job fails:

{
  "status": "failed",
  "error": "Connection to 10.0.0.1 timed out after 30 seconds",
  "error_code": "CONNECTION_TIMEOUT",
  "error_retryable": true
}

Each netmiko/paramiko exception maps to a structured error code. The Python client library adds exception subclasses, and raises the one that matches the job's error code.
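As an illustration of routing by error code, the sketch below turns a failed job result into a typed exception. The class and function names are hypothetical and are not the client library's actual API; only the three codes named in these notes are mapped, and unknown codes fall back to the base class.

```python
class NaasJobError(Exception):
    """Base class for failed jobs (hypothetical, not the real client API)."""

    def __init__(self, message, error_code=None, retryable=False):
        super().__init__(message)
        self.error_code = error_code
        self.retryable = retryable


class NaasConnectionTimeout(NaasJobError):
    pass


class NaasAuthFailure(NaasJobError):
    pass


class NaasConfigRejected(NaasJobError):
    pass


# Only the codes listed in these release notes; others use the base class.
_ERRORS_BY_CODE = {
    "CONNECTION_TIMEOUT": NaasConnectionTimeout,
    "AUTH_FAILURE": NaasAuthFailure,
    "CONFIG_REJECTED": NaasConfigRejected,
}


def raise_for_result(result):
    """Raise a typed exception if the job result reports a failure."""
    if result.get("status") != "failed":
        return
    code = result.get("error_code")
    exc_type = _ERRORS_BY_CODE.get(code, NaasJobError)
    raise exc_type(
        result.get("error", "job failed"),
        error_code=code,
        retryable=bool(result.get("error_retryable", False)),
    )


result = {
    "status": "failed",
    "error": "Connection to 10.0.0.1 timed out after 30 seconds",
    "error_code": "CONNECTION_TIMEOUT",
    "error_retryable": True,
}

try:
    raise_for_result(result)
except NaasConnectionTimeout as exc:
    should_retry = exc.retryable  # True for the sample result above
```

Callers can then catch a specific subclass where they have a targeted response, and use the retryable flag to decide whether resubmitting the job is worthwhile.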

See Error Codes.

Rate Limiting

Per-caller and per-caller-per-device sliding window rate limits protect against abuse and accidental flooding:

| Variable | Default | Description |
| --- | --- | --- |
| `RATE_LIMIT_ENABLED` | `true` | Enable or disable rate limiting |
| `RATE_LIMIT_PER_CALLER` | `1000` | Max requests per caller per window |
| `RATE_LIMIT_PER_CALLER_DEVICE` | `20` | Max requests per caller per device per window |
| `RATE_LIMIT_WINDOW` | `60` | Sliding window size in seconds |

When a limit is exceeded, the API returns 429 Too Many Requests with a Retry-After header. The admin role is exempt by default.
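The behavior of a two-level sliding window can be sketched as follows. This is an in-memory illustration using the default values from the table, not NAAS's actual implementation (which is not described in these notes); the class name is hypothetical, limits are assumed to be positive, and the clock is passed in explicitly so the example is deterministic.

```python
import math
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding-window log limiter keyed per caller and per caller+device."""

    def __init__(self, per_caller=1000, per_caller_device=20, window=60.0):
        self.per_caller = per_caller
        self.per_caller_device = per_caller_device
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def _prune(self, key, now):
        # Drop timestamps that have slid out of the window.
        hits = self._hits[key]
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        return hits

    def check(self, caller, device, now=None):
        """Return (allowed, retry_after_seconds)."""
        now = time.monotonic() if now is None else now
        caller_hits = self._prune(("caller", caller), now)
        device_hits = self._prune(("device", caller, device), now)
        waits = []
        if len(caller_hits) >= self.per_caller:
            waits.append(caller_hits[0] + self.window - now)
        if len(device_hits) >= self.per_caller_device:
            waits.append(device_hits[0] + self.window - now)
        if waits:
            # Rejected requests are not recorded; report when a slot frees up.
            return False, math.ceil(max(waits))
        caller_hits.append(now)
        device_hits.append(now)
        return True, 0


limiter = SlidingWindowLimiter(per_caller_device=2, window=60)
print(limiter.check("alice", "10.0.0.1", now=0.0))  # (True, 0)
print(limiter.check("alice", "10.0.0.1", now=1.0))  # (True, 0)
print(limiter.check("alice", "10.0.0.1", now=2.0))  # (False, 58)
print(limiter.check("alice", "10.0.0.2", now=2.0))  # (True, 0)
```

The second value in the rejected case is the kind of number a server could place in the Retry-After header: the seconds until the oldest request in the full window expires.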

See Security — Rate Limiting.

Documentation

Kubernetes deployment page restructured as Helm-first
K8s/Helm equivalents added across troubleshooting, security, observability, and architecture docs
Deployment overview page with method decision guide
All v1 API references updated to v2
ADR 0009: Command authorization deferred to AAA
ADR 0010: MCP server as thin client over REST API

Testing

Locust-based load testing with smoke (PR) and full (RC) CI profiles
OTel integration tests with OTLP collector
SSE streaming integration tests
RBAC, API key, and v2 field rejection integration tests

Compatibility

Python 3.11+
Netmiko 4.x
Redis 7+
Kubernetes 1.27+ (Helm chart)