Architecture
Overview
NAAS is an async wrapper around Netmiko. The API accepts requests, enqueues jobs, and returns immediately. Workers pick up jobs from the queue and execute them against network devices over SSH.
sequenceDiagram
participant Client
participant API as NAAS API
participant Queue as Redis
participant Worker as RQ Worker
participant Device as Network Device
Client->>API: POST /v2/send-command
API->>API: Auth (Basic/JWT), RBAC, rate limit
API->>Queue: enqueue job (context queue)
API-->>Client: 202 Accepted { job_id }
Worker->>Queue: fetch from context queue
Worker->>Device: SSH via Netmiko (pooled connection)
Device-->>Worker: command output
Worker->>Queue: store result
Client->>API: GET /v2/send-command/{job_id}
API-->>Client: 200 { status: finished, results: {...} }
Clients can also subscribe to GET /v2/send-command/{job_id}/stream for real-time SSE updates instead of polling.
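The submit-then-poll flow above can be sketched with the standard library alone. This is a minimal illustration, not the Python SDK: the JSON field names (`host`, `platform`, `commands`), the `failed` status, and the helper names are assumptions; only the endpoints, the `job_id` field, and the `finished` status come from the diagram.

```python
import base64
import json
import time
import urllib.request

# Assumption: "failed" is a guess at a second terminal status;
# the diagram above only shows "finished".
TERMINAL_STATUSES = {"finished", "failed"}


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


def http_json(method, url, auth_header, body=None):
    """Send one request with an optional JSON body and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", auth_header)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())


def wait_for_job(fetch, job_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call fetch(job_id) until the job reaches a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still pending after {timeout}s")
        sleep(interval)


def run_command(base_url, username, password, host, platform, commands):
    """Submit a job, then poll until it completes (payload keys are assumed)."""
    auth = basic_auth_header(username, password)
    payload = {"host": host, "platform": platform, "commands": commands}
    accepted = http_json("POST", f"{base_url}/v2/send-command", auth, payload)
    return wait_for_job(
        lambda jid: http_json("GET", f"{base_url}/v2/send-command/{jid}", auth),
        accepted["job_id"],
    )
```

Passing `fetch` and `sleep` as parameters keeps the polling loop independent of the transport, so it can be exercised without a running NAAS instance.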
Components
graph TD
Client["Client<br/>(Python SDK / CLI / curl / MCP)"]
API["NAAS API<br/>(Flask + Gunicorn)"]
Redis["Redis<br/>(queues, results, state)"]
Worker["RQ Worker<br/>(one or more)"]
Device["Network Devices"]
Metrics["Prometheus"]
OTel["OTel Collector<br/>(optional)"]
Client -->|HTTPS| API
API -->|enqueue / fetch| Redis
Worker -->|dequeue / store| Redis
Worker -->|SSH via Netmiko| Device
Metrics -->|scrape /metrics| API
Metrics -->|scrape /metrics| Worker
API -.->|traces| OTel
Worker -.->|traces| OTel
NAAS API
Flask application handling:
- Authentication (HTTP Basic Auth or JWT Bearer tokens)
- RBAC enforcement (admin/operator/viewer)
- Rate limiting (per-caller and per-caller-per-device)
- Request validation and job deduplication
- Context-based queue routing
- Job enqueueing and result retrieval
- SSE streaming for real-time job updates
- Prometheus metrics at /metrics
- Structured audit event emission
Stateless — all state lives in Redis. Run multiple instances behind a load balancer.
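Context-based queue routing could look roughly like the sketch below. It assumes queue names follow a `naas-<context>` pattern, which matches the example names in this document (`naas-default`, `naas-corp`, `naas-oob-dc1`) but is not confirmed as the actual rule. The function name, the set of known contexts, and the validation regex are hypothetical.

```python
import re

QUEUE_PREFIX = "naas-"
DEFAULT_CONTEXT = "default"

# Hypothetical validation rule: lowercase alphanumerics separated by hyphens.
_CONTEXT_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def queue_for_context(context=None, known_contexts=frozenset({"default", "corp", "oob-dc1"})):
    """Map an optional request context to the name of its queue."""
    name = (context or DEFAULT_CONTEXT).strip().lower()
    if not _CONTEXT_RE.match(name):
        raise ValueError(f"invalid context name: {context!r}")
    if name not in known_contexts:
        # Rejecting unknown contexts avoids enqueueing jobs no worker will serve.
        raise LookupError(f"no queue configured for context {name!r}")
    return QUEUE_PREFIX + name
```

Rejecting unknown contexts at the API keeps a typo in a request from creating a queue that no worker ever drains.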
Redis
Redis is the central coordination point. It stores:
- Job queues — One RQ queue per context (naas-default, naas-corp, naas-oob-dc1, etc.)
- Job results — Completed output with configurable TTL
- Circuit breaker state — Per-device failure counts, shared across workers
- Connection pool metadata — Tracks pooled SSH connections per worker
- Rate limit counters — Sliding window sorted sets per caller
- API keys — JWT key metadata and revocation set
- Encrypted credentials — Device credentials encrypted at rest (when stored for pooled connections)
- Idempotency keys — Deduplication state for repeated submissions
For production, use a managed Redis with replication and persistence. The bundled Redis is single-replica with no persistence.
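To illustrate the sliding-window rate limit counters listed above, here is an in-memory sketch of the technique. It is not the NAAS implementation: a sorted Python list per caller stands in for a Redis sorted set, and the comments note the Redis commands (`ZREMRANGEBYSCORE`, `ZCARD`, `ZADD`) that each step would roughly correspond to. The class name, limit, and window size are assumptions.

```python
import bisect
from collections import defaultdict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per caller in any `window_seconds` span."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        # caller -> sorted list of request timestamps (stand-in for a sorted set)
        self._hits = defaultdict(list)

    def allow(self, caller, now):
        hits = self._hits[caller]
        # Drop timestamps at or before the window start (like ZREMRANGEBYSCORE).
        cutoff = now - self.window
        del hits[:bisect.bisect_right(hits, cutoff)]
        # Count what is left in the window (like ZCARD).
        if len(hits) >= self.limit:
            return False
        # Record this request with its timestamp as the score (like ZADD).
        bisect.insort(hits, now)
        return True
```

A per-caller-per-device limit can reuse the same structure by keying on a `(caller, device)` tuple instead of the caller alone.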
RQ Workers
Workers are separate processes that dequeue jobs and execute them. Each worker process handles one job at a time.
- Serve one or more contexts (configured via WORKER_CONTEXTS)
- Maintain a persistent SSH connection pool (reuse connections across sequential jobs to the same device)
- Share circuit breaker state across all workers via Redis
- Emit Prometheus metrics and structured audit events
- Propagate OpenTelemetry trace context from the API through the queue
Scale horizontally: total job concurrency = number of worker processes across all pods.
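The per-device circuit breaker mentioned above can be sketched as follows. This is a generic circuit breaker under stated assumptions, not the NAAS code: the failure threshold, the cool-down period, and the method names are hypothetical, and plain dictionaries stand in for the state that workers share through Redis.

```python
from dataclasses import dataclass, field


@dataclass
class CircuitBreaker:
    threshold: int = 3       # assumed: consecutive failures before opening
    cooldown: float = 60.0   # assumed: seconds to wait before retrying a device
    _failures: dict = field(default_factory=dict)   # device -> failure count
    _opened_at: dict = field(default_factory=dict)  # device -> time the circuit opened

    def allow(self, device, now):
        """Return True if a job may attempt a connection to this device."""
        opened = self._opened_at.get(device)
        if opened is None:
            return True
        # After the cool-down, attempts are let through again;
        # the next recorded failure re-opens the circuit.
        return now - opened >= self.cooldown

    def record_success(self, device):
        """Reset the device's failure count and close its circuit."""
        self._failures.pop(device, None)
        self._opened_at.pop(device, None)

    def record_failure(self, device, now):
        """Count a failure and open the circuit once the threshold is reached."""
        count = self._failures.get(device, 0) + 1
        self._failures[device] = count
        if count >= self.threshold:
            self._opened_at[device] = now
```

Keeping this state in a shared store rather than in each process means a device that fails for one worker is skipped by the others as well.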
Network Devices
NAAS connects to devices over SSH using Netmiko. Credentials from the HTTP request (Basic Auth) are passed directly to the device — NAAS does not maintain a separate credential store.
MCP Server (Optional)
The mcp-server-naas package exposes NAAS operations to AI assistants via the Model Context Protocol. It's a thin client that calls the REST API — no direct Redis or device access.
Request Lifecycle
- Client sends POST /v2/send-command with host, platform, commands, and optional context
- API authenticates (Basic Auth or JWT), checks RBAC role, enforces rate limits
- API validates the request, checks idempotency key and device lockout
- API enqueues the job to the context-specific queue (e.g. naas-oob-dc1)
- API returns 202 Accepted with job_id
- Worker serving that context picks up the job, checks the circuit breaker
- Worker connects to the device (reusing a pooled connection if available), runs commands, stores the result
- Client retrieves results via GET /v2/send-command/{job_id} or SSE stream
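The lifecycle above can be traced end to end with a small in-memory model. This is a teaching sketch only: it skips authentication, RBAC, rate limiting, device lockout, and the circuit breaker, and replaces Redis, RQ, and Netmiko with dictionaries and a caller-supplied function. The class name, the `queued` and `failed` statuses, the 404 reply, and returning 202 for a repeated idempotency key are all assumptions.

```python
import uuid
from collections import defaultdict, deque


class InMemoryNaas:
    def __init__(self):
        self.queues = defaultdict(deque)  # queue name -> pending job ids
        self.jobs = {}                    # job id -> job record
        self.idempotency = {}             # idempotency key -> job id

    def submit(self, host, platform, commands, context="default", idempotency_key=None):
        """API side: deduplicate, enqueue on the context queue, return 202."""
        if idempotency_key is not None and idempotency_key in self.idempotency:
            return 202, {"job_id": self.idempotency[idempotency_key]}
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {
            "host": host, "platform": platform, "commands": commands,
            "status": "queued", "results": None,
        }
        self.queues[f"naas-{context}"].append(job_id)
        if idempotency_key is not None:
            self.idempotency[idempotency_key] = job_id
        return 202, {"job_id": job_id}

    def work_one(self, context, run_commands):
        """Worker side: take one job from the context queue and store its result."""
        queue = self.queues[f"naas-{context}"]
        if not queue:
            return None
        job_id = queue.popleft()
        job = self.jobs[job_id]
        try:
            job["results"] = run_commands(job["host"], job["platform"], job["commands"])
            job["status"] = "finished"
        except Exception as exc:
            job["results"] = {"error": str(exc)}
            job["status"] = "failed"
        return job_id

    def get(self, job_id):
        """API side: report the current status and any stored results."""
        job = self.jobs.get(job_id)
        if job is None:
            return 404, {"error": "unknown job"}
        return 200, {"status": job["status"], "results": job["results"]}
```

In this model `run_commands` is where a real worker would open (or reuse) an SSH session and send each command to the device.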
Why Async?
SSH connections to network devices take seconds to minutes depending on device responsiveness and command complexity. A synchronous API would hold HTTP connections open for the duration, limiting throughput and causing timeouts.
The async model lets the API return immediately. Clients poll or subscribe to SSE at their own pace. It also enables horizontal scaling — add workers to increase throughput without changing the API layer.
Scaling
graph LR
LB["Load Balancer"]
API1["API instance 1"]
API2["API instance 2"]
Redis["Redis"]
W1["Worker (corp)"]
W2["Worker (oob)"]
W3["Worker (default)"]
LB --> API1
LB --> API2
API1 --> Redis
API2 --> Redis
W1 --> Redis
W2 --> Redis
W3 --> Redis
- API — Stateless, scales horizontally. Any instance handles any request.
- Workers — Scale per context. Each worker serves one or more contexts. Add replicas to increase concurrency for a given context.
- Redis — Single coordination point. Use Redis Sentinel or Cluster for HA.
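Because each worker process handles one job at a time, capacity planning reduces to counting processes. The sketch below works through that arithmetic for a hypothetical deployment; the `WorkerDeployment` type, its fields, and the replica counts are illustrative assumptions, not NAAS configuration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerDeployment:
    contexts: tuple          # contexts this group of workers serves
    replicas: int            # number of pods
    processes_per_pod: int = 1


def total_concurrency(deployments):
    """Total jobs that can run at once: one per worker process."""
    return sum(d.replicas * d.processes_per_pod for d in deployments)


def max_concurrency_by_context(deployments):
    """Upper bound per context; shared workers count toward each context they serve."""
    per_context = Counter()
    for d in deployments:
        for context in d.contexts:
            per_context[context] += d.replicas * d.processes_per_pod
    return dict(per_context)


deployments = [
    WorkerDeployment(contexts=("corp",), replicas=3, processes_per_pod=2),
    WorkerDeployment(contexts=("oob-dc1", "default"), replicas=2),
]
print(total_concurrency(deployments))           # 3*2 + 2*1 = 8
print(max_concurrency_by_context(deployments))  # {'corp': 6, 'oob-dc1': 2, 'default': 2}
```

The per-context figures are upper bounds: the two workers shared by `oob-dc1` and `default` cannot serve both contexts at full capacity at the same moment.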