Troubleshooting guide
Common issues and solutions for NAAS deployment and operation.
Contents
- Connection Issues
- Authentication Problems
- Redis Issues
- Worker Problems
- SSL/TLS Issues
- Performance Issues
- Debugging
- Connection Pooling Issues
- Structured Output Issues
- Platform Autodetect Issues
Connection Issues
Cannot connect to NAAS
Symptom: curl: (7) Failed to connect to localhost port 8443
Solutions:
- Check if services are running:
- Verify port/service exposure:
- Check firewall rules:
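Before you dig into containers or firewalls, confirm whether anything accepts TCP connections on the API port at all. This check is independent of TLS and credentials. The following is a minimal sketch that uses only Python's standard library. `port_is_open` is a hypothetical helper name, and its defaults assume the localhost:8443 address from the symptom above:

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False
```

If this returns True on the NAAS host but False from your workstation, the firewall or the port exposure is the likelier cause.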
NAAS Cannot Reach Network Devices
Symptom: Jobs fail with "Connection timeout" or "No route to host"
Solutions:
- Verify network connectivity from the NAAS host:
- Check connectivity from inside the container/pod:
- If using custom networks, ensure proper routing:
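When many devices fail, a quick parallel probe of the SSH port can tell the two symptoms apart:

- A timeout usually means packets are silently dropped by a firewall, an ACL, or a routing problem.
- A refusal means the host answered but nothing is listening on the port.

This is a sketch using Python's standard library. Run it on the NAAS host, and again inside a worker container if Python is available there, and compare the results. The helper names and the default port 22 (SSH) are assumptions:

```python
import errno
import socket
from concurrent.futures import ThreadPoolExecutor


def probe(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify TCP reachability of host:port as open/timeout/refused/no route."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"  # dropped packets: firewall, ACL, or routing black hole
    except ConnectionRefusedError:
        return "refused"  # host reachable, nothing listening (is SSH enabled?)
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route"
        return f"error: {exc}"


def probe_all(hosts, port: int = 22, timeout: float = 5.0) -> dict:
    """Probe several devices concurrently and return {host: status}."""
    hosts = list(hosts)  # iterated twice below
    with ThreadPoolExecutor(max_workers=16) as pool:
        statuses = list(pool.map(lambda h: probe(h, port, timeout), hosts))
    return dict(zip(hosts, statuses))
```

For example, `probe_all(["192.168.1.1", "192.168.1.2"])` returns one status per device.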
Authentication Problems
401 Unauthorized
Symptom: {"message": "Unauthorized"}
Solutions:
- Verify credentials are provided:
# Correct
curl -k -u "username:password" https://localhost:8443/v2/send-command
# Wrong - missing credentials
curl -k https://localhost:8443/v2/send-command
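- Check for special characters in password. curl -u sends the credentials in a Basic auth header, so URL encoding is not needed. However, the shell may interpret characters such as ! or $ inside double quotes:
# Single quotes pass the password to curl unchanged
curl -k -u 'username:p@ssw0rd!' https://localhost:8443/v2/send-command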
- Verify credentials work directly on device:
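To rule out shell quoting entirely, you can build the request in Python. The following is a sketch using only the standard library. It assumes the endpoint accepts a JSON POST body like the payload examples in this guide. The helper names are hypothetical, and the code builds the request without sending it:

```python
import base64
import json
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials for HTTP Basic auth (curl -u defaults to Basic)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_send_command(url: str, username: str, password: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body and Basic auth."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": basic_auth_header(username, password),
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_send_command(
    "https://localhost:8443/v2/send-command",
    "username",
    "p@ssw0rd!",  # no URL encoding or shell escaping needed here
    {"host": "192.168.1.1", "platform": "cisco_ios", "commands": ["show version"]},
)
```

To send it, call `urllib.request.urlopen(req, context=...)` with an SSL context that matches your certificate setup.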
Authentication Failed on Device
Symptom: Job fails with "Authentication failed"
Solutions:
- Verify credentials are correct for the device
- Check if account is locked on device
- Verify SSH is enabled on device
- Check device access lists/restrictions
- Try with enable password if required:
{
"host": "192.168.1.1",
"platform": "cisco_ios",
"enable": "enable_password",
"commands": ["show version"]
}
Redis Issues
Redis Connection Failed
Symptom: {"status": "degraded", "components": {"redis": {"status": "unhealthy"}, ...}}
Solutions:
- Check that Redis is running:
- Verify Redis password:
- Check connectivity from the API to Redis:
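When scripting health checks, list exactly which components are failing rather than reading only the top-level status. The following sketch assumes the response shape shown in the symptom above and treats any status other than "healthy" as a failure. Both the "healthy" value and the extra "workers" component in the sample are assumptions. Fetching the JSON is left out because this section does not show the health endpoint path:

```python
import json


def unhealthy_components(health: dict) -> list[str]:
    """Return the sorted names of components whose status is not 'healthy'."""
    components = health.get("components", {})
    return sorted(
        name for name, info in components.items() if info.get("status") != "healthy"
    )


# Shaped like the symptom above; the "workers" entry is hypothetical
sample = json.loads(
    '{"status": "degraded", "components": {"redis": {"status": "unhealthy"},'
    ' "workers": {"status": "healthy"}}}'
)
print(sample["status"], unhealthy_components(sample))  # degraded ['redis']
```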
Redis Out of Memory
Symptom: Jobs fail to queue, Redis logs show OOM errors
Solutions:
- Check Redis memory usage:
- Increase Redis memory limit:
- Clear old job data:
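`redis-cli INFO memory` reports `used_memory` and `maxmemory`, where a `maxmemory` of 0 means no limit. When `maxmemory_policy` is `noeviction`, Redis rejects writes once the limit is reached, which is consistent with jobs failing to queue. The following small parser turns that output into a usage ratio. The helper names and sample values are illustrative:

```python
from typing import Optional


def parse_info(text: str) -> dict:
    """Parse Redis INFO output into {field: value}, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()  # INFO lines end in \r\n
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_usage_ratio(text: str) -> Optional[float]:
    """Return used_memory / maxmemory, or None when maxmemory is 0 (no limit)."""
    info = parse_info(text)
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    return used / limit if limit else None


# Illustrative values: 900 MiB used of a 1 GiB limit
sample = """# Memory
used_memory:943718400
used_memory_human:900.00M
maxmemory:1073741824
maxmemory_human:1.00G
maxmemory_policy:noeviction
"""

print(memory_usage_ratio(sample))  # 0.87890625
```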
Worker Problems
No Workers Available
Symptom: Jobs stay in "queued" status indefinitely
Solutions:
- Check that workers are running:
- Scale up workers:
Workers Crashing
Symptom: Worker containers/pods restart frequently
Solutions:
- Check worker logs:
- Increase worker resources:
SSL/TLS Issues
SSL Certificate Errors
Symptom: SSL certificate problem: self signed certificate
Solutions:
- For testing, disable SSL verification:
- For production, use valid certificates:
- Add CA certificate to trust store:
# Linux
sudo cp naas-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
# macOS
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain naas-ca.crt
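Python clients face the same choice as curl: trust a specific CA, or skip verification for testing only (the equivalent of `curl -k`). Python may not read the OS trust store on every platform (python.org builds on macOS are one example), so passing the CA file explicitly is the most predictable option. The following is a sketch; `make_ssl_context` is a hypothetical helper name:

```python
import ssl
from typing import Optional


def make_ssl_context(ca_file: Optional[str] = None, insecure: bool = False) -> ssl.SSLContext:
    """Build an SSL context for calling the NAAS API.

    ca_file: PEM CA bundle to trust (e.g. naas-ca.crt); None uses default trust.
    insecure: skip certificate checks entirely, like curl -k (testing only).
    """
    if insecure:
        ctx = ssl.create_default_context()
        ctx.check_hostname = False  # must be disabled before setting CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    return ssl.create_default_context(cafile=ca_file)
```

For example: `urllib.request.urlopen(req, context=make_ssl_context(ca_file="naas-ca.crt"))`.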
Certificate expired
Symptom: SSL certificate problem: certificate has expired
Solutions:
- Regenerate self-signed certificate (restart the service):
- Or provide your own:
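To see how long a certificate has left, or how long ago it expired, run `openssl x509 -enddate -noout -in <certificate file>`. It prints a line such as `notAfter=Jan 11 00:00:00 2030 GMT`. The following sketch converts that line into days remaining with Python's `ssl.cert_time_to_seconds`; the helper name is hypothetical:

```python
import ssl
import time
from typing import Optional


def days_until_expiry(enddate_line: str, now: Optional[float] = None) -> int:
    """Days left on a certificate, from `openssl x509 -enddate` output.

    Accepts the line with or without the "notAfter=" prefix.
    Negative values mean the certificate has already expired.
    """
    not_after = enddate_line.strip().split("=", 1)[-1]
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expires - current) // 86400)
```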
Performance Issues
Slow Job Execution
Symptom: Jobs take longer than expected
Solutions:
- Check device response time:
- Increase read_timeout for slow devices:
{
"host": "192.168.1.1",
"platform": "cisco_ios",
"read_timeout": 60.0,
"commands": ["show version"]
}
Note: Prior to v1.3, this parameter was delay_factor (integer multiplier).
Migrate by converting: delay_factor=2 → read_timeout=60.0 (approximate).
Commands That Don't Return to Standard Prompt
Symptom: Jobs hang or timeout on commands like ping, traceroute, or interactive prompts
Solution: Use expect_string to match the expected output pattern instead of relying on prompt detection:
{
"host": "192.168.1.1",
"platform": "cisco_ios",
"expect_string": "Success rate",
"commands": ["ping 8.8.8.8"]
}
The expect_string is a regex matched against device output. Useful for commands that
don't return to a standard prompt or have interactive elements.
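Because `expect_string` is a regex, characters such as `(`, `)`, `.` and `?` have special meaning in it. A quick local check against sample output can catch a bad pattern before a job hangs. This sketch assumes search semantics, where the pattern may match anywhere in the output, as with Netmiko's `expect_string`. The ping output is illustrative, not captured from a device:

```python
import re


def expect_matches(expect_string: str, output: str) -> bool:
    """Return True if the expect_string regex matches anywhere in output."""
    return re.search(expect_string, output) is not None


# Illustrative Cisco IOS ping output, not captured from a real device
ping_output = """Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms"""

print(expect_matches("Success rate", ping_output))  # True
# Unescaped parentheses form a regex group, so this looks for "percent 5/5"
print(expect_matches("percent (5/5)", ping_output))  # False
# re.escape makes the text match literally
print(expect_matches(re.escape("percent (5/5)"), ping_output))  # True
```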
Unknown or Heterogeneous Device Types
Symptom: Managing devices with unknown or mixed platforms
Solution: Use platform: "autodetect" to fingerprint devices via SSHDetect:
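Autodetect adds overhead to every job, so one approach is to autodetect once and then pin the detected platform for later jobs. The following sketch assumes `detected_platform` is a top-level key of the job result dict. The helper name is hypothetical:

```python
import json

# Discovery request: NAAS fingerprints the device via SSHDetect
payload = {
    "host": "192.168.1.1",
    "platform": "autodetect",
    "commands": ["show version"],
}
body = json.dumps(payload)


def pin_platform(payload: dict, job_result: dict) -> dict:
    """Return a copy of payload with the detected platform filled in.

    Assumes the job result exposes "detected_platform" as a top-level key.
    """
    platform = job_result.get("detected_platform")
    if not platform:
        raise ValueError("job result has no detected_platform")
    return {**payload, "platform": platform}
```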
The detected platform is returned in the job result as detected_platform. Note:
- Adds the overhead of a second SSH connection (one for fingerprinting, one for the actual commands)
- Not compatible with connection pooling
- Best for discovery workflows, not production automation against known devices

For Slow Job Execution, also consider scaling workers for parallel execution:
High Memory Usage
Symptom: Containers/pods using excessive memory
Solutions:
- Check memory usage:
- Set memory limits:
Debugging
Enable Debug Logging
Check Job Details in Redis
Common Redis commands:
# List all jobs (KEYS blocks Redis while it scans; on busy instances prefer SCAN 0 MATCH rq:job:*)
KEYS rq:job:*
# Get job details
HGETALL rq:job:550e8400-e29b-41d4-a716-446655440000
# Check queue length
LLEN rq:queue:default
Container/Pod Shell Access
Getting help
If you're still experiencing issues:
- Check GitHub Issues for similar problems
- Review API documentation
- Open a new issue with:
  - NAAS version
  - Deployment method and version (Docker Compose, Helm chart, Kubernetes version)
  - Error messages and logs
  - Steps to reproduce
Next steps
- Security Best Practices - Secure your deployment
- API Usage Examples - Learn the API
- Quick Start Guide - Get started with NAAS
Connection Pooling Issues
Stale Connections
Symptom: Commands fail with "Connection closed" or timeout errors after device reboot.
Cause: Pooled connection became stale when device rebooted.
Solution: NAAS automatically detects and reconnects. If issues persist, disable pooling:
High Memory Usage
Symptom: Worker containers consuming excessive memory.
Cause: Too many pooled connections.
Solution: Reduce pool size:
Structured Output Issues
No Template Found
Symptom: /v2/send-command-structured returns a raw string instead of a list[dict].
Cause: No TextFSM template exists for the (platform, command) combination.
Solution: Supply a custom template:
{
"host": "192.168.1.1",
"platform": "cisco_ios",
"commands": ["show custom"],
"textfsm_template": "Value FIELD (\\S+)\n\nStart\n  ^${FIELD} -> Record"
}
Or check ntc-templates for available templates.
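Hand-escaping newlines and backslashes inside JSON is error-prone. Instead, write the template as normal multi-line text and let a JSON encoder do the escaping. The following sketch does this in Python, using the payload fields from the example above:

```python
import json

# TextFSM template as ordinary text; a raw string keeps \S as-is.
# Rule lines are indented before the caret, as TextFSM expects.
template = r"""Value FIELD (\S+)

Start
  ^${FIELD} -> Record
"""

payload = {
    "host": "192.168.1.1",
    "platform": "cisco_ios",
    "commands": ["show custom"],
    "textfsm_template": template,
}

# json.dumps turns newlines into \n and the backslash into \\
body = json.dumps(payload, indent=2)
print(body)
```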
Parsing Errors
Symptom: Structured output returns empty list or missing fields.
Cause: Template doesn't match actual device output format.
Solution: Test the template against real device output with a TextFSM online tool, or supply a corrected custom template.
Platform Autodetect Issues
Autodetect Fails
Symptom: platform: "autodetect" returns error "Platform autodetect failed".
Cause: Device doesn't respond to SSHDetect probes, or connection fails.
Solution: Use explicit platform instead:
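If autodetect fails only for some devices, a client can fall back to an explicit platform automatically. In this sketch, `submit` is a hypothetical caller-supplied function that sends a job and returns its result as a dict. The error is assumed to appear under an "error" key that contains the message from the symptom above:

```python
from typing import Callable


def run_with_fallback(
    submit: Callable[[dict], dict], payload: dict, fallback_platform: str
) -> dict:
    """Try autodetect first; on the autodetect error, resubmit with fallback_platform.

    The "error" key and its message text are assumptions about the result shape.
    """
    result = submit({**payload, "platform": "autodetect"})
    if "Platform autodetect failed" in str(result.get("error", "")):
        result = submit({**payload, "platform": fallback_platform})
    return result
```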
Wrong Platform Detected
Symptom: Autodetect returns incorrect platform type.
Cause: Device responds ambiguously to SSHDetect probes.
Solution: Use explicit platform. Autodetect is best-effort and not 100% accurate.