Network Telemetry vs. SNMP
The Operational Divide in Modern Network Observability
The SNMP Architecture: A Polling-Based Legacy
SNMP was designed in 1988 (RFC 1067) when networks were small, devices were few, and management stations were powerful enough to query everything. Its architecture is fundamentally a request/response (pull) model: the Network Management System (NMS) periodically polls each device, requesting specific OIDs (Object Identifiers) from the device's MIB (Management Information Base).
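The pull model can be sketched as a simple loop. Everything below is a hypothetical stand-in: `snmp_get` plays the role of a real SNMP library call (e.g. a GET via pysnmp), and the addresses and counter values are canned so the sketch is runnable:

```python
# Minimal sketch of the SNMP pull model. snmp_get() is a hypothetical
# stand-in for a real library GET; it returns canned values here.
DEVICE_STATE = {
    ("10.0.0.1", "1.3.6.1.2.1.2.2.1.10.1"): 823_411_200,  # ifInOctets, ifIndex 1
    ("10.0.0.2", "1.3.6.1.2.1.2.2.1.10.1"): 91_004_233,
}

def snmp_get(host, oid):
    return DEVICE_STATE[(host, oid)]

def poll_cycle(devices, oids):
    """One NMS poll cycle: the manager issues every request (pull)."""
    return {(h, o): snmp_get(h, o) for h in devices for o in oids}

samples = poll_cycle(["10.0.0.1", "10.0.0.2"], ["1.3.6.1.2.1.2.2.1.10.1"])
```

Note that the NMS does all the work: one request per device per OID per cycle, which is exactly what the cost function below quantifies.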
The Scaling Problem: Why SNMP Breaks at Hyperscale
The core problem with SNMP is polling latency. If you poll a device every 5 minutes, you only know the state of the network as it was up to 5 minutes ago. A link that flaps 50 times between polls registers at most one state change in your monitoring system; if it ends the interval in the same state it started in, the flapping is invisible entirely. This is a critical observability gap for modern networks, where failures propagate in milliseconds.
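The observability gap is easy to demonstrate with a simulation (all values here are illustrative): flap a link 50 times between two polls and count what the poller actually sees.

```python
# A link flaps 50 times (up -> down -> up) between two 5-minute polls.
flap_log = []
for _ in range(50):
    flap_log += ["down", "up"]   # 100 state transitions on the wire

# The poller samples only the endpoints of the interval; the link was
# "up" at both, so the monitoring system records zero state changes.
observed_changes = int("up" != flap_log[-1])
missed_transitions = len(flap_log) - observed_changes
```

Here 100 real transitions collapse to zero observed changes, because the sampled endpoints happen to match.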
The Polling Cost Function
The total polling load on an NMS grows with the product of device count and metrics per device:
Polls/hour = (Devices × Metrics per device × 60) / Poll interval (min)
For 5,000 devices with 50 metrics each, polled every 1 minute: 15,000,000 GET requests per hour. The NMS becomes a significant load generator on the network itself.
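A quick sanity check of that arithmetic (the function name is ours, not from any tool):

```python
def polls_per_hour(devices, metrics_per_device, poll_interval_min):
    """SNMP GET requests per hour generated by the NMS."""
    return devices * metrics_per_device * 60 // poll_interval_min

# The example from the text: 5,000 devices x 50 metrics, 1-minute interval.
load = polls_per_hour(5_000, 50, 1)   # 15,000,000 GETs/hour
```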
Streaming Telemetry: The Push-Based Alternative
Streaming telemetry inverts the model. Instead of the management system requesting data, the network device pushes data to a collector, either at a configured cadence (SAMPLE mode) or whenever a value changes (ON_CHANGE mode). The modern standard for this is gNMI (gRPC Network Management Interface), developed by the OpenConfig working group.
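The inverted flow can be sketched as a toy producer/consumer. This is not a real gNMI client (a real deployment would use gnmic or a gRPC library); it only illustrates that the device drives the cadence and the collector merely drains updates:

```python
import queue
import threading
import time

# Toy sketch of the push model: the *device* streams updates at its own
# cadence; the collector never sends a request.
updates = queue.Queue()

def device_stream(host, path, interval_s, samples):
    """Simulated SAMPLE-mode subscription: device pushes, collector listens."""
    for value in samples:
        updates.put((host, path, value))
        time.sleep(interval_s)

t = threading.Thread(
    target=device_stream,
    args=("10.0.0.1",
          "/interfaces/interface[name=eth0]/state/counters/in-octets",
          0.01, [100, 250, 425]),
)
t.start()
t.join()

received = [updates.get_nowait() for _ in range(updates.qsize())]
```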
YANG Data Models: The Structured Schema
At the heart of modern telemetry is YANG (Yet Another Next Generation), a data modeling language defined in RFC 7950. Unlike SNMP MIBs, which are flat tables of numeric OIDs and, beyond the standard base set, largely vendor-specific, YANG models define structured, hierarchical schemas that can be vendor-neutral (OpenConfig models) or vendor-specific (Cisco YANG, Juniper YANG).
A gNMI path to interface statistics on an OpenConfig model looks like a filesystem path:
```
# OpenConfig gNMI path for interface counters
/interfaces/interface[name=GigabitEthernet0/0]/state/counters/in-octets
```
This path-based approach is immediately human-readable and programmatically navigable, unlike an SNMP OID such as 1.3.6.1.2.1.2.2.1.10.
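Because the path is structured, it can be parsed mechanically. The helper below is a sketch that handles only the single-key form shown above (the full gNMI path grammar also allows multiple and escaped keys); note that the `/` inside `GigabitEthernet0/0` sits within brackets and must not be treated as a separator:

```python
import re

def split_gnmi_path(path):
    """Split on '/' but not inside [key=value] brackets."""
    parts, buf, depth = [], "", 0
    for ch in path.strip("/"):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            parts.append(buf)
            buf = ""
        else:
            buf += ch
    parts.append(buf)
    return parts

def parse_gnmi_path(path):
    """Return (element, {key: value}) pairs for a simple gNMI path."""
    elems = []
    for part in split_gnmi_path(path):
        name, key, val = re.match(
            r"([^\[]+)(?:\[([^=]+)=([^\]]+)\])?", part).groups()
        elems.append((name, {key: val} if key else {}))
    return elems

parsed = parse_gnmi_path(
    "/interfaces/interface[name=GigabitEthernet0/0]/state/counters/in-octets")
```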
The Modern Telemetry Pipeline Architecture
A production-grade streaming telemetry deployment follows a pipeline architecture with four distinct stages:
1. Collectors
Agents (Telegraf, gNMIc) that establish gRPC connections to network devices, receive streamed data, and normalize it into a standard format (InfluxDB line protocol, Protobuf).
2. Message Bus
Kafka or NATS provides a high-throughput, durable message queue between collectors and the processing layer, decoupling ingestion from analysis.
3. Time-Series Database
InfluxDB, TimescaleDB, or VictoriaMetrics stores the time-indexed telemetry data for querying and long-term analysis.
4. Visualization & Alerting
Grafana dashboards and alert rules consume the time-series data, providing sub-minute (and, with ON_CHANGE subscriptions, near-real-time) visibility that SNMP polling intervals cannot match.
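As a concrete example of the normalization step in stage 1, here is a minimal formatter for InfluxDB line protocol (measurement, tag set, field set, nanosecond timestamp). The measurement and tag names are our own, and escaping of special characters in tag values is omitted for brevity:

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Render one telemetry update as an InfluxDB line-protocol record:
    measurement,tag_set field_set timestamp"""
    tag_set = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # Trailing 'i' marks an integer field in line protocol.
    field_set = ",".join(f"{k}={v}i" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_set} {field_set} {ts_ns}"

line = to_line_protocol(
    "interface_counters",
    {"device": "edge-rtr-01", "name": "GigabitEthernet0/0"},
    {"in_octets": 823411200},
    1700000000000000000,
)
```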
Operational Comparison Summary
| Criterion | SNMP Polling | Streaming Telemetry (gNMI) |
|---|---|---|
| Latency to detect event | Poll interval (typically 5 min) | Seconds or immediate (ON_CHANGE) |
| Transport protocol | UDP (unreliable) | TCP/gRPC (reliable, encrypted) |
| Data model | Vendor-specific MIBs | Standardized YANG (OpenConfig) |
| Security | Cleartext community strings (v2c); USM auth/priv (v3) | mTLS, certificate-based auth |
| NMS CPU load | High (NMS drives all requests) | Low (device drives pushes) |
Conclusion
SNMP remains a reliable workhorse for small, static networks where 5-minute polling latency is acceptable. For any network with more than a few hundred devices, high-velocity events (link flapping, microbursts), or a security posture that disallows cleartext community strings, streaming telemetry via gNMI is the only architecture that can deliver the sub-minute observability modern operations require. The migration investment in YANG model familiarity and pipeline tooling pays for itself the first time you detect and remediate a cascading failure before users open a ticket.