Traditionally, networks have been monitored by pinging or polling network elements such as routers and switches to determine whether they are online or offline. This is often done at long intervals, such as every 1 or 5 minutes. This type of monitoring produces the dreaded ‘network down’ alerts at 3 AM that are the source of nightmares for every network ops person, yet it provides only two possible states, or one bit’s worth of information: up or down.
For a long time, this amount of information was almost all a good network ops team could absorb. Just keeping the network online was a difficult job. Over time, network equipment has become more reliable, and operators have built networks with hardware and path redundancy that make a node-down or network-down event less likely. This has opened the door for network operators to start managing based on much more than simple up and down status information. We think of this as the grey scale of network management.
As in this image, there are infinitely many shades of grey between up and down. We believe the need to understand the grey scale is the driving force behind the move from network reporting to network telemetry.
Exposing the grey scale that already exists in your network comes down to choosing which network metrics your network telemetry platform measures. To some extent, this depends on the specific equipment deployed in your network. For example, a TLS proxy may be impacted by the total number of TCP connections or by the number of new TCP connections per second. Similarly, some network equipment is sensitive to the packets-per-second rate and to packet size. While extracting network telemetry metrics that are specific to particular types of equipment or applications can be useful and interesting, we believe there is a much more fundamental set of metrics that every network operator needs to understand in order to deliver good network quality to users. We think of these metrics as the ‘physics’ of networks, and we’ve worked hard to build a platform that brings this information to network operators at a high granularity and at short intervals.
“The three metrics that every network should track at a high granularity (each endpoint such as a customer or device) and at a short interval (<=10s) are Throughput, Latency & Loss”
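These metrics control the performance of every use case and application your network delivers. An application may still perform poorly even when the network is operating properly, but the reverse does not hold: an application cannot perform well when the network underneath it is delivering poor throughput, high latency or heavy loss.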
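To make the idea of per-endpoint, short-interval tracking concrete, here is a minimal Python sketch that groups raw measurements into 10-second windows per endpoint and derives throughput, latency and loss for each window. The Sample record, its field names and the summarize function are hypothetical and exist only for illustration; they do not describe any particular platform or data source.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 10  # the "<=10s" interval suggested in the text


@dataclass
class Sample:
    """One hypothetical measurement record; all field names are assumptions."""
    timestamp: float       # seconds since some epoch
    endpoint: str          # customer or device identifier
    bytes_delivered: int   # bytes delivered during this measurement
    packets_sent: int
    packets_lost: int
    latency_ms: float      # latency observed for this measurement


def summarize(samples, window=WINDOW_SECONDS):
    """Compute throughput, latency and loss per (endpoint, window start)."""
    buckets = defaultdict(list)
    for s in samples:
        # Align each sample to the start of its fixed-size window.
        window_start = int(s.timestamp // window) * window
        buckets[(s.endpoint, window_start)].append(s)

    summary = {}
    for key, group in buckets.items():
        total_bytes = sum(s.bytes_delivered for s in group)
        sent = sum(s.packets_sent for s in group)
        lost = sum(s.packets_lost for s in group)
        summary[key] = {
            # Average bits per second over the whole window.
            "throughput_bps": total_bytes * 8 / window,
            # Simple mean of the latency samples in the window.
            "latency_ms": sum(s.latency_ms for s in group) / len(group),
            # Fraction of packets lost; 0.0 if nothing was sent.
            "loss_ratio": lost / sent if sent else 0.0,
        }
    return summary
```

Each key in the result is an (endpoint, window start) pair, so a degraded endpoint shows up as a change in one of the three values within seconds rather than as a single up or down bit minutes later. A mean is only one way to summarize latency; percentiles would be a reasonable alternative.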