Network users don’t care how many routers, switches, radios or cables it takes to get to the services they want. Yet almost all network monitoring tools focus on the network infrastructure itself, not subscriber traffic. Before we look at the alternative, let’s look at how traditional network monitoring works.
Traditional Network Monitoring
ICMP pings are used to obtain simple measurements of reachability and latency. The monitoring system sends an ICMP echo request to a network node and measures how long it takes for the ICMP echo reply to come back. If no reply is received, the target is either down or unreachable. If only some replies are missing, the fraction of unanswered requests gives an estimate of packet loss.
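As a rough illustration of how these three measurements fall out of a batch of pings, here is a minimal Python sketch. It does not send any ICMP traffic; it assumes the round-trip times have already been collected, with `None` marking a request that got no reply. The function name and the result keys are illustrative assumptions, not part of any particular NMS.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one batch of echo requests.

    Each entry is a round-trip time in milliseconds, or None when no
    echo reply was received for that request.
    """
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    lost = sent - len(replies)
    return {
        # No replies at all: the target is down or unreachable.
        "reachable": bool(replies),
        # Share of requests that went unanswered.
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        # Latency is only defined when at least one reply came back.
        "avg_rtt_ms": mean(replies) if replies else None,
    }


# Four requests, one of which was never answered.
summary = summarize_pings([4.0, 6.0, None, 5.0])
print(summary)
```

Note that the summary describes only the path to the pinged node at the moments the pings were sent, which is the limitation the rest of this article explores.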
SNMP is a management protocol supported by almost all network elements. It is used to retrieve information such as the number of interfaces, the counters on those interfaces, and other system information like CPU and memory usage.
Figure 1 shows a very simple WISP network with a network management system (NMS). The red lines show a logical view of the communication between the NMS node and the managed network elements.
Typically, the NMS sends one or more pings, SNMP queries for system information, or both, every few minutes. ICMP, SNMP, and other traditional network monitoring tools are an important and required part of your network management arsenal. However, it’s important to understand the limitations of this type of network monitoring. Specifically, it tells you far less about the end user experience than you might think.
Problems and Limitations
Fundamentally, traditional network monitoring approaches are focused on the elements that make up the network, such as routers and switches. Commonly, related metrics like the latency to the element or throughput are used as a proxy for the per-subscriber experience. There are several reasons why this approach is a rough approximation at best, and at worst, very misleading.
Networks are often built with redundant paths. This can easily lead to situations where the metrics, such as latency reported by the NMS, don’t measure the same path as customer traffic. Consider the network in the figure below.
In this network, there are two redundant paths to a customer. The routing protocol has selected the top path for subscriber traffic, while the NMS traffic follows the bottom path. If the top path gets congested or starts dropping packets, the NMS-based measurements provide no insight into this. That’s because, from the NMS’s perspective, there is no problem.
Understanding subscriber QoE requires metrics on a per-subscriber basis. To see why, consider an AP with 10 users and a total throughput of 25Mb/s. From these two numbers, the most that can be concluded is that the average user gets 2.5Mb/s. The reality could be that a single user is getting 20Mb/s while the other nine users each get only about 0.55Mb/s. That is, you have one really happy customer and nine unhappy ones.
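The arithmetic is easy to check. The short Python sketch below uses the rounded figures from this example (one user at 20Mb/s, nine at 0.55Mb/s) and compares what the AP-level total reveals with what the per-user rates reveal. The helper name `users_below` is an assumption made for illustration.

```python
from typing import Sequence

# Per-user throughput in Mb/s, using the rounded figures from the example.
per_user_mbps = [20.0] + [0.55] * 9


def users_below(rates: Sequence[float], threshold: float) -> int:
    """Count users whose throughput is under the given threshold."""
    return sum(1 for r in rates if r < threshold)


# What the AP-level view can tell you: a total and an average.
total_mbps = sum(per_user_mbps)
average_mbps = total_mbps / len(per_user_mbps)

# What only the per-user view can tell you: how many users fall
# below that average.
below_average = users_below(per_user_mbps, average_mbps)

print(f"total {total_mbps:.2f} Mb/s, average {average_mbps:.2f} Mb/s")
print(f"{below_average} of {len(per_user_mbps)} users are below the average")
```

The total and the average are the same whether the traffic is shared evenly or dominated by one user, which is why they say so little about individual experience.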
This simple example may be addressed by per-SM counters in the AP. However, counters like these don’t help with more advanced QoE metrics such as latency and TCP retransmissions. In general, network element-based management does not have the metric granularity required to say anything intelligent about per-subscriber QoE.
An NMS typically sends ICMP echo requests or SNMP queries every one to five minutes. Between polls, no new information is obtained, so events that last less than the monitoring interval cannot be reliably detected. For example, a 30-second outage that ruined a customer’s game session can go completely undetected when using one-minute ICMP ping intervals. In general, the more frequently measurements are taken, the more accurate the estimate of the ‘true’ QoE is.
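To make the sampling problem concrete, the Python sketch below models polling as instantaneous checks at fixed intervals and asks whether any check lands inside an outage window. This is a simplification: it assumes polls start at time zero, take no time, and always fail if they fall inside the outage. The function names are illustrative.

```python
def outage_detected(outage_start_s: int, outage_len_s: int,
                    interval_s: int, horizon_s: int = 3600) -> bool:
    """True if at least one poll falls inside the outage window."""
    outage_end_s = outage_start_s + outage_len_s
    return any(outage_start_s <= t < outage_end_s
               for t in range(0, horizon_s, interval_s))


def detection_rate(outage_len_s: int, interval_s: int) -> float:
    """Fraction of possible outage start times (whole seconds within
    one polling interval) for which the outage is seen by a poll."""
    horizon_s = 2 * interval_s + outage_len_s
    hits = sum(outage_detected(start, outage_len_s, interval_s, horizon_s)
               for start in range(interval_s))
    return hits / interval_s


# A 30-second outage starting 70 seconds in, polled once a minute:
# polls happen at 60 s and 120 s, so nothing lands in 70..100 s.
missed = outage_detected(70, 30, 60)

# The same outage polled every 10 seconds is caught.
caught = outage_detected(70, 30, 10)

print(missed, caught, detection_rate(30, 60))
```

Under these assumptions, a 30-second outage has only an even chance of being noticed at a one-minute polling interval, and the odds get worse as the interval grows.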
It’s important to note that the answer to this problem isn’t as simple as increasing the frequency of ICMP echo requests or SNMP queries. In both cases, the more frequently these operations are performed, the more management traffic is added to the network. Also, for many network elements, SNMP queries are resource-intensive, making it impossible to query at short intervals.
Subscriber QoE is about the experience from the subscriber’s point of view. Somewhat unfortunately for ISPs, the subscriber’s point of view includes network elements in their home, such as their Wi-Fi router and the end device itself. For example, subscribers cannot distinguish poor QoE caused by AP-to-SM problems from poor QoE caused by bad Wi-Fi in their home. In both cases, they are likely to blame their ISP, and are more likely to churn or drive up your support costs with complaints.
Interacting with in-home network elements is outside the domain of traditional NMS approaches and is technically impractical due to the presence of home NAT gateways.
Differentiated Network Treatment
Perhaps the most insidious way that network element-based monitoring can lead to an incorrect view of subscriber QoE comes from traffic being treated differently by network elements on the path. The figure below provides a logical view of how this can occur.
Notice that in this example, the traffic from the NMS follows the same path through the network as subscriber traffic. However, due to default or configured queueing policies, the traffic is treated differently. This can lead to situations where the NMS reports low latency but the actual subscriber packets experience high latency. Specifically, we have seen real-world examples where subscriber TCP traffic was directed to different queues than ICMP traffic. In this situation, the subscriber TCP traffic experienced over 50ms of latency while the NMS was reporting latency to the AP at only 5-10ms.
Many types of network equipment have default policies that direct traffic through different queues based on things like the IP DSCP mark. As a result, this type of problem can occur in your network even if you haven’t done any special network configuration.
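The toy Python sketch below shows how this can happen. The queueing policy, the DSCP threshold, and the per-queue delays are all hypothetical values chosen to mirror the 5ms versus 50ms example above; they do not describe any specific vendor’s defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    protocol: str  # "icmp" or "tcp"
    dscp: int      # IP DSCP mark, 0-63


# Hypothetical queueing delay per queue, in milliseconds.
QUEUE_DELAY_MS = {"priority": 5.0, "best_effort": 50.0}


def select_queue(pkt: Packet) -> str:
    """Hypothetical policy: ICMP and high-DSCP traffic go to the
    priority queue, everything else goes to best effort."""
    if pkt.protocol == "icmp" or pkt.dscp >= 40:
        return "priority"
    return "best_effort"


def queueing_delay_ms(pkt: Packet) -> float:
    """Delay the packet sees on this element under the toy policy."""
    return QUEUE_DELAY_MS[select_queue(pkt)]


nms_ping = Packet(protocol="icmp", dscp=0)
subscriber_tcp = Packet(protocol="tcp", dscp=0)

# Same element, same path, different queues.
print("NMS ping:", queueing_delay_ms(nms_ping), "ms")
print("Subscriber TCP:", queueing_delay_ms(subscriber_tcp), "ms")
```

Both packets cross the same element, yet the NMS only ever observes the delay of the queue its own probes land in.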
Subscriber QoE Monitoring
Subscriber QoE monitoring takes a fundamentally different approach compared to network element-based monitoring. Instead of monitoring network elements and inferring the subscriber’s QoE, this approach directly extracts QoE metrics from the subscriber’s traffic. This avoids the pitfalls discussed earlier and provides a critical augmentation of existing network element-based monitoring solutions.
To understand how this works, consider the figure above. There are three flows to three different devices in a subscriber’s home. A subscriber QoE-based monitoring solution is able to analyze the traffic for all of these flows in real time to extract QoE metrics. Using this approach, important QoE metrics such as latency, loss, jitter, and throughput can be calculated on a per-tower, sector and subscriber basis, providing you with the insight into subscriber experience that you need.
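As a sketch of the aggregation step only, the Python example below rolls per-flow measurements up into per-subscriber metrics. It assumes the per-flow values (round-trip time, byte and packet counts, retransmissions) have already been extracted from the traffic by some other component. The `FlowSample` record, its field names, and the sample data are hypothetical, and this is not a description of how any particular product works.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List


@dataclass
class FlowSample:
    """Hypothetical per-flow measurement for one reporting window."""
    subscriber: str
    device_ip: str
    rtt_ms: float
    bytes_seen: int
    packets: int
    retransmits: int


def per_subscriber_qoe(samples: Iterable[FlowSample],
                       window_s: float) -> Dict[str, dict]:
    """Aggregate flow samples into per-subscriber QoE metrics."""
    grouped: Dict[str, List[FlowSample]] = defaultdict(list)
    for sample in samples:
        grouped[sample.subscriber].append(sample)

    report = {}
    for subscriber, rows in grouped.items():
        packets = sum(r.packets for r in rows)
        total_bits = sum(r.bytes_seen for r in rows) * 8
        retransmits = sum(r.retransmits for r in rows)
        report[subscriber] = {
            "avg_rtt_ms": mean([r.rtt_ms for r in rows]),
            "throughput_mbps": total_bits / window_s / 1e6,
            "retransmit_pct": (100.0 * retransmits / packets
                               if packets else 0.0),
        }
    return report


# Three flows to three devices in one home, over a 10-second window.
samples = [
    FlowSample("sub-1", "10.0.0.11", 10.0, 1_250_000, 100, 1),
    FlowSample("sub-1", "10.0.0.12", 20.0, 1_250_000, 100, 2),
    FlowSample("sub-1", "10.0.0.13", 30.0, 1_250_000, 100, 3),
]
report = per_subscriber_qoe(samples, window_s=10.0)
print(report["sub-1"])
```

The same grouping could be keyed by sector or tower instead of subscriber to produce the coarser views mentioned above.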
The advantages of a subscriber QoE-based monitoring approach include:
- No new traffic is added to the access network. QoE information is obtained by processing the subscriber’s traffic.
- QoE metrics cover the entire subscriber experience including home Wi-Fi quality.
- By measuring the subscriber’s traffic directly, you can be sure that network QoS configuration is not causing misleading results.
- Metrics can be extracted with a high granularity (per-subscriber and per-IP address) and with a higher frequency than ICMP or SNMP allow.
- Network upgrades can be targeted at areas of the network with poor QoE and, perhaps more importantly, delayed in parts of the network that deliver good QoE despite high load.
A network that delivers a good QoE has happier customers who are less likely to leave and less likely to call support. Traditional NMS tools are based on monitoring network elements such as routers and switches, and inferring the quality that the subscriber receives. This approach cannot measure the entire subscriber experience, and for the reasons described above it can be wildly misleading.
Subscriber QoE-based monitoring focuses directly on QoE metrics instead of network elements. This enables WISPs to understand the QoE the network delivers down to the tower, sector and subscriber level. With this information in hand, WISPs can better plan their network upgrades and identify troublesome areas of the network.
Preseem is a network telemetry platform with a focus on providing WISPs with the QoE insights and optimization needed to improve customer satisfaction, reduce churn and lower support costs. Book a demo with us to learn more.