During our recent ISP Virtual Summit, we held a technical webinar on how to measure QoE and how it can help operators manage their networks proactively. Hosted by Preseem CPO Dan Siemon and Solutions Architect Andrew Sit, the session also included a detailed look at how Preseem is deployed within customer networks. You can view the full webinar here or read on for a recap of the QoE portion.
(If you’re wondering what QoE is, we define it simply as the Quality of Experience a subscriber has when using the internet.)
Why Regional Operators Should Care About QoE
Providing superior QoE is more important than ever for a couple of reasons. First, more people are now able to work from home and need a reliable internet experience to ensure important video calls don’t buffer or drop.
Second, our homes now have many internet-connected devices. One person may be watching Netflix while another plays Fortnite and a third shops online. QoE optimization ensures that each person’s internet experience ‘feels fast,’ no matter what plan they have, how busy the network is, or how many people in their home are online at the same time.
Good QoE translates to happier customers, which means less churn and fewer support calls for operators. It can also help increase subscriber capacity on the network, but this must be done without artificially suppressing demand or using similar techniques that a) are bad for subscriber QoE and b) make your service less competitive against fiber.
Network Congestion Can Cause Poor QoE
So what leads to poor quality of experience for subscribers? Network congestion can certainly be an issue, as a busy network means less throughput for customers. There’s also a subtler, related issue: QoE can degrade very rapidly once a network is congested.
For example, many operators will have seen a busy link operate just fine at 90% utilization, only for the experience to turn bad once it reaches 92-93%. This is called congestion collapse: once a threshold is reached, latency suddenly rises sharply and goodput (useful bytes transferred by the network) drops.
What’s interesting is that this is not fundamental; it’s largely the result of oversimplified queueing, and it can be avoided with better queueing techniques such as AQM (Active Queue Management).
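To make the AQM idea concrete, here is a simplified sketch in the spirit of CoDel, one widely used AQM algorithm (specified in RFC 8289). Instead of waiting for a buffer to overflow, it starts dropping packets once queueing delay has stayed above a small target for a full interval, which tells senders to back off before latency balloons. The 5 ms / 100 ms constants are CoDel’s usual defaults; the class and function names are hypothetical, several details of the real algorithm are omitted, and this is not Preseem’s implementation.

```python
import math
from dataclasses import dataclass

# CoDel's commonly used defaults (RFC 8289); treat them as assumptions here.
TARGET = 0.005    # 5 ms of standing queue delay is acceptable
INTERVAL = 0.100  # 100 ms, roughly a worst-case round-trip time


@dataclass
class CoDelState:
    first_above_time: float = 0.0  # when the grace interval ends; 0.0 = delay not above target
    dropping: bool = False         # are we currently in the dropping state?
    drop_next: float = 0.0         # time of the next scheduled drop
    count: int = 0                 # drops since entering the dropping state


def control_law(t: float, count: int) -> float:
    # Each successive drop comes sooner: spacing is INTERVAL / sqrt(count).
    return t + INTERVAL / math.sqrt(count)


def should_drop(state: CoDelState, sojourn: float, now: float) -> bool:
    """Decide whether to drop a packet that waited `sojourn` seconds and is dequeued at `now`."""
    if sojourn < TARGET:
        # Queueing delay is acceptable: reset and leave the dropping state.
        state.first_above_time = 0.0
        state.dropping = False
        return False
    if not state.dropping:
        if state.first_above_time == 0.0:
            # Delay just went above target: start the grace interval.
            state.first_above_time = now + INTERVAL
            return False
        if now >= state.first_above_time:
            # Delay stayed above target for a whole interval: start dropping.
            state.dropping = True
            state.count = 1
            state.drop_next = control_law(now, state.count)
            return True
        return False
    if now >= state.drop_next:
        # Still congested: drop again and schedule the next drop sooner.
        state.count += 1
        state.drop_next = control_law(state.drop_next, state.count)
        return True
    return False


# Usage: a 20 ms standing queue that never drains, with one dequeue every 10 ms.
state = CoDelState()
drop_times_ms = [t for t in range(0, 300, 10) if should_drop(state, 0.020, t / 1000)]
print(drop_times_ms)
```

The first drop only happens after the delay has persisted for a full 100 ms interval, and later drops come progressively closer together until the queue drains. A plain FIFO, by contrast, drops nothing until its buffer is full, by which point the standing queue has already added its full delay.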
It’s also important to remember that busy links are not just access points or backhaul links. When you deliver a plan to a subscriber, e.g. shaped to 10 Mbps, they effectively have a 10 Mbps virtual link, and exactly the same problem happens within that virtual link. This self-congestion within their own plan speed can be a major cause of poor QoE.
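A quick bit of arithmetic shows why self-congestion inside a plan hurts: whatever sits in the plan’s queue has to drain at the plan rate before anything queued behind it gets through. The 256 KiB FIFO buffer below is a hypothetical figure chosen for illustration, not a measured value from any particular shaper.

```python
def queue_delay_ms(buffered_bytes: int, rate_mbps: float) -> float:
    """Time for a link running at rate_mbps to drain buffered_bytes, in milliseconds."""
    bits = buffered_bytes * 8
    return bits / (rate_mbps * 1_000_000) * 1000


# Hypothetical example: a 10 Mbps plan enforced with a 256 KiB FIFO buffer.
# Once a bulk download fills that buffer, every other packet in the plan
# (a game update, a video-call frame) waits behind it.
delay = queue_delay_ms(256 * 1024, 10)
print(f"Added latency with a full buffer: {delay:.0f} ms")  # Added latency with a full buffer: 210 ms
```

Roughly a fifth of a second of extra latency is enough to make a video call stutter or a game feel laggy, even though the subscriber’s plan is delivering its full 10 Mbps.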
Proactive Management is Difficult Without the Right Tools
Knowing how to measure QoE and manage it proactively has traditionally been extremely difficult. Identifying problems in the network and prioritizing fixes is hard, particularly at scale, in multi-vendor networks, or for operators with only a few RF experts on staff. Some of the problems that can get in the way of proactively managing the experience include:
- Multiple vendor tools and lots of charts
- Reliance on RF expertise (analysis can be very manual and time-consuming)
- Correlating information from different sources (e.g. subscriber usage, RF conditions)
- Vendor tools that only offer point-in-time snapshots. AP management tools are useful for viewing link rates and debugging operational problems, but point-in-time metrics aren’t useful for diagnosing the overall quality of the network; that requires analysis over time. A better approach is to look at the worst minutes of the day and compare them with real-world performance, which is what Preseem does.
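As a rough illustration of why over-time analysis matters, this sketch compares a single point-in-time reading with the average of the worst minutes of a day. The 15-minute window, the synthetic latency data, and the `worst_minutes` helper are all hypothetical; Preseem’s actual metric is not described in this recap.

```python
from statistics import mean


def worst_minutes(samples_ms: list[float], n: int = 15) -> float:
    """Average latency over the n worst one-minute samples of the day."""
    if not samples_ms:
        raise ValueError("no samples")
    worst = sorted(samples_ms, reverse=True)[:n]
    return mean(worst)


# Hypothetical day: 1,440 per-minute latency samples, mostly 20 ms,
# with a 30-minute evening peak (20:00-20:29) at 180 ms.
day = [20.0] * 1440
for minute in range(20 * 60, 20 * 60 + 30):
    day[minute] = 180.0

snapshot = day[10 * 60]  # a point-in-time check taken at 10:00
print(snapshot, worst_minutes(day))  # 20.0 180.0
```

The snapshot says the link is fine; the worst-minutes view surfaces the evening peak that subscribers actually feel.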
Also, QoE can’t really be measured with traditional network management solutions. A network monitoring system focuses on network elements (e.g. via SNMP), not subscribers. You’ll always need to know when network elements are up or down, but measuring the quality of experience is a different thing entirely, and it requires different metrics than those used to monitor network elements.
These systems also don’t scale to the job, because commercial NMS products aren’t priced in a way that lets you monitor down to the individual subscriber. For example, you wouldn’t want your NMS constantly pinging 5,000 customers; that would become cost-prohibitive in a hurry.
QoE is More Than Just Traffic Management
You can’t necessarily deliver a great experience by arbitrarily limiting traffic. ISP QoE is about understanding and improving the service delivered to customers. To have a complete QoE solution, you need to have:
- Subscriber QoE information, measured in a way that’s indicative of their actual experience.
- Deep integration with the underlying access network. For example, the first thing Preseem does is map the subscriber topology and the access elements to which they’re attached.
- Extensive data collection. We understand the real-world behavior of APs because we see hundreds of thousands of them, with fine-grained data on how they’re performing.
With this information, you can drive proactive quality improvements, which means you don’t actually need to do much traffic management other than to occasionally mitigate congestion.
Two important principles of effective traffic management are 1) optimize for latency, not just throughput, and 2) don’t arbitrarily limit traffic, since it’s rarely necessary and there are better ways to achieve the same goal. For us, analytics and the improvements they drive matter much more to delivering good QoE than limiting traffic does.
How to Measure QoE Effectively
At Preseem, we measure latency, loss, and throughput for each individual IP address in the system, combine those measurements with topology knowledge discovered directly from the network, and use the result to understand the quality of experience delivered in each part of the network. This allows you to uncover things like:
- Overloaded backhauls and APs
- Poorly-performing APs
- In-home Wi-Fi problems
- Whether an issue is network-wide or specific to the customer
Because this is measured at the traffic level, it can also find interesting things like:
- Bad bonded links
- Overloaded routers that are causing packet loss
- Bad optical cables
Because it’s not looking for any specific cause, it can find some surprising things. We believe tying subscriber metrics back to the topology is the only way to truly understand the experience, and it’s also what makes those metrics actionable.
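Here is a minimal sketch of tying per-subscriber metrics back to the topology, assuming latency has already been measured per IP and each IP has been mapped to the AP it is attached to. The IPs, AP names, thresholds, and the `diagnose` helper are hypothetical; a real system would also weigh loss and throughput and walk further up the topology to backhauls.

```python
from collections import defaultdict

# Hypothetical inputs: per-IP latency and the discovered subscriber-to-AP topology.
latency_ms = {"10.0.0.2": 25, "10.0.0.3": 30, "10.0.0.4": 210,
              "10.0.1.2": 190, "10.0.1.3": 220, "10.0.1.4": 205}
sub_to_ap = {"10.0.0.2": "ap-north", "10.0.0.3": "ap-north", "10.0.0.4": "ap-north",
             "10.0.1.2": "ap-south", "10.0.1.3": "ap-south", "10.0.1.4": "ap-south"}

POOR_MS = 100      # hypothetical threshold for "poor" latency
SHARED_FRAC = 0.5  # if more than half of an AP's subscribers are poor, suspect the AP


def diagnose(latency_ms: dict, sub_to_ap: dict) -> dict:
    """Label each poorly performing subscriber as AP-wide or subscriber-specific."""
    by_ap = defaultdict(list)
    for ip, ap in sub_to_ap.items():
        by_ap[ap].append(ip)
    result = {}
    for ap, ips in by_ap.items():
        poor = [ip for ip in ips if latency_ms[ip] > POOR_MS]
        shared = len(poor) / len(ips) > SHARED_FRAC
        for ip in poor:
            # Many bad subscribers on one AP points upstream; a lone one points
            # at the subscriber side (e.g. in-home Wi-Fi).
            result[ip] = f"{ap}: shared problem" if shared else "subscriber-specific"
    return result


for ip, finding in diagnose(latency_ms, sub_to_ap).items():
    print(ip, finding)
```

The same per-IP numbers lead to different actions depending on where the subscriber sits in the topology: a truck roll or Wi-Fi advice for one customer, versus an AP or backhaul fix for a whole sector.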
Solving the Self-Congestion Problem
Operators often get calls or tickets from customers complaining that their internet feels slow, or that multiple devices and activities in the home are slowing each other down (e.g. online gaming causing Netflix to buffer). If you check the relevant AP and don’t see anything wrong, that’s usually a sign that the subscriber is self-congesting. This is generally caused by poor queue management in your plan enforcement platform.
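The triage rule described above (the AP looks healthy, yet the subscriber is struggling) can be written down as a simple check. This is a hypothetical heuristic for illustration only; the thresholds and parameter names are assumptions, not values Preseem uses.

```python
def likely_self_congested(plan_mbps: float, sub_throughput_mbps: float,
                          sub_latency_ms: float, ap_utilization: float,
                          ap_latency_ms: float) -> bool:
    """Hypothetical heuristic: healthy AP, subscriber pinned at plan rate with high latency."""
    ap_healthy = ap_utilization < 0.8 and ap_latency_ms < 50   # assumed thresholds
    at_plan_cap = sub_throughput_mbps >= 0.9 * plan_mbps       # saturating their own plan
    high_latency = sub_latency_ms > 100                        # assumed "feels slow" level
    return ap_healthy and at_plan_cap and high_latency


# A 25 Mbps subscriber maxing out their plan on a lightly loaded AP.
print(likely_self_congested(25, 24.1, 180, 0.45, 20))
```

If the AP itself is overloaded, the same symptoms point to an AP capacity problem instead, which is why the AP check comes first.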
Rather than standard FIFO shaping, Preseem uses FQ-CoDel shaping, which divides each subscriber’s traffic across separate virtual queues, one per flow. This isolates flows from one another and keeps latency low even when multiple devices are online. It solves the self-congestion problem unless the network has other underlying problems, e.g. overloaded APs.
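To show why flow isolation helps, here is a toy comparison of a single FIFO with per-flow queues served round-robin, which is the flow-queuing half of FQ-CoDel. It is deliberately simplified: real FQ-CoDel hashes packets into flows by their 5-tuple, uses deficit round-robin with a byte quantum, and runs CoDel on each queue. The flow names and packet counts are hypothetical.

```python
from collections import deque


def fifo_order(packets):
    """Single FIFO: packets leave in arrival order."""
    return list(packets)


def fq_order(packets):
    """Per-flow queues served round-robin, one packet per flow per round."""
    queues = {}
    for flow, seq in packets:
        queues.setdefault(flow, deque()).append((flow, seq))
    out = []
    while queues:
        for flow in list(queues):  # iterate over a copy so empty queues can be removed
            q = queues[flow]
            out.append(q.popleft())
            if not q:
                del queues[flow]
    return out


# A burst of 8 video packets arrives just before a single game packet.
arrivals = [("video", i) for i in range(8)] + [("game", 0)]
print("FIFO position of game packet:", fifo_order(arrivals).index(("game", 0)))
print("FQ position of game packet:  ", fq_order(arrivals).index(("game", 0)))
```

With one FIFO, the game packet waits behind the entire video burst; with per-flow queues it is sent in the first round, so the sparse, latency-sensitive flow is not penalized by the bulk one.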
Manage Your Multi-Vendor Network Proactively
One of the major ways Preseem helps operators manage their networks proactively is by providing simple scores that quantify QoE and show where issues are occurring. The inputs for these scores include the behavior of your APs and the CPEs in your network, combined with a model built from data collected across all Preseem customers that tells us what APs are capable of and how they should perform.
For example, our Business Value score identifies subscribers who a) have poor modulation and b) are active at the same time as other subscribers on your network. These are the subscribers with the biggest impact on your airtime, so fixing them first increases capacity on your network.
This gives your team one spot to understand the quality of experience being provided across multiple vendors and gives them an action list to execute.
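The recap doesn’t describe how the Business Value score is actually calculated, so the following is only a hypothetical illustration of the idea: rank subscribers by how much extra airtime their modulation costs, weighted by how often they are active while the AP is busy. The field names, the 8 bits-per-symbol reference (256-QAM), and the multiplicative weighting are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    name: str
    bits_per_symbol: float    # current modulation efficiency (e.g. 2 = QPSK, 8 = 256-QAM)
    overlap_with_busy: float  # fraction of the AP's busy minutes this subscriber is active (0..1)


def airtime_impact(sub: Subscriber, best_bits_per_symbol: float = 8.0) -> float:
    # Poorer modulation needs more airtime per bit; weight that by how often
    # the subscriber competes with others during busy periods.
    airtime_cost = best_bits_per_symbol / sub.bits_per_symbol
    return airtime_cost * sub.overlap_with_busy


subs = [Subscriber("A", 8.0, 0.9), Subscriber("B", 2.0, 0.8), Subscriber("C", 2.0, 0.1)]
ranked = sorted(subs, key=airtime_impact, reverse=True)
print([s.name for s in ranked])  # ['B', 'A', 'C']
```

Subscriber C has modulation as poor as B’s but is rarely online at busy times, so fixing B first frees more airtime when it matters.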
Our Subscriber Capacity score uses the same data-driven approach. Preseem takes in billions of metrics across our customer base, which tell us how subscribers typically behave on a given network and what specific APs should be capable of. We then convert that into a score that tells you how many subscribers you can have on a given AP under current RF conditions. This lets your sales and marketing team proactively target specific areas where you know you can add more customers.
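Preseem’s score is derived from observed behavior across its customer base; as a much simpler stand-in, the basic capacity arithmetic looks like this. The AP capacity, per-subscriber busy-hour demand, 80% utilization ceiling, and helper names are all hypothetical.

```python
import math


def subscriber_capacity(ap_capacity_mbps: float, busy_hour_mbps_per_sub: float,
                        max_utilization: float = 0.8) -> int:
    """Hypothetical estimate of how many subscribers an AP can carry at busy hour."""
    usable = ap_capacity_mbps * max_utilization  # leave headroom below saturation
    return math.floor(usable / busy_hour_mbps_per_sub)


def headroom(ap_capacity_mbps: float, busy_hour_mbps_per_sub: float, current_subs: int) -> int:
    """How many more subscribers could be sold on this AP (never negative)."""
    return max(0, subscriber_capacity(ap_capacity_mbps, busy_hour_mbps_per_sub) - current_subs)


# e.g. an AP delivering ~120 Mbps in its current RF conditions, 2.5 Mbps busy-hour demand
# per subscriber, and 30 subscribers already attached.
print(subscriber_capacity(120, 2.5), headroom(120, 2.5, 30))  # 38 8
```

Because the usable capacity depends on current modulation, the same AP’s headroom changes as RF conditions improve or degrade.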
Coming Soon: Automatic AP Shaping and Plan Assurance
Dan also gave a preview of a couple of upcoming features for Preseem designed to help operators further improve QoE performance and embrace proactive management. The first is our Automatic AP Capacity Management tool that helps solve overloaded AP issues. The second is our Plan Assurance feature which will help operators understand the network’s ability to deliver a given plan speed while maintaining a great experience.
Dan also touched on how Preseem is moving to a multi-access-technology approach as our traditionally fixed wireless customers add fiber to their networks. He also spoke about the goal of making Preseem a layer of network quality assurance that sits atop whatever access technology operators are using.
As an added bonus to the session, our Solutions Architect Andrew Sit talked in detail about the technical aspects of deploying Preseem, our available integrations, and also took some audience questions. That starts at around the 25-minute mark. Watch the full video below!