Bufferbloat: The Hidden Problem and How Internet Service Providers Can Fix It

Since the COVID-19 pandemic began, people of all ages have been spending more time online at home. A recent Statistics Canada study, for example, found that 75% of Canadians aged 15 and older have increased their internet usage since the pandemic started.

This increase includes a big uptick in bandwidth-chewing activities like online gaming (and downloading large game updates) and video calls. Think of all those work-from-home video meetings, remote school classrooms, and virtual catch-ups with family and friends.

For Internet Service Providers (ISPs), this increase in high-bandwidth activity can lead to bufferbloat, a packet congestion issue that causes high latency, poor user quality of experience (QoE), and plenty of “slow internet” calls from increasingly frustrated subscribers. So, what is bufferbloat, and how can ISPs fix it?

What is Bufferbloat?

If we consult our handy Preseem Fixed Wireless Glossary, it tells us that bufferbloat “refers to when network links become saturated with high bandwidth activity, such as online gaming or Zoom calls.” This results in high latency, causing the internet to feel slow for subscribers.

As Jim Gettys and Kathleen Nichols explain in this article, bufferbloat is also defined as “the existence of excessively large and frequently full buffers inside the network.”

Gettys should know—he was a core member of the group that first identified the existence of bufferbloat around 2010. Gettys co-founded bufferbloat.net with Dave Taht, who helped develop the FQ-CoDel AQM algorithm on which Preseem is based. As Taht’s Wikipedia page notes, his tireless work on bufferbloat issues has helped prove that advanced algorithms like FQ-CoDel are “effective at reducing network latency, at no cost in throughput.”

In 2020, we hosted a webinar with Jeremy Davis and Brandon Yarbrough from Visp.net. They said the main bufferbloat symptom ISPs experience is an increase in latency, along with significant jitter and degraded throughput. High latency causes poor quality of experience for users, which can lead to a spike in support calls from subscribers, especially from homes where VoIP calls or gaming are common.

It’s important to note that bufferbloat is not the same as the buffering that can happen when streaming video. This article is about packet buffering causing problems on a network that uses first-in, first-out (FIFO) queuing with large buffers.

As the article by Gettys and Nichols also explains, “Buffers are essential to the proper functioning of packet networks, but overly large, unmanaged, and uncoordinated buffers create excessive delays that frustrate and baffle end users.”

What Causes Bufferbloat?

It’s not surprising that internet users might be baffled by bufferbloat, as we don’t expect them to be technical experts. Ideally, however, ISPs should have an understanding of it. This way, they can explain it to their frustrated customers and, better yet, start fixing bufferbloat issues for them.

First, they should know what actually causes bufferbloat.

Bufferbloat can happen anywhere on a network where packets can queue. Every network has a spot in the path that’s naturally slower, and that’s where bufferbloat will rear its ugly head. Typically, this happens at a bottleneck link where many packets are queued up.

The Role of Queues in Bufferbloat

Queuing occurs whenever a big pipe narrows or steps down into a smaller one. This is a perfectly normal aspect of network design. However, poor queue management at the bottleneck adds latency to every packet, because the newest packet in the queue has to wait for all the others ahead of it to be transmitted. This in turn leads to a subpar quality of experience for the internet user. Physical interfaces often have queues thousands of packets deep, which is a classic case of bufferbloat waiting to happen.
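The arithmetic behind that wait is simple, and a quick sketch makes the scale of the problem obvious. The queue depth, packet size, and link rate below are illustrative numbers, not measurements from any particular device:

```python
def queuing_delay_s(queue_depth_pkts, packet_size_bytes, link_rate_bps):
    """Worst-case wait for a packet arriving at the back of a full FIFO queue."""
    bits_ahead = queue_depth_pkts * packet_size_bytes * 8
    return bits_ahead / link_rate_bps

# A 1,000-packet-deep queue of 1,500-byte packets draining over a 10 Mbps link:
delay = queuing_delay_s(1000, 1500, 10e6)
print(f"{delay:.2f} s")  # prints "1.20 s": over a second of delay for the last packet
```

At that depth, every packet entering a full queue inherits more than a second of latency before it even reaches the air or the wire.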

As the slide below shows, consider how a user might feel when the red packet at the back of the queue is critical to their experience of the service. It looks like they’re in for a long wait.

Slide showing the effect of queues on latency

Though bufferbloat can occur on equipment in a subscriber’s home, particularly directly upstream of their internet connection, a common location for fixed wireless providers is at the Access Point (AP) level. A congested AP can degrade service for everyone connected to it. Capping the AP at less throughput than it can push (say 10 Mbps) could help reduce this problem. That said, below-maximum throughput on an AP could be due to signal degradation, which can cause packets to queue up in the first place, leading to bufferbloat symptoms.

How TCP Affects Bufferbloat

Bufferbloat is also made much worse by how TCP works. Checking our glossary again, TCP stands for Transmission Control Protocol, the communications standard that allows computers to exchange data with each other. TCP is a key component of how the internet works.

If you have large buffers, TCP learns that “this network can hold X packets before it sees a loss.” It then sets its rate so that X packets are always in flight, creating a “standing queue.” This adds latency to every packet that flows through the interface for as long as the TCP connection is active. This is where bufferbloat happens, and the resulting latencies will negatively affect your subscribers.
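A rough way to quantify the standing queue: the path itself can only "hold" its bandwidth-delay product (BDP), so any buffer TCP keeps full beyond that just adds delay. The link rate, base RTT, and buffer size below are made-up illustrative values:

```python
def bdp_bytes(link_rate_bps, base_rtt_s):
    """Bandwidth-delay product: the bytes the path itself can hold in flight."""
    return link_rate_bps * base_rtt_s / 8

def standing_queue_latency_s(buffer_bytes, link_rate_bps):
    """Extra delay added once TCP keeps a buffer of this size full."""
    return buffer_bytes * 8 / link_rate_bps

# A 50 Mbps link with a 20 ms base RTT only needs ~125 KB in flight...
path_capacity = bdp_bytes(50e6, 0.020)               # 125,000 bytes
# ...but a 1 MB device buffer, kept full by TCP, adds 160 ms on top of that RTT.
extra_delay = standing_queue_latency_s(1_000_000, 50e6)  # 0.16 s
```

In this sketch, a buffer eight times the BDP turns a 20 ms path into a 180 ms one, for every flow sharing the link, with no throughput gain to show for it.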

Slide showing relationship between big buffers and TCP

Bufferbloat happens because networking vendors are primarily focused on throughput, which is typically how they measure their equipment. To run at really high rates with a single TCP flow, you need a lot of buffering. This tradeoff conflicts with optimizing for latency. This is a problem because low latency corresponds to improved subscriber QoE, which means happier customers.

The discovery of bufferbloat has led to speed tests that now show loaded latency—the ultimate measure of bufferbloat. The irony, however, is that bufferbloat causes the problems that get people to run speed tests, whereas if they just had a consistently good experience there’d be no need to run the tests!

Fixing Bufferbloat

It’s not enough to measure bufferbloat, however. We want to fix it. In our experience, there are two ways to solve the link bottleneck problem:

  • Stay below the bottleneck rate so that the queue doesn’t grow, or
  • Wherever the bottleneck is, do something smarter than simple FIFO queue management to mitigate the issue

At Preseem, our focus is moving that bottleneck back to where we can use Active Queue Management (AQM) techniques to solve the problem rather than having to keep the absolute rate below the bottleneck bandwidth.

AQM techniques mitigate big-buffer problems by intelligently choosing which packets to drop, and when. Our use of AQM traffic management methods is based on the FQ-CoDel (Fair Queuing Controlled Delay) algorithm.
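To make "choosing which packets to drop, and when" concrete, here is a greatly simplified Python sketch of CoDel's drop decision. The 5 ms target and 100 ms interval are CoDel's published defaults; the real algorithm (RFC 8289) carries more state and runs at dequeue time, so treat this as an illustration, not an implementation:

```python
import math

TARGET_S = 0.005    # 5 ms: the acceptable standing delay
INTERVAL_S = 0.100  # 100 ms: how long delay may exceed target before we react

class CoDelSketch:
    """Toy version of CoDel's drop decision, driven by per-packet sojourn time."""

    def __init__(self):
        self.first_above = None  # when dropping may begin, if delay stays high
        self.dropping = False
        self.drop_count = 0
        self.next_drop = 0.0

    def should_drop(self, now_s, sojourn_s):
        if sojourn_s < TARGET_S:
            # Delay is fine again; forget any pending drop state.
            self.first_above = None
            self.dropping = False
            return False
        if self.first_above is None:
            # Delay just crossed the target; give it one interval to recover.
            self.first_above = now_s + INTERVAL_S
            return False
        if not self.dropping and now_s >= self.first_above:
            # Delay stayed above target for a full interval: start dropping.
            self.dropping = True
            self.drop_count = 1
            self.next_drop = now_s + INTERVAL_S / math.sqrt(self.drop_count)
            return True
        if self.dropping and now_s >= self.next_drop:
            # Drop more and more often until the queue drains.
            self.drop_count += 1
            self.next_drop = now_s + INTERVAL_S / math.sqrt(self.drop_count)
            return True
        return False
```

The key idea survives the simplification: CoDel reacts to how long packets *sit* in the queue, not how many are in it, so it adapts to any link rate without tuning.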

Slide showing how FQ-CoDel works

Using FQ-CoDel, traffic flows are isolated and automatically categorized as bulk or interactive based on how much queue they build up. This means they can’t “hurt” each other or slow each other down. It also means there’s no more guessing at rates per application or babysitting complex rule sets. Instead, you’ll go from FIFO packet queues that cause latency to short, isolated queues that produce very low latency, even when the network is busy.
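The flow-isolation piece can be illustrated with a toy classifier. Linux's fq_codel hashes each packet's 5-tuple (with a Jenkins hash and a random perturbation) into 1,024 buckets by default; this sketch uses SHA-256 just to show the idea, and the addresses and ports are invented:

```python
import hashlib

NUM_QUEUES = 1024  # fq_codel's default number of flow buckets

def flow_queue(src_ip, dst_ip, src_port, dst_port, proto):
    """Map a packet's 5-tuple to one of NUM_QUEUES per-flow queues."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_QUEUES

# Every packet of a video call lands in one queue...
call = flow_queue("192.0.2.10", "203.0.113.5", 51000, 8801, "udp")
# ...and a bulk game download lands (almost certainly) in another, so the
# bulk flow builds up its own queue without adding delay to the call.
download = flow_queue("192.0.2.10", "198.51.100.7", 52222, 443, "tcp")
```

Because each flow queues separately and CoDel manages each queue, a download that builds a deep queue only delays itself; the scheduler keeps serving the short, interactive queues promptly.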

As a result, good queue management strategy means subscribers can run their internet connection at full capacity without getting high latency and poor QoE. This means one household member’s large Minecraft update will no longer affect another’s important Zoom call, and your support team will be fielding far fewer “slow internet” complaints!

Summary

In a nutshell, when bufferbloat occurs on a network, the most obvious symptoms providers see are increased latency and jitter. You’ll also get more support calls from gamers and those using VoIP, because bufferbloat significantly degrades their experience.

Latency and throughput are the ultimate measures of QoE on a network. This is why Preseem measures latency at a fine-grained level directly from subscriber traffic. By applying AQM techniques, devices in the home are prevented from hurting each other, and traffic flows smoothly, even at peak times. Fixing bufferbloat really is possible, and using an AQM solution is the key to fixing this problem on your network for good.

Interested in reducing bufferbloat issues and providing a better experience for your subscribers? Contact us to book a live demo or start a free 30-day trial.

Subscribe to the Preseem Blog Newsletter

Stay in-the-know and get fresh content delivered to your inbox once a month.