Preseem is a Quality of Experience (QoE) monitoring and optimization platform. It uses highly granular, frequent network telemetry to find problems that degrade the subscriber experience. This post outlines a real-world network configuration problem at a wireless internet service provider (WISP) that caused poor subscriber QoE and was identified by Preseem.
We’re currently working with a WISP to roll out Preseem in their network. One of the first things we observed was that the percentage of retransmitted or out-of-order TCP segments in the network was much higher than what’s typical for WISP networks.
The chart above shows the percentage of TCP segments that were retransmitted or delivered out of order, and the dramatic drop once the underlying issue was fixed.
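A passive monitor cannot always tell whether a segment that arrives "late" is a retransmission or a reordered original, which is one reason to track the two together. The sketch below is a minimal illustration of such a combined counter, not Preseem's actual method. It assumes a single TCP direction, ignores sequence-number wraparound, and uses hypothetical names (`Segment`, `late_segment_percentage`). A segment counts as late if it starts below the highest byte already seen.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    seq: int     # sequence number of the first payload byte
    length: int  # payload length in bytes


def late_segment_percentage(segments: list[Segment]) -> float:
    """Percentage of segments that start below the highest byte seen so far.

    A passive observer cannot always distinguish a retransmission from a
    reordered segment, so this hypothetical metric counts both together.
    Assumes one direction of one flow and no sequence-number wraparound.
    """
    if not segments:
        return 0.0
    highest_end = None  # one past the highest byte observed so far
    late = 0
    for seg in segments:
        end = seg.seq + seg.length
        if highest_end is not None and seg.seq < highest_end:
            late += 1  # retransmitted or delivered out of order
        if highest_end is None or end > highest_end:
            highest_end = end
    return 100.0 * late / len(segments)
```

For example, 100-byte segments starting at bytes 0, 200, 100 and 300 give 25%, because only the segment at byte 100 arrives after a segment with a higher sequence number.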
How Packet Reordering Impacts TCP Flows
The rest of this post uses TCP to explain the effects of out-of-order delivery, although out-of-order packets harm most other types of traffic as well. In TCP terms, the unit of transmission is called a segment.
TCP is the most common transport protocol and provides a reliable byte stream: it delivers an ordered sequence of bytes from one application to another. Since packet networks are inherently lossy, TCP compensates by assigning a sequence number to every byte it sends; each segment carries the sequence number of its first byte. This lets the receiver detect missing data, tell the sender about it through its acknowledgments, and restore the original order if segments are reordered in the network.
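To make the reordering step concrete, here is a minimal sketch of a receiver that buffers out-of-order segments and releases bytes to the application only once every earlier byte has arrived. The class and method names are hypothetical, and the sketch assumes no sequence-number wraparound and no partially overlapping segments; a real TCP stack handles both, along with flow control and SACK.

```python
from __future__ import annotations


class ReassemblyBuffer:
    """Hypothetical TCP-style receiver that restores byte order.

    Assumes no sequence-number wraparound and no overlapping segments.
    """

    def __init__(self, initial_seq: int = 0) -> None:
        self.next_seq = initial_seq          # next in-order byte the application expects
        self.pending: dict[int, bytes] = {}  # out-of-order segments keyed by sequence number
        self.delivered = bytearray()         # bytes handed to the application, in order

    def receive(self, seq: int, payload: bytes) -> int:
        """Accept one segment and return the cumulative ACK (next expected byte)."""
        if seq >= self.next_seq:
            self.pending[seq] = payload  # hold it until the gap before it is filled
        # Segments below next_seq are duplicates and are simply dropped.
        # Deliver every buffered segment that now lines up with the in-order stream.
        while self.next_seq in self.pending:
            data = self.pending.pop(self.next_seq)
            self.delivered.extend(data)
            self.next_seq += len(data)
        return self.next_seq
```

Feeding it a segment at byte 6 before the one at byte 3 leaves the returned acknowledgment at 3 and holds `ghi` back from the application until `def` arrives, which is the delayed delivery described next.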
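Responding quickly to loss is critical to TCP performance and is accomplished through techniques like Fast Retransmit, in which the sender resends a segment as soon as it receives three duplicate acknowledgments instead of waiting for a retransmission timeout. Unfortunately, these and other techniques interact poorly with a network that delivers packets out of order, because a reordered segment generates the same duplicate acknowledgments as a lost one. At a high level, packet reordering causes: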
- Delayed delivery of data to applications
- Unnecessary retransmissions
The end result is higher application latency and lower throughput to the end subscriber.
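The interaction with Fast Retransmit can be sketched with a simplified model. A TCP receiver acknowledges cumulatively, so every segment that arrives ahead of a gap repeats the previous acknowledgment (a duplicate ACK). A sender using classic Fast Retransmit (RFC 5681) treats the third duplicate ACK as a sign of loss and retransmits immediately. The functions below are hypothetical helpers that number whole segments instead of bytes and ignore refinements such as SACK and the adaptive reordering detection found in modern stacks.

```python
from __future__ import annotations

DUPACK_THRESHOLD = 3  # classic Fast Retransmit threshold (RFC 5681)


def cumulative_acks(arrival_order: list[int]) -> list[int]:
    """Return the cumulative ACK a receiver sends after each arriving segment.

    Segments are numbered 0, 1, 2, ... and each ACK names the next segment
    the receiver still needs (segment numbers stand in for byte offsets).
    """
    received: set[int] = set()
    next_needed = 0
    acks = []
    for seg in arrival_order:
        received.add(seg)
        while next_needed in received:
            next_needed += 1
        acks.append(next_needed)
    return acks


def fast_retransmits(acks: list[int]) -> list[int]:
    """Return the segments a sender would fast-retransmit for this ACK stream."""
    retransmitted = []
    last_ack = None
    dupacks = 0
    for ack in acks:
        if ack == last_ack:
            dupacks += 1
            if dupacks == DUPACK_THRESHOLD:
                retransmitted.append(ack)  # sender assumes segment `ack` was lost
        else:
            last_ack = ack
            dupacks = 0
    return retransmitted
```

With arrival order `[0, 2, 1, 3, 4]`, segment 1 is displaced by one position and no retransmission occurs. With `[0, 2, 3, 4, 1, 5]`, three later segments overtake it, so the sender sees three duplicate ACKs for segment 1 and retransmits it even though it was never lost. That is an unnecessary retransmission, and in classic TCP it also triggers a congestion-window reduction that lowers throughput.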
What Causes Packet Reordering?
There are two primary causes of packet reordering: route changes and flow-unaware parallelism.
Networks are commonly built with redundant paths, which is a good thing. When there’s a failure or policy change, the path that two particular end nodes use to communicate may change. While the network converges on the new path, packets from the same flow can take different paths. This leads to packet reordering at the end host, but the effect is typically short-lived.
Flow-unaware parallelism is typically the source of persistent packet reordering. The canonical example of this is bonded network interfaces between two switches or routers.
How Bonded Interfaces Affect Packet Reordering
Bonding interfaces is often desirable when the total traffic exceeds the capacity of a single link. The problem occurs when the outgoing link of a bonded pair is chosen by a simple algorithm such as round-robin: packet 1 goes to link 1, packet 2 goes to link 2, packet 3 goes to link 1, and so on. In this case, packets from a single flow take different, albeit very similar, paths to the end node.
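A toy model shows how round-robin striping alone can reorder a flow. Here packet `i` of a single flow leaves at a fixed interval on link `i % n`, and each link is given a fixed one-way delay. The function name and the delay values are illustrative assumptions; on real links the delay difference would come from varying packet sizes and queue depths rather than a constant.

```python
from __future__ import annotations


def round_robin_arrivals(num_packets: int, link_delays_us: list[float],
                         send_interval_us: float) -> list[int]:
    """Return packet numbers in the order they reach the far end of a bonded group.

    Hypothetical model: packet i is sent at time i * send_interval_us on link
    i % len(link_delays_us) (round-robin) and arrives after that link's fixed
    one-way delay.
    """
    arrivals = []
    for i in range(num_packets):
        link = i % len(link_delays_us)
        arrival_time = i * send_interval_us + link_delays_us[link]
        arrivals.append((arrival_time, i))
    # Sort by arrival time; ties fall back to packet number.
    return [pkt for _, pkt in sorted(arrivals)]
```

With delays of 10 and 35 microseconds and a 10-microsecond send interval, `round_robin_arrivals(6, [10.0, 35.0], 10.0)` returns `[0, 2, 1, 4, 3, 5]`; with equal delays the original order is preserved.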
Because of variations in packet size and buffer utilization on each link, this can result in a surprising amount of packet reordering. Even more insidiously, packet reordering can occur within a single device (router, switch) when packets are processed by different processors or NICs. In both cases, the solution is flow-aware load balancing: all packets of a given flow are sent to a single link or processor. This is often accomplished by hashing the packet header fields that identify the flow, such as the source and destination addresses and ports.
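Flow-aware load balancing can be sketched as hashing the fields that identify a flow (here the classic 5-tuple of source and destination address, source and destination port, and protocol) and taking the result modulo the number of links. The `FiveTuple` type and `pick_link` function are hypothetical, and SHA-256 is used only because it is deterministic and available in Python's standard library; switches and routers typically use their own hardware hash functions and often let the operator choose which header fields feed them.

```python
from __future__ import annotations

import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP


def pick_link(flow: FiveTuple, num_links: int) -> int:
    """Choose a bonded link by hashing the flow's 5-tuple (hypothetical helper).

    Every packet of the same flow hashes to the same link, so packets of
    one flow share a single queue and cannot overtake one another.
    """
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.src_port}|{flow.dst_port}|{flow.protocol}"
    # Python's built-in hash() is salted per process for str, so a
    # cryptographic digest is used here to get a stable result across runs.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_links
```

Because a flow always maps to the same link, its packets stay in order. The trade-off is that a single flow can never use more than one link's capacity, and balance across the bonded links depends on having many flows.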
Indeed, the source of the high percentage of out-of-order TCP segments in this WISP network was bonded Ethernet interfaces that didn’t have consistent flow hashing configured. It took some investigation by the customer to pin down the troublesome devices, but Preseem made the problem painfully apparent within minutes of being deployed.
To find out how your network configuration impacts your subscriber quality of experience, contact us for a free 30-day trial of Preseem.