How Netflix Works: Webinar Recap
Preseem recently held a webinar where our co-founder and Chief Technical Officer Scot Loach explained how Netflix works, why it’s not a stream, and why your ISP should prioritize managing the network and not individual applications.
(Some of the content in the webinar and this recap article was covered in a previous blog post—consider this an update and expansion on that information.)
We chose Netflix specifically as our focus for the webinar in part because of its sheer dominance. According to Sandvine’s 2022 Global Internet Phenomena Report, Netflix accounts for close to 21% of the total volume of internet traffic, followed by YouTube (15.9%) and Facebook (7.8%). However, if you look at peak hours, reports have Netflix as high as 40% of total traffic on some networks.
We also chose it because we have some unique expertise with Netflix thanks to a previous (pre-Preseem) product we made named NightShift. This was an in-home Netflix cache designed to reduce usage of the internet during peak hours and to help people binge their favorite Netflix shows without buffering.
During the development and deployment of that product, we learned a lot about how Netflix works and believe it’s important to pass that knowledge along to ISPs, given the application’s popularity and effect on networks.
How Does Netflix Work?
So, what is actually going over the network when a Netflix video is played?
Netflix has been streaming video since 2007, and the industry has evolved greatly since then (new devices, new hardware, advances in video encoding technology). Newer encoding standards do a better job of compressing video, but Netflix still has to support devices that only handle older standards.
Because of this, every title on Netflix has to be encoded in each container format. As well, each title is encoded to a different ‘bitrate ladder,’ because different devices can play video at various resolutions. Resolutions also vary, because some plans won’t support higher resolutions or a network might not be able to support them during peak congestion periods.
Netflix now also encodes separate chunks or segments of videos at different levels, e.g. three minutes at a time. The reason for that is some videos are easier to compress from frame to frame than others. For example, a cartoon can be compressed to a lower bitrate compared to an action movie. As a result, there’s really no correlation between the bitrate of a video being played and its quality or resolution.
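To make the last point concrete, here is a minimal sketch of two per-title bitrate ladders. The titles, bitrates, and resolutions are invented for illustration and are not Netflix data; the only thing it shows is that the same available bitrate can map to different resolutions depending on how compressible the content is.

```python
# Hypothetical per-title bitrate ladders as (kbps, vertical resolution) pairs.
# All numbers are made up for illustration; they are not Netflix's ladders.
LADDERS = {
    "cartoon": [(235, 480), (750, 720), (1750, 1080), (4300, 2160)],
    "action_movie": [(560, 480), (1750, 720), (4300, 1080), (12000, 2160)],
}


def best_resolution(title: str, available_kbps: int) -> int:
    """Return the highest resolution whose bitrate fits in available_kbps.

    Returns 0 if even the lowest rung does not fit.
    """
    fitting = [res for kbps, res in LADDERS[title] if kbps <= available_kbps]
    return max(fitting) if fitting else 0


# The same 2,000 kbps budget lands on different resolutions per title.
for title in LADDERS:
    print(title, best_resolution(title, 2000))
```

With these assumed ladders, a 2 Mbps budget is enough for 1080p on the easy-to-compress title but only 720p on the harder one, which is why a bitrate number alone says little about what the viewer sees.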
Netflix is Not a Stream
This is covered in our previous blog, but it’s important to reiterate that streaming video on Netflix is not a stream at all. It’s actually a file download, just like any other big file being downloaded. How Netflix works is that files are downloaded into a buffer chunk by chunk. When your player (e.g. TV, phone, PC) is playing a video, it’s just taking the next data segment of the video from an internal buffer, decoding it, and putting frames on the screen for you to watch. What’s interesting is how the buffer is filled from the network.
When a subscriber starts playing a video, their player device authenticates with Netflix. This way, Netflix knows who they are and what plan they’re on, as well as the type of device and the resolution it’s capable of playing. Netflix then sends a manifest file, which has the bitrate ladder and a list of three URLs for each bitrate that the player can visit to download the video. URLs are chosen based on the Netflix CDNs that are closest to the subscriber.
The player then selects a resolution from the bitrate ladder, connects to the CDN nodes, downloads video from each, figures out which it likes best, then downloads the video chunks into the buffer. Video is downloaded using HTTP over TLS, which means the video downloads are encrypted and can’t be inspected on the wire. This means only the player and the server know the content, resolution, and codec.
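The startup sequence described above can be sketched roughly as follows. This is not Netflix's manifest format or client code: the `Rung` class, the `example.invalid` URLs, and the "pick the fastest probe" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rung:
    """One step of a bitrate ladder plus its candidate download URLs."""
    kbps: int
    urls: List[str]


def make_urls(kbps: int) -> List[str]:
    # Three hypothetical CDN nodes per bitrate, as described in the text.
    return [f"https://cdn-{n}.example.invalid/title/{kbps}" for n in "abc"]


# Hypothetical manifest: a ladder where each rung lists three URLs.
MANIFEST = [Rung(kbps, make_urls(kbps)) for kbps in (750, 1750, 4300)]


def pick_rung(manifest: List[Rung], max_kbps: int) -> Rung:
    """Choose the highest rung that fits max_kbps, else the lowest rung."""
    fitting = [r for r in manifest if r.kbps <= max_kbps]
    if fitting:
        return max(fitting, key=lambda r: r.kbps)
    return min(manifest, key=lambda r: r.kbps)


def pick_url(rung: Rung, probe_mbps: Dict[str, float]) -> str:
    """Choose the URL with the best measured probe throughput (assumed rule)."""
    return max(rung.urls, key=lambda u: probe_mbps.get(u, 0.0))


rung = pick_rung(MANIFEST, max_kbps=3000)
probes = dict(zip(rung.urls, [12.0, 31.5, 8.2]))  # made-up probe results
print(rung.kbps, pick_url(rung, probes))
```

Everything an on-path device would need to know (which rung, which title, which codec) lives inside the encrypted exchange, which is the point the paragraph above makes.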
Where Netflix Traffic Comes From
Netflix traffic actually comes from many different places, such as a variety of ISPs and internet exchange points close to the client watching a video. The company has an Open Connect cache farm that ISPs can qualify to take part in. Netflix will then actually ship OC servers and provide the content. The ISP’s subscribers in the area then get their Netflix from those caches. Check out their Open Connect site for more information.
How Netflix Uses Buffers
Netflix uses a large buffer (almost five minutes of video) that allows it to handle network outages or degradations without it interrupting the user experience. If the throughput goes below the level of the video for a while, Netflix will start a new buffer lower on the bitrate ladder and start loading the buffer with the lower-bitrate video. The playing bitrate will then change to the lower-bitrate video, called a ‘downshift.’ The opposite can happen as well to upshift the buffer to a higher bitrate. The exact details of how this is done are proprietary to Netflix.
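Since the real logic is proprietary, the following is only a toy sketch of what a downshift/upshift decision could look like. The ladder values, the function name `next_rung`, and the buffer thresholds are all assumptions, not anything Netflix has published.

```python
# Hypothetical bitrate ladder in kbps, lowest to highest.
LADDER_KBPS = [235, 750, 1750, 3000, 5800]


def next_rung(current: int, throughput_kbps: float, buffer_s: float,
              low_water_s: float = 60.0, high_water_s: float = 180.0) -> int:
    """Return the ladder index to fetch next.

    Downshift when throughput is below the current bitrate AND the buffer
    is running low; upshift when throughput exceeds the next rung AND the
    buffer is healthy. Otherwise stay put and let the buffer absorb it.
    """
    if (current > 0
            and throughput_kbps < LADDER_KBPS[current]
            and buffer_s < low_water_s):
        return current - 1  # downshift
    if (current + 1 < len(LADDER_KBPS)
            and throughput_kbps > LADDER_KBPS[current + 1]
            and buffer_s > high_water_s):
        return current + 1  # upshift
    return current


print(next_rung(3, throughput_kbps=2000, buffer_s=30))   # low buffer, slow link
print(next_rung(3, throughput_kbps=2000, buffer_s=200))  # slow link, full buffer
```

The second call illustrates why the large buffer matters: a temporary dip in throughput does not force a quality change as long as there is enough video already downloaded.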
It’s important for ISPs to know this, especially if they’re using bandwidth bursting, as it could cause Netflix to misestimate the throughput when a video starts.
The image below shows three TCP flows being used to deliver video to a smart TV in a real experiment conducted by Scot. This shows how Netflix is not a stream but a download. It’s not streaming at a fixed bitrate of 6Mbps—it’s actually doing 40Mbps early on to fill up the buffer, then zero for a while as it plays some of the buffer, then spiking between 15-25Mbps and zero over time as it downloads segments to keep the buffer full. From there, it divides pretty equally between the three flows to keep the buffer topped up.
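The on/off pattern described here can be reproduced with a very small simulation. This is a simplified model under stated assumptions: a single aggregate download, a 40 Mbps link, a 6 Mbps video, playback starting immediately, and hypothetical buffer thresholds (`target_s`, `resume_below_s`). It is not a model of Netflix's actual client.

```python
def simulate(link_mbps: float = 40.0, video_mbps: float = 6.0,
             target_s: float = 240.0, resume_below_s: float = 220.0,
             duration_s: int = 600) -> list:
    """Return the download rate (Mbps) for each second of playback."""
    buffer_s = 0.0        # seconds of video currently buffered
    downloading = True
    rates = []
    for _ in range(duration_s):
        if buffer_s >= target_s:
            downloading = False   # buffer full: go idle
        elif buffer_s <= resume_below_s:
            downloading = True    # buffer drained enough: fetch more
        rate = link_mbps if downloading else 0.0
        rates.append(rate)
        # In one second of wall-clock time the download adds
        # rate / video_mbps seconds of video and playback removes one.
        buffer_s = max(0.0, buffer_s + rate / video_mbps - 1.0)
    return rates


rates = simulate()
busy = sum(1 for r in rates if r > 0)
print(f"seconds downloading: {busy} of {len(rates)}")
print(f"distinct rates seen: {sorted(set(rates))}")
```

In this model the link is either saturated or idle; it never sits at the video's nominal bitrate, which matches the shape of the trace described above.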
Based on some other experiments Preseem did, we found that different players have different behavior. An Android phone might typically use one flow per stream, for example, while a PC browser will use three.
The Myth of Managing Netflix
What does all this mean for network management policies and QoE management? For one, it explodes the myth that network policy can control video resolution.
Using the example in the image below, imagine your ISP wanted to use a tool like Deep Packet Inspection (DPI) to set and enforce a policy where, during peak hours, subscribers only get 1080p. DPI gives you two tools for this: a per-flow bitrate cap and an overall per-user bitrate cap. Let’s look at both options.
Per-flow: In the example on the left below, this 12 Mbps streaming video has three flows, so maybe if you set a 3Mbps per-flow limit, it could restrict it to 1080p instead of playing in 4K. In the example on the right, however, the 3Mbps per-flow limit will be fine for mobile devices but will severely downgrade the quality for the TV watcher below 1080p. As a result, you haven’t delivered the experience that your policy promised.
Per-user: Alternatively, imagine you’ve set a limit of 8Mbps total, which should be enough for a 1080p stream. In example 1, this gives each of the three flows about 2.7 Mbps, and that should be OK. But again, in the second example with multiple devices, the TV flow will definitely be below 1080p and the mobile devices will also struggle to get to their 720p resolution.
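The arithmetic behind both options is simple enough to sketch. The household layouts below are assumptions (the second one, with a single-flow TV app and two phones, is hypothetical), and the per-user case assumes the cap is shared equally per flow, which real TCP flows only approximate.

```python
def per_flow_cap(flows_per_device: dict, cap_mbps: float) -> dict:
    """Maximum throughput each device can reach under a per-flow cap."""
    return {dev: n * cap_mbps for dev, n in flows_per_device.items()}


def per_user_cap(flows_per_device: dict, cap_mbps: float) -> dict:
    """Throughput per device if a per-user cap is split equally per flow."""
    total_flows = sum(flows_per_device.values())
    return {dev: n * cap_mbps / total_flows
            for dev, n in flows_per_device.items()}


one_tv = {"tv": 3}                                     # one TV, three flows
household = {"tv": 1, "phone_a": 1, "phone_b": 1}      # hypothetical mix

print(per_flow_cap(one_tv, 3.0))      # per-flow cap, single TV
print(per_flow_cap(household, 3.0))   # same cap, mixed household
print(per_user_cap(one_tv, 8.0))      # per-user cap, single TV
print(per_user_cap(household, 8.0))   # same cap, mixed household
```

The same policy value produces very different per-device throughput depending on how many flows each player opens and how many devices are active, neither of which the ISP controls.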
These are just examples, but any vendor that tells you that you can restrict Netflix playback to a specific resolution is telling you something that’s not possible. You can’t control the resolution that your subscribers will experience, and you also can’t control the number of concurrent streaming sessions that they’ll get.
Using DPI for QoE is No Longer Relevant
Because of the way Netflix works—e.g. running at 40 Mbps to fill the buffer when video playback begins—it can have a huge effect on other protocols that are running concurrently, at any point of congestion (e.g. the user self-congesting or upstream at an AP).
An older workaround for this would be to use DPI and configure it to limit Netflix during peak hours so that there’s room for other traffic. The main problem, however, is that there’s really no way to know what that limit should be. You don’t know what the variable bit rate encoding will be, how many concurrent streams, etc.
The secondary issue is an overall problem with DPI. It uses application signatures to identify traffic, and these can be brittle: signatures have to be updated as applications change, and new signatures have to be added as new applications appear.
TLS 1.3 is a Problem for DPI
DPI can recognize Netflix today because Netflix’s signatures haven’t changed in a decade. With the coming introduction of TLS 1.3, however, that may change due to encryption that would make it possible to send traffic with no way to identify it. In the near future, Netflix and other streaming services may become much harder to detect.
For example, consider the Sandvine report we referenced at the start of this blog. “Unknown QUIC” is #4 on their list of internet volume—that’s traffic that hasn’t been identified but which could well be Facebook or YouTube or some other popular app. However, all they can identify is the protocol it’s using.
There’s a trend toward greater privacy and greater encryption. If and when major applications like Netflix and Disney Plus adopt TLS 1.3, DPI will no longer be able to definitively identify them.
So how to prevent Netflix from taking all your capacity? The solution lies with Active Queue Management (AQM) using FQ-CoDel/Cake. This solves the problem of one protocol crowding out others. With AQM, everything gets its fair share of the network, so that traffic feels fast even when queues are full. This is the way of doing network management in the 2020s to solve this problem.
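To illustrate what "fair share" means here, the sketch below computes a max-min fair allocation across competing flows. This is not an implementation of FQ-CoDel or Cake, which operate on packet queues; it only shows the outcome that flow-fair queueing aims for. The link capacity and per-flow demands are hypothetical.

```python
def max_min_fair(capacity: float, demands: dict) -> dict:
    """Split capacity so no flow gets more than it asks for and unused
    share is redistributed equally among flows that still want more."""
    alloc = {}
    remaining = capacity
    pending = dict(demands)
    while pending:
        share = remaining / len(pending)
        satisfied = {k: d for k, d in pending.items() if d <= share}
        if not satisfied:
            # Everyone left wants more than an equal share: split evenly.
            for k in pending:
                alloc[k] = share
            break
        for k, d in satisfied.items():
            alloc[k] = d
            remaining -= d
            del pending[k]
    return alloc


# Hypothetical demands in Mbps on a 20 Mbps link.
demands = {"netflix_burst": 40.0, "voip": 0.1, "game": 1.0, "web": 5.0}
for name, mbps in max_min_fair(20.0, demands).items():
    print(f"{name}: {mbps:.1f} Mbps")
```

The small flows get everything they ask for, and the bursting download receives whatever is left rather than crowding them out, without the network needing to know which application any flow belongs to.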
TCP Proxies (Acceleration) Can Impair QoE
Like DPI, TCP Proxies (Acceleration) are a traffic management technique whose usefulness is fading. TCP Proxies can fill the buffer faster, but at the expense of other traffic such as gaming, web browsing, and VoIP, much of it carried over UDP (not to mention YouTube and Facebook video streams that use QUIC/UDP).
This leads to congestion within the home and across subscribers sharing an AP. Netflix has its own algorithms for dealing with network congestion, such as its large client-side buffers and the bitrate ladder shifts mentioned above. As a result, ISPs don’t really need to do anything extra to help Netflix get plenty of the network. Netflix will also likely migrate to QUIC/UDP as HTTP/3 becomes standard, so TCP Proxies will become even less helpful.
Manage the Network, Not the Netflix
To summarize, the main message here is to manage the congestion, not the application. Here’s why:
- Netflix uses multiple TCP connections and uses TLS, so it’s not possible to limit the number of devices or streaming sessions
- Videos from Netflix are variable bitrate encoded, so it’s not possible to limit resolution/video quality with network policy
- Netflix downloads in short bursts at full link rate, which can negatively impact other traffic—AQM prevents this from using more than its fair share of the network
Artificial reduction in capacity is a mitigation of the problem, not a solution. Proactively managing capacity to deliver more bits to the customer is your best bet at providing a great experience. Networks that are optimized for particular applications or, worse, particular transport protocols are very brittle going forward. Instead, use AQM to maintain a clean network that’s fair to all protocols and provides the best QoE, and let Netflix self-adjust to network congestion using its own algorithms.
Watch the full webinar below!