Dan Siemon and Scot Loach from Preseem appeared live on ISP Radio with hosts Steven Grabiel and Dennis Burgess on April 4th, 2018.
This post is a written summary of the segment where Scot discusses Netflix and how it actually works (hint: it’s not a stream), for the benefit of network owners and managers.
Why Is It Important for Network Managers to Understand How Netflix Works?
Netflix is by far the most popular application on the internet today. In fact, about a third of the traffic on a typical Internet Service Provider (ISP) network is Netflix.
Network managers and operators need to understand how Netflix traffic works on a typical ISP network. Otherwise, it’ll be hard for them to manage their network given the immense pressure it can put on bandwidth. You can’t optimize something you don’t understand. So let’s get started.
What Happens When Netflix Plays a Video?
There are two parts to this: what it does when it plays and what it does on the network.
When Netflix plays a video, it starts from an understanding that it’s at second X of show Y. It then looks at its internal buffer, where it stores video and audio segments, takes the next segment out, decodes it, and puts it on the screen. What fills that buffer is the interesting part, and it is the focus of what follows.
When someone hits play inside the streaming app, it authenticates with network servers. Then, the app requests the manifest file for the video. The manifest file is basically a big list of files encoded in different bit rates and resolutions that the device being used can play.
The manifest has three HTTPS links for each file, each pointing to a different location on the internet, that is, to three different CDN (Content Delivery Network) nodes. The app selects those CDN nodes based on factors like BGP distance, throughput, and latency. It then establishes several TCP connections to different nodes and retrieves the file chunks using regular HTTP over TLS. Here’s an example of what one of these looks like.
The connections are encrypted on the network, but each request is basically a big obfuscated string asking for part of a file, and the app does this chunk by chunk. So it’s not one big complete file download; it’s a series of chunk-by-chunk requests. What the app is trying to do is keep its internal buffer full. If there’s room in the buffer, the app downloads the next chunk. If there’s no room left, it waits.
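The fill rule described above can be sketched as a small simulation. This is only an illustration of the "fetch while there is room" idea, not Netflix's actual client logic. The function name, the one-second tick, and all the numbers (a 20-second buffer, 4-second chunks, one chunk per tick) are assumptions chosen to keep the example short.

```python
def simulate_fetches(capacity_s, chunk_s, chunks_per_tick, ticks):
    """Return how many chunks were fetched in each one-second tick.

    capacity_s      -- hypothetical buffer size, in seconds of video
    chunk_s         -- hypothetical chunk length, in seconds of video
    chunks_per_tick -- most chunks the link can deliver in one tick
    ticks           -- number of one-second ticks to simulate
    """
    level = 0          # seconds of video currently buffered
    activity = []
    for _ in range(ticks):
        fetched = 0
        # Fetch only while a whole chunk still fits in the buffer.
        while fetched < chunks_per_tick and level + chunk_s <= capacity_s:
            level += chunk_s
            fetched += 1
        activity.append(fetched)
        # Playback drains one second of video per one-second tick.
        level = max(0, level - 1)
    return activity


activity = simulate_fetches(capacity_s=20, chunk_s=4, chunks_per_tick=1, ticks=12)
print(activity)  # [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
```

Even in this toy model the network is busy continuously while the buffer fills and then falls back to occasional fetches, which is the on/off pattern that makes this traffic look so different from a constant-rate stream.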
The picture above shows some of the internal diagnostics of Netflix. In this example, you can see that the app is using a large buffer. About 125 Mb of video is buffered, equal to around five minutes ahead of playback. If something happens, the app has plenty of content buffered to keep playing before it has to fetch additional files.
It also shows throughput at the bottom, which is interesting. This is Netflix’s own calculation of throughput, based on the speed it gets when it downloads chunks from the network. If the app can’t keep the buffer full, it switches to a lower-resolution version of the video and starts a new buffer. Sometime after that, the bitrate it’s actually playing drops to that lower resolution and quality. Similarly, if the throughput is high enough to play a higher resolution, the app starts a new buffer, begins fetching chunks from the higher-resolution file, and switches playback to the higher resolution later on.
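A generic throughput-based selection rule looks something like the sketch below. Netflix's real algorithm is proprietary, so nothing here should be read as its implementation: the function names, the bitrate ladder, and the 0.8 safety factor are all assumptions made for illustration.

```python
def estimate_throughput_mbps(chunk_bytes, download_seconds):
    """Throughput implied by one chunk download, in megabits per second."""
    return chunk_bytes * 8 / download_seconds / 1e6


def choose_bitrate(ladder_mbps, throughput_mbps, safety=0.8):
    """Pick the highest encoding that fits within a fraction of the estimate.

    ladder_mbps -- hypothetical list of available encodings, in Mbps
    safety      -- assumed headroom factor so the buffer can still refill
    """
    usable = throughput_mbps * safety
    candidates = [rate for rate in ladder_mbps if rate <= usable]
    # If nothing fits, fall back to the lowest encoding rather than stalling.
    return max(candidates) if candidates else min(ladder_mbps)


ladder = [0.5, 1.0, 2.0, 3.0, 5.0]   # hypothetical encodings, in Mbps

# A 1 MB chunk that took 2 seconds implies 4 Mbps of throughput.
estimate = estimate_throughput_mbps(chunk_bytes=1_000_000, download_seconds=2.0)
print(estimate, choose_bitrate(ladder, estimate))
```

The headroom factor is one simple way to avoid picking an encoding that exactly matches the measured rate, which would leave nothing spare to refill the buffer.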
The details of how it does this are proprietary to Netflix and are their “secret sauce.” What it’s trying to do is maximize the quality of video the user sees and keep it close to the available capacity of the network. This way, the user gets the most high-def experience they can have. It’s also trying to minimize the number of up-shifts and down-shifts in resolution, as these can hurt the user’s experience. The basic intention is to keep the buffer full and so avoid annoying “Netflix buffering” messages and poor QoE.
Important Takeaway for WISPs
A shaper configured with bursting can actually cause Netflix to mis-estimate the throughput when a video starts. If a WISP offers, say, a 5 Mbps plan and lets subscribers burst to 6 Mbps for a little bit, Netflix will detect that higher limit and start downloading above the actual 5 Mbps plan speed. This can cause extra resolution shifts during playback that otherwise wouldn’t have happened. Playback typically goes from lower quality to higher quality and then back to lower. Since this is noticeable to the subscriber, it can impact QoE.
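The effect of bursting on an early throughput estimate can be shown with simple arithmetic. The sketch below assumes a flat burst rate followed by the plan rate; the 10-second burst duration and the measurement windows are made-up values, since the burst length is not specified above.

```python
def average_rate_mbps(plan_mbps, burst_mbps, burst_seconds, window_seconds):
    """Average rate a client would measure over the first window_seconds.

    Assumes the shaper allows burst_mbps for burst_seconds (hypothetical
    duration) and then enforces plan_mbps for the rest of the window.
    """
    burst_part = min(window_seconds, burst_seconds)
    plan_part = window_seconds - burst_part
    megabits = burst_mbps * burst_part + plan_mbps * plan_part
    return megabits / window_seconds


# 5 Mbps plan with a 6 Mbps burst, assumed to last 10 seconds.
early = average_rate_mbps(5, 6, burst_seconds=10, window_seconds=5)
later = average_rate_mbps(5, 6, burst_seconds=10, window_seconds=60)
print(early, round(later, 2))
```

A client that sizes its first quality decision on the early figure sees the full burst rate, while the rate it can actually sustain drifts back toward the plan speed, which is what sets up the extra down-shift.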
Netflix Isn’t Actually a Stream
It’s important to understand that Netflix isn’t a real-time stream. If Netflix were a typical video stream of, say, 5 Mbps, its traffic would ideally be a flat-ish, stable line along the 5 Mbps mark (like a Skype stream). That’s not how Netflix works. It’s downloading chunks of files and trying to keep the buffer full.
See the image below: at the beginning of the video, there’s high usage while the app fills the buffer. You can see it using three different flows (color-coded) to maximize throughput. It then goes idle for a little bit while the video plays. When the buffer gets some space, it shows spiky behavior as it uses the TCP connections to fill it again. This example is based on one device on one network. It’s important to understand that this behavior may vary with different devices and network conditions, and over time as Netflix does more research and changes its algorithm.
Netflix Will Top Out the Available Capacity
This is an experiment we did. We started a shaper at 3 Mbps and then started a Netflix show. After a few minutes, we increased the shaper rate to 5 Mbps. In this case, Netflix didn’t choose the higher encoding, i.e. it didn’t make the up-shift decision, so the video kept playing at a bit rate under 3 Mbps. We were trying to see what would happen when it was filling the buffer with video encoded well below the available capacity of the network.
You can see that it still uses all the available link capacity when it’s fetching chunks of video, but not consistently. The buffer fills at the beginning, the video plays for a little while, room opens up in the buffer, the traffic goes up to 5 Mbps, and then it drops again. The point is that, while it’s filling the buffer for tens of seconds at a time, it’s not leaving room for interactive traffic like VoIP or gaming to get through.
This confirms the known QoE insight that Netflix bursts crowd out interactive traffic. While a burst is in progress, packets from VoIP calls or gaming have to wait behind the Netflix traffic in the pipe.
Conclusion: Key Learnings for Network Operators
- Netflix uses multiple TCP connections and TLS. Therefore, it’s not possible to limit the number of devices or streaming sessions even with DPI-based platforms.
- Videos on Netflix are variable bitrate encoded (and dependent on genre of movie amongst other things). As a result, it’s not possible to limit resolution (like standard def) with network policy.
- Netflix downloads in short bursts at full link rate, which can negatively impact other traffic like gaming packets or VoIP. A strategy to fix QoE problems associated with Netflix behavior is to leverage modern queuing technologies such as FQ-CoDel.
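The last point can be made concrete with rough arithmetic. The sketch below compares how long a small interactive packet waits in a single shared FIFO queue with a simplified per-flow round-robin model. It is not an implementation of FQ-CoDel, which additionally prioritizes sparse flows and actively manages queue delay. The backlog size, packet size, and flow count are hypothetical.

```python
def fifo_wait_ms(backlog_bytes, link_mbps):
    """Time to drain everything queued ahead of a packet in one shared FIFO."""
    return backlog_bytes * 8 / (link_mbps * 1e6) * 1000


def flow_queue_wait_ms(other_flows, packet_bytes, link_mbps):
    """Simplified round-robin model: at most one packet from each other
    active flow is sent before the interactive packet gets its turn."""
    return other_flows * packet_bytes * 8 / (link_mbps * 1e6) * 1000


# Hypothetical case: 5 Mbps link, 100 full-size (1500-byte) video packets
# already queued, spread over three download flows.
print(round(fifo_wait_ms(100 * 1500, 5), 1))        # single shared queue
print(round(flow_queue_wait_ms(3, 1500, 5), 1))     # per-flow queues
```

Under these assumptions the shared queue adds a delay in the hundreds of milliseconds, while per-flow scheduling keeps it to a few milliseconds, without reducing the capacity available to the video download.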
In the end, it’s important to realize that the more capacity Netflix can use, the better resolution/picture quality it’ll offer to the subscriber. From an operator or business point of view, while you want to provide a network that offers a great Netflix experience, you do have network constraints in terms of available capacity and subscriber plans. However, artificially limiting the capacity for Netflix and other applications isn’t the right way to go from a QoE perspective.
Modern techniques like FQ-CoDel allow ISPs to offer a good experience to their subscribers within their enforced plan, while ensuring that interactive applications aren’t negatively impacted by high-bandwidth applications like Netflix or device updates.