How to Optimize LL-HLS for Low Latency Streaming

Choosing the correct codec is the first step to creating a low latency stream. However, how you configure your encoder settings will determine how low your latency will actually be.

In this guide, we will talk about how to tune LL-HLS for low latency streaming. We will focus on GOP size, part size, segment size, buffer size, network tolerance, and adaptive bitrate (ABR).

What is LL-HLS?

Low-latency HTTP Live Streaming (LL-HLS) builds on the successful HLS method for streaming video. If you’re unfamiliar with HLS, it is a streaming protocol that Apple developed for delivering video to Apple devices.

HLS, much like its DASH counterpart, uses segments (typically a few seconds to 10 seconds long) as the basic unit for fetching video content. LL-HLS additionally allows fractions of a segment, called parts, to be individually addressed and fetched.

This has direct implications for latency and zapping time (the delay before playback starts when a viewer joins or switches streams). In LL-HLS, latency is determined not by the segment size but by the part size, because the player can fetch each part as soon as it is available instead of waiting for a complete segment.

LL-HLS is suitable for low-latency applications that require end-to-end latencies of a few seconds, with playback that closely follows the live event. Smaller parts also let playback start faster while keeping latency low, because the player can begin before the live segment is fully available. Under the right conditions, playback can start at a part boundary, not only at a segment boundary.

4 Key factors in low-latency streaming

Now that you’re familiar with LL-HLS, let’s shift gears to the most critical encoding and packaging parameters and their impact on latency, video quality, bandwidth consumption, and resilience to network variations.

1. GOP size

The group of pictures (GOP) size is one of the primary encoding parameters. It directly impacts video bitrate and quality and indirectly affects end-to-end latency. It also determines how often a keyframe (or IDR frame) will be available.

With LL-HLS, the player requires a keyframe to start decoding, meaning it can start playback only at GOP boundaries. Longer GOPs cause a higher start-up delay and latency.

Apple’s recommended GOP size is two seconds. Typical LL-HLS implementations can achieve three-second end-to-end latency when the GOP is set to one second.

Small GOP sizes tend to be the go-to approach. However, frequent keyframes reduce compression efficiency: you use more bandwidth, and streaming quality drops at the same bitrate. This effect becomes significant when GOP sizes fall below two seconds.

If you want lower bandwidth consumption and reasonable start-up time, set your keyframe interval to two to three seconds. On the other hand, if your priority is to have minimal start-up delays and low latency, the GOP size should be smaller and set so that all parts start with a keyframe.

Impact of GOP size on video bitrate

Small GOP sizes result in higher bandwidth consumption. The smaller the GOP size, the more frequent the keyframes will be. Depending on the video, keyframes can be 10 times larger than P frames, and small keyframe intervals will increase the video bitrate and, hence, the bandwidth consumption.
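To make the keyframe overhead concrete, here is a rough Python sketch. It assumes a stream made only of keyframes and P frames (no B frames), a constant P-frame size, and keyframes 10 times larger than P frames, matching the upper figure mentioned above. The 30 fps frame rate and the function name `relative_bitrate` are illustrative assumptions, not measurements from a real encoder.

```python
def relative_bitrate(gop_s: float, fps: float = 30.0, keyframe_ratio: float = 10.0) -> float:
    """Bitrate relative to a hypothetical stream made only of P frames.

    Simplified model (an assumption, not a real encoder): each GOP holds one
    keyframe plus P frames, all P frames have the same size, and a keyframe
    costs `keyframe_ratio` times a P frame.
    """
    frames_per_gop = fps * gop_s
    # One keyframe plus (frames_per_gop - 1) P frames, in "P-frame units".
    units_per_gop = keyframe_ratio + (frames_per_gop - 1)
    # Divide by the all-P cost of the same GOP to get the relative bitrate.
    return units_per_gop / frames_per_gop


for gop in (1.0, 2.0, 6.0):
    print(f"GOP {gop:.0f} s: {relative_bitrate(gop):.2f}x bitrate")
```

Under these assumptions the loop prints 1.30x, 1.15x, and 1.05x: the keyframe overhead shrinks as the GOP grows, which is the trend described above. Real encoders use rate control and B frames, so treat the numbers as illustrative only.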

Impact of GOP size on video quality

The GOP size also affects video quality. At a fixed bitrate, the larger the GOP size, the higher the video quality, because fewer bits are spent on keyframes and more detail can go into the P frames.

We use Video Multimethod Assessment Fusion (VMAF) to measure video quality. VMAF is a video quality metric developed by Netflix that combines four elementary metrics:

Visual Information Fidelity (VIF): considers fidelity loss at four different spatial scales
Detail Loss Metric (DLM): measures detail loss and impairments that distract viewer attention
Mean Co-Located Pixel Difference (MCPD): measures the temporal difference between frames on the luminance component
Anti-noise signal-to-noise ratio (AN-SNR)

Using these criteria, you can objectively assess how the GOP size affects video quality.

Impact of GOP size on zapping time and latency

In LL-HLS live streaming, if we increase the GOP size to reduce bandwidth consumption and improve video quality, we must sacrifice short zapping time, low latency, or both.

The player requires a keyframe to start decoding, so a large GOP affects the stream’s zapping time and latency. The player can wait for the next GOP, which means a long start-up time but low latency. Alternatively, it can start playback from the beginning of the current GOP, which means a short start-up time but additional latency of up to one GOP duration.

For example, a large GOP with only one keyframe every six seconds lets the player start playback at only one position every six seconds. This doesn’t mean your zapping time will be six seconds, but the player might have to start at a higher latency. In the six-second example, starting playback immediately adds an average of three seconds of latency at start-up, and up to six seconds in the worst case.
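The averages above follow from a simple model, sketched here in Python. It assumes keyframes arrive exactly every `gop_s` seconds and that viewers join at a uniformly random moment within a GOP; it ignores network and decode delays. The function name `startup_tradeoff` is hypothetical.

```python
def startup_tradeoff(gop_s: float, samples: int = 1000) -> dict[str, float]:
    """Average and worst-case cost of joining a live stream at a random moment.

    Assumes keyframes arrive exactly every gop_s seconds and the join time is
    uniformly distributed within a GOP (sampled at slice midpoints).
    """
    # Time elapsed since the last keyframe at the moment the viewer joins.
    offsets = [(i + 0.5) * gop_s / samples for i in range(samples)]
    # Option A: start now from the last keyframe, so latency grows by the offset.
    extra_latency = offsets
    # Option B: wait for the next keyframe, so start-up is delayed instead.
    extra_wait = [gop_s - o for o in offsets]
    return {
        "mean_extra_latency_s": sum(extra_latency) / samples,
        "worst_extra_latency_s": max(extra_latency),
        "mean_extra_wait_s": sum(extra_wait) / samples,
        "worst_extra_wait_s": max(extra_wait),
    }


print(startup_tradeoff(6.0))
```

For a six-second GOP, both means come out at about three seconds and both worst cases approach six seconds. In this model the cost is the same on average; the choice is only whether the viewer pays it in start-up time or in latency.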

2. Part size

The part size directly influences the end-to-end latency in LL-HLS. Generally, the smaller the part size, the lower the latency. However, it is not that cut and dried.

Apple says parts can be as short as 200 ms, but remember that in LL-HLS, the player must start playback with a keyframe. The player is not limited to segment boundaries: it can begin at any independent part, that is, any part that starts with a keyframe.

If a part does not start with a keyframe (which happens for some parts whenever the part size is smaller than the GOP size), the player must either seek back to a part that starts with a keyframe or wait for the next keyframe to start playback.

For example, consider a setup where the GOP size is two seconds, the part size is 500 ms, and the playback request arrives in the middle of a six-second segment. The player needs a keyframe to start playback, so it has two options. It can wait for the next keyframe, which arrives in the third upcoming part and means at least 1.5 seconds of zapping time. Or it can seek back two parts to the previous keyframe, which adds one second to the end-to-end latency.
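The arithmetic in this example can be reproduced with a short Python sketch. It assumes keyframes occur at every multiple of the GOP size starting at t = 0, that the GOP is a whole number of parts, and that the live edge is the end time of the newest complete part. The function name `join_options` is hypothetical.

```python
import math


def join_options(gop_s: float, part_s: float, live_edge_s: float) -> tuple[float, float]:
    """Return (wait_s, seek_back_s) for a player joining at live_edge_s.

    Illustrative assumptions: keyframes at every multiple of gop_s from t=0,
    gop_s is a whole number of parts, and live_edge_s is the end time of the
    newest complete part.
    """
    # Option 1: wait until the part that starts at the next keyframe is complete.
    next_keyframe = math.ceil(live_edge_s / gop_s) * gop_s
    wait_s = next_keyframe + part_s - live_edge_s
    # Option 2: seek back to the newest complete part that starts with a keyframe.
    prev_keyframe = math.floor((live_edge_s - part_s) / gop_s) * gop_s
    seek_back_s = live_edge_s - prev_keyframe
    return wait_s, seek_back_s


# Two-second GOP, 500 ms parts, joining three seconds into a six-second segment.
print(join_options(2.0, 0.5, 3.0))
```

This returns `(1.5, 1.0)`: 1.5 seconds of waiting, or one extra second of latency. Setting the GOP equal to the part size (for example, `join_options(0.5, 0.5, 3.0)`) shrinks both options to a single part duration.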

Ideally, the part size and the GOP size should be equal to minimize zapping time. In that case, all parts are marked “independent,” and the player can start playback at any part boundary. A smaller part size leads to a smaller minimum buffer size and lower latency. However, a part size that is too small causes overhead, because many more HTTP requests must be handled.
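To give a sense of scale for the request overhead, the Python sketch below counts requests per minute for a single viewer on a single rendition. It assumes the player issues one blocking playlist reload and one media request for every new part; real players may also use preload hints or fetch other renditions, so the exact count will differ. The function name is hypothetical.

```python
def http_requests_per_minute(part_s: float, requests_per_part: int = 2) -> float:
    """Rough HTTP request count per minute for one viewer and one rendition.

    Assumption: for every new part the player makes one blocking playlist
    reload and one media request for the part itself.
    """
    parts_per_minute = 60.0 / part_s
    return parts_per_minute * requests_per_part


for part in (0.2, 0.4, 1.0):
    print(f"{part * 1000:.0f} ms parts: ~{http_requests_per_minute(part):.0f} requests/min")
```

Under this assumption, 200 ms parts mean about 600 requests per minute per viewer, versus about 120 for one-second parts. Multiplied across an audience, that difference lands on your origin and CDN.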

If you can guarantee near-perfect network conditions and your primary focus is the lowest end-to-end latency, we recommend a 400 ms part size. If network conditions vary and you need smooth playback through network ups and downs as well as extra-low zapping time, we recommend setting both your keyframe interval and your part size to one second, as this strikes a balance between latency and viewer experience at start-up.

3. Segment size

The segment size in LL-HLS does not directly affect latency as it does in traditional HLS. In general, longer segments are desirable because they allow a larger GOP size, which means higher video quality and lower bandwidth consumption.

On the other hand, in LL-HLS, the segment size determines how many parts you need to list in your playlist. As a result, it affects the size of the playlist (and how much data must be loaded in parallel with the media data).

Long segments can significantly increase the size of the playlist, causing network overhead and impacting streaming quality. Segments can’t be too small, either, since that imposes a smaller GOP size, which results in lower video quality and higher bandwidth consumption.
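The link between segment size and playlist size can be estimated with a small Python sketch. It assumes the target duration equals the segment size and that the server keeps part tags only for roughly the last three target durations, which is our reading of the HLS specification’s recommendation; check your packager’s actual behavior. The function name is hypothetical.

```python
import math


def part_tags_in_playlist(segment_s: float, part_s: float,
                          window_target_durations: float = 3.0) -> int:
    """Approximate number of EXT-X-PART lines in a live LL-HLS playlist.

    Assumptions: target duration == segment_s, and part tags are kept only
    for about the last `window_target_durations` target durations.
    """
    # round() first so floating-point noise (e.g. 44.9999999) cannot skew ceil().
    return math.ceil(round(window_target_durations * segment_s / part_s, 6))


for segment, part in ((6.0, 1.0), (6.0, 0.4), (12.0, 0.4)):
    print(f"{segment:.0f} s segments, {part * 1000:.0f} ms parts: "
          f"~{part_tags_in_playlist(segment, part)} part tags")
```

Under these assumptions, six-second segments with one-second parts list about 18 part tags, while 400 ms parts raise that to about 45, and doubling the segment to 12 seconds doubles it again to about 90. Every blocking playlist reload carries that list.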

Segment size should be equal to or larger than your GOP size. It should not be too small, because of the resulting poor video quality, nor too large, because of the playlist overhead described above. Apple’s recommended segment size for LL-HLS is six seconds, which balances video quality and network overhead.

4. Buffer size, network tolerance, and ABR

There is always a trade-off between guaranteeing smooth playback in all network conditions and achieving the lowest possible latency. To cope with network and other variations, LL-HLS maintains a buffer that absorbs jitter and unforeseen hiccups in video transmission.

Typically, the larger the buffer, the higher the tolerance for network issues, but the higher the latency. In LL-HLS, the buffer defaults to three part durations.

For example, with 400 ms parts, your buffer will target 1.2 seconds. Based on our tests, with correct part and GOP settings and a slightly larger part size (around one second, for example), the buffer size can be reduced somewhat without affecting user experience. However, as a baseline, the buffer should never be smaller than two parts.
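On the packaging side, this buffer target is typically advertised in the media playlist. The Python sketch below assumes the buffer discussed here corresponds to the `PART-HOLD-BACK` attribute of `EXT-X-SERVER-CONTROL`, with the part duration in `EXT-X-PART-INF:PART-TARGET`. Only these two header lines are produced, not a complete playlist, and the function name is hypothetical.

```python
def ll_hls_server_control(part_s: float, buffer_parts: float = 3.0) -> str:
    """Build the two LL-HLS header lines that describe parts and hold-back.

    Assumption: the player buffer discussed in this guide maps to
    PART-HOLD-BACK, expressed as a multiple of the part target duration.
    """
    # Enforce the two-part baseline recommended above.
    if buffer_parts < 2:
        raise ValueError("buffer should hold at least two parts")
    hold_back_s = part_s * buffer_parts
    return (
        f"#EXT-X-PART-INF:PART-TARGET={part_s:.3f}\n"
        f"#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK={hold_back_s:.3f}"
    )


print(ll_hls_server_control(0.4))
```

With 400 ms parts and the default three-part buffer, this produces `PART-TARGET=0.400` and `PART-HOLD-BACK=1.200`, matching the 1.2-second target above. Asking for fewer than two parts raises an error instead of silently emitting a fragile configuration.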

It’s important to remember that the network condition is not always perfect. In addition to jitter, we also encounter drops and variations in the network capacity. ABR is needed to cope with this varying network bandwidth.

For ABR to work effectively, the buffer must be long enough to complete a quality switch before any glitch or rebuffering occurs in playback.

Navigating a real-life example

Let’s consider a worst-case scenario. Suppose the buffer size is two seconds, the segment is six seconds, the GOP size is three seconds, and the network bandwidth drops to half the video bitrate near the end of the segment.

In this scenario, the player must download a new lower-quality part that starts with a keyframe. Near the end of the segment, the three-second GOP spans the last three parts (assuming one-second parts), so neither the current nor the previous part contains a keyframe, and the player must download the third most recent part to switch the quality down.

You need to download three seconds of data with only two seconds of buffer. Even if you reduce the GOP size to two seconds, you may still experience stalls during the ABR switch.
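The stall in this scenario can be estimated with a simplified Python model. It assumes the player must re-download `refetch_s` seconds of the lower rendition, starting at the last keyframe, while only `buffer_s` seconds of already-buffered media keep playback going. It ignores request round trips, which would only make things worse. The bitrates in the usage note are hypothetical, and so is the function name.

```python
def switch_down_stall(buffer_s: float, refetch_s: float,
                      bandwidth_kbps: float, low_bitrate_kbps: float) -> float:
    """Seconds of stall while refetching media at a lower rendition.

    Simplified model: download `refetch_s` seconds of the lower rendition
    (from the last keyframe onward) while playing out `buffer_s` seconds of
    already-buffered media; network round trips are ignored.
    """
    # Time needed to download the refetched media at the available bandwidth.
    download_s = refetch_s * low_bitrate_kbps / bandwidth_kbps
    # Playback stalls for however long the download outlasts the buffer.
    return max(0.0, download_s - buffer_s)


# Hypothetical: a 4000 kbps stream, bandwidth drops to 2000 kbps,
# and the next rendition down is also 2000 kbps.
print(switch_down_stall(2.0, 3.0, 2000, 2000))  # three-second GOP
print(switch_down_stall(2.0, 2.0, 2000, 2000))  # two-second GOP
```

With a three-second GOP the model predicts a one-second stall. With a two-second GOP the stall disappears on paper, but the download exactly consumes the buffer, leaving no margin for jitter, which is why stalls remain possible.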

To ensure a smooth quality switch, you therefore need to increase the buffer size, and a larger buffer means higher latency.

You may think of reducing GOP size to smaller values to have a proper ABR switch down without stalling. However, as discussed earlier, smaller GOP size comes with lower video quality and higher bandwidth consumption, bringing an extra challenge to the ABR.

The ideal setup for streaming with LL-HLS

For low-latency, fast-start streaming with LL-HLS, it is crucial to understand how each parameter affects the final result.

For the lowest zapping time and latency, the ideal setup keeps the part size and GOP size equal and as small as possible. A GOP size below one second does not make sense because of the poor video quality and high bandwidth needs. Therefore, a good value is one second, which achieves the lowest zapping time and latency, with smooth ABR switches using a two-second buffer (two parts).

However, the one-second GOP size could be demanding in terms of bandwidth consumption. Our recommendation is a GOP size of two seconds, with a part size of one second and a buffer size of three seconds, which is a good combination for reasonable video quality, bandwidth consumption, latency, and zapping time.
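As a closing summary, the Python sketch below encodes the rules of thumb from this guide as checks and runs them against both recommended setups (with the six-second segments recommended earlier). The alignment rules, that the GOP is a whole number of parts and the segment a whole number of GOPs so that keyframes start parts and segments, are our assumption about how these settings fit together rather than a stated requirement. `LlHlsPreset` and `check_preset` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlHlsPreset:
    """Hypothetical container for the settings discussed in this guide (seconds)."""
    gop_s: float
    part_s: float
    segment_s: float
    buffer_s: float


def _is_whole_multiple(value: float, unit: float, tol: float = 1e-9) -> bool:
    """True if value is (within tolerance) an integer multiple of unit."""
    ratio = value / unit
    return abs(ratio - round(ratio)) < tol


def check_preset(p: LlHlsPreset) -> list[str]:
    """Return the rules of thumb a preset violates (empty list if none)."""
    problems = []
    if p.gop_s < 1.0:
        problems.append("GOP below one second costs too much quality and bandwidth")
    # Assumption: keyframes should land on part boundaries.
    if p.part_s > p.gop_s or not _is_whole_multiple(p.gop_s, p.part_s):
        problems.append("GOP should be a whole number of parts")
    # Segment must be at least one GOP; whole multiples are our assumption.
    if p.segment_s < p.gop_s or not _is_whole_multiple(p.segment_s, p.gop_s):
        problems.append("segment should be a whole number of GOPs")
    if p.buffer_s < 2 * p.part_s:
        problems.append("buffer should hold at least two parts")
    return problems


lowest_latency = LlHlsPreset(gop_s=1.0, part_s=1.0, segment_s=6.0, buffer_s=2.0)
recommended = LlHlsPreset(gop_s=2.0, part_s=1.0, segment_s=6.0, buffer_s=3.0)
print(check_preset(lowest_latency), check_preset(recommended))
```

Both setups pass every check and print two empty lists, while a preset such as a 500 ms GOP would be flagged for falling below the one-second floor.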

Final thoughts

Although LL-HLS provides relatively low latency streaming by nature, optimizing your encoder settings to achieve your desired latency is crucial. By applying the ideal encoder settings to your streaming setup, you’ll be on track to reduce your latency and provide a high-quality experience for your viewers.

Are you looking to achieve lower latency in your streams but unsure where to start? Dolby is here to help. Contact us today to learn how our technology can support you in reaching your streaming goals.
