Video encoding is essential to any live streaming workflow since it converts bulky raw video into streamable digital files. Understanding how encoding works will help you add the right tools to your tech stack to create a high-quality viewing experience for your target audience.
In this guide, we will discuss all things video encoding. We’ll start with the basics of encoding and how it works before reviewing more technical elements. We will also compare the most common codec options on the market.
What is video encoding?
Video encoding is the process of compressing and changing the format of raw video content to a digital file or format, making the video content compatible with different devices and platforms.
The main goal of encoding is to compress the content so it takes up less space. This works by eliminating information that isn’t needed for playback. When the content is played back, the viewer sees an approximation of the original.
Of course, when you discard information, the video that is played back will be of lower quality than the original. Video encoding is carried out according to standards called codecs, which we will discuss later in this post.
Encoding vs. transcoding
Although the terms are often used interchangeably, it’s essential to note that encoding is not the same as transcoding.
Transcoding refers to the process of converting video formats from one to another. It is used to help ensure that the media from your source is compatible with the destinations you’re streaming to.
As we mentioned, encoding has more to do with the size of the file, while transcoding concerns the type of the file. Both are essential steps in a streaming workflow.
Why is encoding important in live streaming?
Video encoding is crucial because it enables broadcasters to transmit video content online. In live streaming, compressing the raw video reduces the bandwidth required, making it easier to get the stream from the source to the video player while maintaining a high quality of experience for end viewers.
If video content could not be compressed, the available bandwidth on the Internet would be insufficient, preventing us from deploying widespread, distributed video playback services.
The fact that we can stream video on multiple devices at home, on the go via mobile, or while video chatting with loved ones across the globe, even on low bandwidth, is owed to video encoding.
It’s also worth noting that encoding is essential for multi-bitrate and adaptive bitrate streaming. This means the video stream is delivered to each end user at the size best suited to the strength of their network, ensuring clear, uninterrupted playback.
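To make the adaptive bitrate idea concrete, here is a minimal Python sketch of how a player might pick a rendition from a bitrate ladder. The ladder rungs and the measured-bandwidth figure are illustrative assumptions, not recommendations.

```python
# Illustrative bitrate ladder: (height, video bitrate in kbps).
# These rungs are example values, not a recommendation.
LADDER = [
    (1080, 5000),
    (720, 2800),
    (480, 1400),
    (360, 800),
    (240, 400),
]

def pick_rendition(measured_kbps: float, headroom: float = 0.8):
    """Return the highest rendition whose bitrate fits the measured
    bandwidth, keeping some headroom so playback does not stall."""
    budget = measured_kbps * headroom
    for height, kbps in LADDER:          # ladder is ordered high to low
        if kbps <= budget:
            return height, kbps
    return LADDER[-1]                    # fall back to the lowest rung

print(pick_rendition(3500))              # e.g. (720, 2800)
```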
How does video encoding work?
There are several layers to video encoding. It factors in motion, macroblocks, chroma subsampling, and quantization. Here’s how these moving parts come together to encode video content successfully.
Motion Compensation
In video encoding, motion is significant. It is expressed through three frame types: the I-frame (or keyframe), the P-frame, and the B-frame.
The keyframe stores the entire image. The P-frame that follows does not need to store the whole picture again: when little has moved or changed, it can simply reference the previous keyframe and describe which pixels have moved. B-frames can reference both earlier and later frames. Together, I-, P-, and B-frames form groups of pictures (GOPs), and frames within a group can only reference each other, not frames outside the group.
For frames with little movement, this can save roughly 90% of the storage that a full, losslessly compressed image (such as a PNG, which is very large) would require.
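As a rough back-of-the-envelope illustration, the Python sketch below estimates the size of a GOP under assumed relative frame costs; the 100%/10%/5% figures are illustrative assumptions, not measurements from any particular codec.

```python
# Rough GOP size estimate under assumed relative frame costs:
# an I-frame stores the full picture, P- and B-frames store only changes.
I_COST, P_COST, B_COST = 1.00, 0.10, 0.05   # illustrative fractions of a full frame

def gop_size(pattern: str, full_frame_bytes: int) -> float:
    """Estimate the bytes needed for a GOP pattern like 'IBBPBBP'."""
    cost = {"I": I_COST, "P": P_COST, "B": B_COST}
    return sum(cost[frame] for frame in pattern) * full_frame_bytes

full = 1920 * 1080 * 3                      # one uncompressed 1080p frame, ~6.2 MB
print(gop_size("IBBPBBPBBPBB", full))       # far less than 12 full frames
```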

Macroblock
Within each frame, there are macroblocks. Each block carries specific size, color, and movement information. These blocks are encoded separately, which allows encoders to process them in parallel.
Older codecs such as H.264 work with fixed 16×16 macroblocks, subdivided into smaller partitions down to 4×4 samples. Newer codecs allow larger block sizes and rectangular shapes. Large blocks are used where little detail is needed, and a single large block takes far less space than many small ones.

Macroblocks consist of multiple components: sub-blocks that carry the pixels’ color information and others that carry the motion vectors used for motion compensation relative to a previous frame.
Because of this macroblock structure, sharp edges or “blockiness” can become visible in the video in low-bandwidth situations. You can get around this by applying a filter that smooths out these edges. This “in-loop” filter is used during both encoding and decoding so that the video stays close to the source material.
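Here is a minimal NumPy sketch of splitting a frame into blocks, assuming fixed 16×16 macroblocks and frame dimensions that are multiples of 16; real encoders choose block sizes adaptively.

```python
import numpy as np

def split_into_macroblocks(frame: np.ndarray, size: int = 16) -> np.ndarray:
    """Split a (height, width) luma plane into size x size macroblocks.
    Assumes the frame dimensions are multiples of `size`."""
    h, w = frame.shape
    blocks = frame.reshape(h // size, size, w // size, size)
    return blocks.swapaxes(1, 2)             # shape: (rows, cols, size, size)

frame = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
blocks = split_into_macroblocks(frame)
print(blocks.shape)                           # (45, 80, 16, 16): blocks can be encoded in parallel
```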
Chroma Subsampling
In most cases, color is represented in RGB channels. However, the human eye is far more sensitive to changes in brightness than to changes in color, especially in moving images. Therefore, in video, it’s common to use a different color space called YCbCr.
This space is divided into:
- Y (luminance or luma)
- Cb (blue-difference chroma)
- Cr (red-difference chroma)
In this space, “luma” means brightness, and “chroma” means color.
In chroma subsampling, we split images into their Y channel and their Cb and Cr channels. For example, from an image we take a grid of two rows with four pixels each (4×2). In the subsampling, we define a ratio as j:a:b.
- j: number of pixels sampled horizontally (i.e., the width of the grid)
- a: number of chroma (CbCr) samples in the first row
- b: number of changed chroma (CbCr) samples in the second row compared to the first row

In streaming video, 4:4:4 is the full-resolution color format. Reducing it to 4:2:2 saves about a third of the space, and reducing it further to 4:2:0 saves half. In video streaming (such as Netflix and Hulu TV shows), 4:2:0 is by far the most widely used format, while 4:4:4 is the most common format in video editing.
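As a rough illustration, the following Python sketch performs 4:2:0 subsampling on assumed NumPy luma and chroma planes by averaging 2×2 pixel groups, and confirms the roughly 50% reduction in samples.

```python
import numpy as np

def subsample_420(y: np.ndarray, cb: np.ndarray, cr: np.ndarray):
    """Keep full-resolution luma, average chroma over 2x2 blocks (4:2:0).
    Assumes even frame dimensions."""
    def pool(c):
        h, w = c.shape
        return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, pool(cb), pool(cr)

h, w = 720, 1280
y, cb, cr = (np.random.rand(h, w) for _ in range(3))
y2, cb2, cr2 = subsample_420(y, cb, cr)

full = y.size + cb.size + cr.size            # 4:4:4 sample count
sub = y2.size + cb2.size + cr2.size          # 4:2:0 sample count
print(1 - sub / full)                        # 0.5 -> half the samples
```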
Quantization
Audio encoding is another essential part of live streaming. Audio starts as a continuous analog signal, but you must digitize it for streaming. Once digitized, it can be decomposed into multiple sinusoids (sine waves), each representing an audio frequency. To save space, you can discard frequencies the listener does not need.
An image can be treated in a similar way: its rows of pixels, viewed one after another, form one long signal. As with audio, frequencies can then be removed once the image is transformed into the frequency domain.
Removing frequencies does lead to a loss of detail, but a fair amount of frequencies can be removed without the end viewer noticing. This process is known as quantization.
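Here is a minimal sketch of quantization on a single 8×8 block using SciPy’s DCT; the single quantization step size is an illustrative assumption, since real codecs use per-frequency quantization matrices.

```python
import numpy as np
from scipy.fft import dctn, idctn

def quantize_block(block: np.ndarray, step: float = 24.0) -> np.ndarray:
    """Transform an 8x8 pixel block to the frequency domain, round the
    coefficients to a coarse grid (losing fine detail), and transform back."""
    coeffs = dctn(block, norm="ortho")
    coeffs = np.round(coeffs / step) * step   # most small coefficients become zero
    return idctn(coeffs, norm="ortho")

block = np.random.randint(0, 256, (8, 8)).astype(float)
approx = quantize_block(block)
print(np.abs(block - approx).mean())          # small average error, big coefficient savings
```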
What are Codecs?
Codecs are standards for video compression. Each codec has two components: an encoder that compresses the content and a decoder that decompresses it and plays back an approximation of the original.
There are two types of codecs: lossy and lossless. Lossy codecs discard data during compression, so some quality is lost once the video is decoded. Lossless codecs retain all of the original data, so quality doesn’t take a hit when decoded.
You may assume that lossless codecs would be the standard since they don’t compromise the quality of your streams. However, many lossless codecs pose compatibility issues, and because they retain all of the data, the bulkier files drive up encoding and delivery costs.
What are hybrid codecs?
As we discuss codecs, it’s vital to mention hybrid codecs since they will likely play a role in the future of encoding. A hybrid codec is a codec that works on top of another codec. This can result in a 20-40% bitrate reduction.
The process usually follows these steps (sketched in code after the list):
- Take an input video
- Use a proprietary downscaler on the video (e.g., use native technology to go from 1080p to 480p)
- Encode downscaled video
- Decode the video on the player
- Upscale the decoded image using a proprietary upscaler
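The sketch below mimics that pipeline with plain NumPy, using simple averaging and pixel repetition as stand-ins for the proprietary scalers and omitting the base-codec encode/decode step; it is purely illustrative.

```python
import numpy as np

def downscale(frame: np.ndarray, factor: int = 2) -> np.ndarray:
    """Stand-in for the proprietary downscaler: average factor x factor blocks."""
    h, w = frame.shape
    return frame.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upscale(frame: np.ndarray, factor: int = 2) -> np.ndarray:
    """Stand-in for the proprietary upscaler: repeat each pixel."""
    return frame.repeat(factor, axis=0).repeat(factor, axis=1)

frame = np.random.rand(1080, 1920)            # input video frame (luma only)
small = downscale(frame)                      # 540 x 960: what the base codec would encode
# ... base-layer encode/decode with an existing codec would happen here ...
restored = upscale(small)                     # player-side reconstruction
print(frame.shape, small.shape, restored.shape)
```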
We will take a look at some specific examples of hybrid codecs later in this guide.
9 Popular video codecs
There are quite a few different video codecs you can use to encode your streams. Which you use will depend on your goals, audience, and budget.
With that in mind, let’s review some of the most popular video codecs available.
AV1
In 2015, Google, Mozilla (Xiph), and Cisco worked independently on creating video coding formats for their respective projects. However, they quickly realized the benefits of collaborating on one codec instead of making three separate codecs and dealing with the limitations of royalties. They joined forces to form the Alliance for Open Media (AOMedia), and the AV1 codec was born from this partnership.
Together, their goal was to achieve roughly 30% better compression efficiency than VP9 while remaining royalty-free. All AOMedia members contributed their related patents to a patent defense program.
AV1 is currently one of the most popular codecs. It is open-source and highly efficient, making it an ideal option for many broadcasters. This codec’s only notable limitation is its compatibility issues with older versions of Apple operating systems.
H.264/AVC
H.264, also called AVC (Advanced Video Coding) or MPEG-4 AVC, was released in 2003. The codec was developed by MPEG and ITU-T VCEG under a partnership known as the Joint Video Team (JVT).
The biggest strength of encoding with H.264 is that it is ultra-compatible with most devices, operating systems, and browsers. This codec is a baseline for newer codecs and is relatively easy regarding royalty fees.
Although it’s still widely used, H.264 doesn’t have the hold on the streaming industry that it once did.
H.265/HEVC
H.265, also known as HEVC, which stands for High Efficiency Video Coding, is a standard by MPEG and ITU-T VCEG (under a partnership known as JCT-VC). This codec was first standardized in 2013 and was continuously expanded for several years following its inception.
This codec was designed to compress content about 50% more efficiently than H.264 while keeping the same quality. According to a study by Netflix, this goal was met: the study showed bitrate savings of 35-53% compared to H.264 and 17-21% compared to VP9. Keep in mind that the encoder, the content, and other factors matter a lot for these kinds of comparisons, but the improvements are significant.
These improvements were achieved by refining the techniques already used in H.264. In short, H.265 compresses content into smaller files than H.264 can, which in turn reduces the bandwidth needed to play the video.
Although this is all excellent news, H.265 is rarely used because of uncertainty around licensing and royalties.
VP9
VP9, the successor of VP8, was developed by On2 Technologies, which Google acquired. VP9 was standardized in 2013. This codec is comparable to HEVC in efficiency, but no royalties are required.
Thanks to a 2020 iOS update, this codec is now widely compatible with all types of devices and browsers. It was previously incompatible with Apple operating systems, but iOS 14 eliminated that issue.
Although it is powerful, accessible, and compatible, VP9 has yet to be widely adopted.
H.266/VVC
H.266, also known as Versatile Video Coding (VVC), is a newer MPEG and ITU-T standard developed by the Joint Video Experts Team (JVET). It produces a 30-40% improvement in efficiency over HEVC.
Although VVC is powerful, it is not widely compatible. Many streaming platforms have yet to offer the support required to use this codec.
Once the rest of the streaming industry catches up, VVC should become a key codec for smart TV streaming, gaming, 360º video, and resolution switching in video.
EVC
Essential Video Coding (EVC) is another newer MPEG standard, built to achieve efficiency similar to that of HEVC.
The standard is explicitly designed for VOD and live OTT streaming and aims to be a “licensing-friendly” codec. It is also suitable for real-time encoding and decoding.
EVC is still relatively new and has yet to be widely adopted.
LCEVC
As we mentioned, hybrid codecs are an important up-and-coming technology. LCEVC (previously known as PERSEUS Plus or P+) was created by V-Nova and is a hybrid codec actively used today.
LCEVC offers bitrate savings on the scale of HEVC without requiring the whole encoding pipeline to be rebuilt. It also has hardware decoding on iOS and Android.
Previously, using P+ came with significant challenges. However, with the rebrand to LCEVC, many of these challenges, including some compatibility issues, have been resolved.
ENHANCEplayer
ENHANCEplayer is an ongoing joint innovation project between Artomatix and THEO Technologies. The hybrid codec uses the Artomatix Enhance AI to upscale the image resolution, remove compression artifacts, and remove noise.
This is essentially “super-resolution” generation with a neural network (NN). The NN is trained to enhance the image and restore details lost during compression. The goal is to process each frame in under 40 ms, which corresponds to a frame rate of 25 fps.
The biggest challenge is fidelity: keeping the enhanced image true to the image before compression. Doing this while DRM is in play remains difficult.
Audio codecs
For audio, AAC is seen as the de facto industry standard: it is supported everywhere and has the largest market share. Other audio codecs include Opus, FLAC, and Dolby Audio. Opus excels at voice and is used by YouTube, seemingly the only large service to adopt it, and even YouTube still falls back to AAC.
Dolby Audio, also known as AC3, is sometimes still used for surround sound, as some older surround sound systems don’t play AAC.
The future of video encoding
The current state of codecs seems relatively simple: AAC for audio and H.264 for video remain necessary for the broadest reach, with H.264 also serving as the fallback for older devices. AV1 has potential but still has a way to go before it can be used in regular production.
So, in the cases where just one codec won’t suffice, what is the solution? The multi-codec approach is a must in these situations. For example, Twitch, today’s most popular video live streaming service, currently uses H.264. However, Twitch is also testing support for HEVC and AV1.
Another example is Netflix, which announced that it has started streaming AV1 on its Android app, boasting a 20% increased compression efficiency over VP9. However, it’s only enabled when CPU power is significantly cheaper than bandwidth, such as when a viewer is streaming over 4G. Netflix is aiming to use AV1 for all platforms in the future.
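As a minimal sketch of that multi-codec idea, the following Python snippet picks the most efficient codec a client reports it can decode; the capability flags and preference order here are hypothetical simplifications.

```python
# Hypothetical capability flags reported by a client; real players would use
# MediaCapabilities / codec strings rather than simple booleans.
PREFERENCE = ["av1", "hevc", "h264"]          # most to least efficient

def choose_codec(supported: set[str]) -> str:
    """Pick the most efficient codec the client can decode,
    falling back to H.264 for the broadest reach."""
    for codec in PREFERENCE:
        if codec in supported:
            return codec
    return "h264"

print(choose_codec({"h264", "hevc"}))          # hevc
print(choose_codec({"h264"}))                  # h264
```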
Final thoughts
Encoding your content is a crucial part of the livestream production process. Choosing an encoder that can support you in reaching your live streaming goals and works with your budget is critical.
Dolby supports streaming with a variety of encoders and different codecs; HEVC and AV1 are the most popular codec options within our solution. Contact us today if you’re ready to optimize your streaming setup to enhance your broadcasts.