As we rolled out VP9 encoding, we saw a number of improvements in video quality and bitrate compared to H264. We also ran into several challenges, which we discuss below.
We use libvpx to encode our videos. Over time, we found that some content could be encoded more leanly at similar perceptual quality by adding entropy-preserving filters. With these changes, we achieved remarkable bitrate savings on some of our most popular shows.
However, some of our VP9-encoded content did not perform well on high-motion and dark scenes, so we paused VP9 encoding for the content types likely to exhibit these issues. We then found that for some titles, the manifest bandwidth of the 240p layer was higher than that of the 360p layer. Because of these issues, we paused VP9 encoding and dug deeper to analyse and investigate them. Eventually, we came up with solutions that improved our VP9 encoding output.
In this part, we would like to discuss two topics that are rarely covered on tech forums: the 2pass rate-control method and multi-threaded encoding speed.
Rate Control method
Similar to x264, libvpx's VP9 encoder offers 1pass ABR, Constant Quality, 2pass ABR, and Constrained Quality rate-control methods.
CRF with bitrate caps is frequently used in x264 encoding. The VP9 counterpart is CRF mode with a target bitrate, which libvpx calls Constrained Quality: the encoder tries to reach a constant (perceptual) quality while keeping the average bitrate below the bitrate limit.
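As a rough illustration of the two quality-based modes, here is a minimal Python sketch that builds the relevant ffmpeg arguments for the libvpx-vp9 encoder. It assumes an ffmpeg build with libvpx; `vp9_rate_control_args` is a hypothetical helper, and the CRF and bitrate values are placeholders, not our production settings.

```python
from typing import List, Optional


def vp9_rate_control_args(crf: int, target_kbps: Optional[int]) -> List[str]:
    """Build ffmpeg arguments for libvpx-vp9 quality-based rate control.

    Hypothetical helper. With a bitrate, libvpx runs in Constrained Quality
    mode: it aims for the quality implied by `crf` while keeping the average
    bitrate below `target_kbps`. With None, `-b:v 0` selects Constant Quality.
    """
    if not 0 <= crf <= 63:
        raise ValueError("libvpx-vp9 CRF must be between 0 and 63")
    bitrate = "0" if target_kbps is None else f"{target_kbps}k"
    return ["-c:v", "libvpx-vp9", "-crf", str(crf), "-b:v", bitrate]


# Placeholder values: CRF 32 with the average bitrate capped at 1500 kbps.
print(vp9_rate_control_args(32, 1500))
# Constant Quality: same CRF, no bitrate cap at all.
print(vp9_rate_control_args(32, None))
```

Note that the cap constrains the average bitrate of the whole encode, not the peak bitrate of any single segment.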
This differs from CRF rate control in x264, where the VBV buffer size and VBV maxrate can be used to control the bandwidth (maxrate) value of each layer in the DASH manifest. In VP9 CRF mode, there is no direct way to control this value. Yet the maxrate value is very important for adaptive bitrate selection: the higher a layer's advertised maxrate, the fewer clients will have enough throughput to pick it.
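To see why an inflated maxrate hurts, consider a simplified player that selects the highest representation whose advertised bandwidth fits within a fraction of its measured throughput. This is a minimal sketch: the 0.8 safety factor, the helper names, and the client throughputs are all assumptions, and real players such as Shaka Player or ExoPlayer use more elaborate heuristics.

```python
from typing import Optional, Sequence

SAFETY = 0.8  # assumed fraction of measured throughput a player is willing to use


def pick_representation(bandwidths_bps: Sequence[int], throughput_bps: float) -> Optional[int]:
    """Index of the highest-bandwidth representation that fits the budget, or None."""
    budget = SAFETY * throughput_bps
    fitting = [i for i, bw in enumerate(bandwidths_bps) if bw <= budget]
    return max(fitting, key=lambda i: bandwidths_bps[i]) if fitting else None


def eligible_share(bandwidth_bps: int, client_throughputs_bps: Sequence[float]) -> float:
    """Fraction of clients whose throughput budget covers this representation."""
    ok = sum(1 for t in client_throughputs_bps if bandwidth_bps <= SAFETY * t)
    return ok / len(client_throughputs_bps)


# Hypothetical client throughputs in bits per second.
CLIENTS = [500_000, 800_000, 1_200_000, 2_000_000]
print(eligible_share(600_000, CLIENTS))  # 0.75
print(eligible_share(900_000, CLIENTS))  # 0.5
```

The same stream advertised at 900 kbps instead of 600 kbps is selectable by half of these clients instead of three quarters, even if its average bitrate is unchanged.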
Another rarely noticed point is that Constrained Quality mode can also be used with 2pass encoding. Because 1pass CRF is so widely used with x264, we did not try 2pass CRF in VP9 at first. However, in VP9, 2pass CRF performs much better than 1pass CRF and can improve the quality of some complex scenes. We discuss the details later.
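A minimal sketch of what 2pass Constrained Quality looks like with ffmpeg's libvpx-vp9 encoder: the same CRF and bitrate cap are passed to both passes, with `-pass 1`/`-pass 2` and a shared `-passlogfile` prefix. The helper name, file names, and values are hypothetical, and dropping audio (`-an`) assumes audio is encoded separately.

```python
import os
import shlex
from typing import List


def two_pass_cq_commands(src: str, dst: str, crf: int, target_kbps: int,
                         passlog: str = "vp9_2pass") -> List[List[str]]:
    """Return the two ffmpeg invocations for 2pass Constrained Quality VP9.

    Pass 1 only gathers statistics (written under the `passlog` prefix) and
    discards its output; pass 2 reads those statistics and writes `dst`.
    Audio is dropped (-an) on the assumption that it is encoded separately.
    """
    rate_control = ["-c:v", "libvpx-vp9", "-crf", str(crf),
                    "-b:v", f"{target_kbps}k", "-passlogfile", passlog]
    first = ["ffmpeg", "-y", "-i", src, *rate_control,
             "-pass", "1", "-an", "-f", "null", os.devnull]
    second = ["ffmpeg", "-y", "-i", src, *rate_control,
              "-pass", "2", "-an", dst]
    return [first, second]


for cmd in two_pass_cq_commands("input.mp4", "video_360p.webm", 33, 700):
    print(shlex.join(cmd))
    # To actually encode: subprocess.run(cmd, check=True)
```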
Multi-thread encoding speed
For VOD encoding, we tend to use a slow speed setting to get better quality at a smaller size. In x264/x265, we can use 10 or more threads to speed up 1080p encoding. libvpx, however, cannot utilize that many threads, so its 1080p encoding is much slower than x264 with the slow preset.
After some investigation, we learned that the maximum number of threads libvpx can utilize is tied to the number of tiles, and the maximum number of tiles is determined by the resolution (specifically, the frame width). This table shows the maximum number of tiles for each resolution.
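The limit comes from the VP9 bitstream itself: the number of tile columns is a power of two, and each tile column must span at least 4 superblocks of 64x64 pixels. The sketch below follows the tile-column calculation in the VP9 bitstream specification to compute the maximum for a given frame width; the function names are ours. Note that the `tile-columns` option of ffmpeg and vpxenc takes the base-2 logarithm of the column count.

```python
MIN_TILE_WIDTH_B64 = 4  # a tile column spans at least 4 64x64 superblocks


def superblock_cols(width: int) -> int:
    """Number of 64x64 superblock columns covering `width` luma pixels."""
    mi_cols = (width + 7) >> 3   # 8x8 mode-info units
    return (mi_cols + 7) >> 3    # 8 mode-info units per superblock


def max_log2_tile_cols(width: int) -> int:
    """Largest tile_cols_log2 the bitstream permits for this frame width."""
    sb_cols = superblock_cols(width)
    max_log2 = 1
    while (sb_cols >> max_log2) >= MIN_TILE_WIDTH_B64:
        max_log2 += 1
    return max_log2 - 1


def max_tile_cols(width: int) -> int:
    return 1 << max_log2_tile_cols(width)


for w in (426, 640, 1280, 1920, 3840):
    print(f"width {w}: up to {max_tile_cols(w)} tile column(s)")
```

For a 1920-pixel-wide frame this yields 4 tile columns, which is why adding threads beyond that point helps little without row-based multithreading.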
For 1080p content, the frame width is 1920 pixels and the maximum number of tiles is only 4, which made 1080p encoding time the bottleneck of our VOD service. Fortunately, the -row-mt option was introduced in libvpx v1.7, making multi-threaded encoding faster than in older versions. But for content with a tight release deadline, libvpx still could not meet our requirements, and we needed GOP-level parallelization to close the gap.
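GOP-level parallelization means splitting a title into chunks that each start on a keyframe, encoding the chunks concurrently, and concatenating the results. Below is a minimal sketch, assuming a fixed GOP length so chunk boundaries can be computed up front; `fake_encode` stands in for a hypothetical per-chunk encoder invocation (for example, an ffmpeg process), and the final concatenation step is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

Chunk = Tuple[int, int]  # half-open frame range [start, end)
R = TypeVar("R")


def plan_gop_chunks(num_frames: int, gop_size: int, gops_per_chunk: int) -> List[Chunk]:
    """Split [0, num_frames) into chunks whose boundaries fall on GOP starts."""
    if min(num_frames, gop_size, gops_per_chunk) <= 0:
        raise ValueError("all arguments must be positive")
    step = gop_size * gops_per_chunk
    return [(s, min(s + step, num_frames)) for s in range(0, num_frames, step)]


def encode_in_parallel(chunks: List[Chunk], encode_chunk: Callable[[Chunk], R],
                       workers: int) -> List[R]:
    """Run encode_chunk on every chunk concurrently; results keep chunk order.

    Threads suffice because each call is expected to wait on an external
    encoder process rather than execute Python code.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_chunk, chunks))


def fake_encode(chunk: Chunk) -> str:
    """Hypothetical stand-in: returns the file name a real encoder would write."""
    start, end = chunk
    return f"chunk_{start:06d}_{end:06d}.webm"


chunks = plan_gop_chunks(num_frames=1000, gop_size=48, gops_per_chunk=5)
print(chunks)
print(encode_in_parallel(chunks, fake_encode, workers=4))
```

The trade-off is that each chunk runs its own rate control, so 2pass statistics and bitrate budgeting no longer span the whole title; larger chunks reduce this effect at the cost of less parallelism.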
Bento4 or Shaka Packager?
Bento4 is very popular for HLS/DASH packaging of H264/H265 content. For VP9, we had one more option: Shaka Packager. According to its developers, Bento4 focuses on formats based on the ISO Base Media File Format (ISO-BMFF), and WebM was considered too different to support. In addition, some VP9 + AAC streams packaged by Bento4 did not play correctly in our Chrome browsers. Shaka Packager, by contrast, covers all of our use cases, so we decided to use it for VP9 packaging.
Shaka Packager can output fMP4 DASH streams with VP9 + AAC and WebM DASH streams with VP9 + Opus. It also supports AV1 + AAC and AV1 + Opus well.
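For reference, Shaka Packager is driven by one stream descriptor per input (`in=...,stream=...,output=...`) plus flags such as `--mpd_output`. Here is a minimal sketch that assembles such a command line; the helper and file names are hypothetical, and segment-template options are omitted. The output container is derived from each output file's extension unless a `format` field is given.

```python
import shlex
from typing import List, Sequence, Tuple


def shaka_packager_args(videos: Sequence[Tuple[str, str]], audio: Tuple[str, str],
                        mpd_path: str) -> List[str]:
    """Build a Shaka Packager command line for one DASH presentation.

    `videos` and `audio` are (input, output) pairs. Here the .mp4 outputs
    request fMP4; .webm outputs would request WebM instead.
    """
    args = ["packager"]
    for src, out in videos:
        args.append(f"in={src},stream=video,output={out}")
    src, out = audio
    args.append(f"in={src},stream=audio,output={out}")
    args += ["--mpd_output", mpd_path]
    return args


cmd = shaka_packager_args(
    videos=[("encoded/vp9_360p.mp4", "dash/vp9_360p.mp4"),
            ("encoded/vp9_720p.mp4", "dash/vp9_720p.mp4")],
    audio=("encoded/audio_en.mp4", "dash/audio_en.mp4"),
    mpd_path="dash/vp9.mpd",
)
print(shlex.join(cmd))
```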
Shaka Packager also enables Dynamic MPD by default. It can significantly improve the speed of client downloading and CDN uploading, making our file management easier.
WebM or fMP4?
As mentioned above, VP9 video can be packaged in either WebM or fMP4. Unfortunately, according to the Shaka Packager documentation, Opus support in ISO-BMFF was still experimental, so we chose the WebM container with VP9 + Opus. A few months after launch, our profiling showed that the total traffic savings against H264 were lower than we had expected.
After investigating, we found roughly 20–30 kbps of overhead in our WebM output. Take a 180p video as an example: if the combined video and audio bitstreams are around 100 kbps, the output is about 102 kbps after conversion to fMP4 DASH, but about 120–130 kbps after conversion to WebM DASH. This 20–30 kbps of WebM overhead is too large for low-resolution content, so we decided to use the fMP4 container going forward.
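The arithmetic behind that decision is simple: a roughly fixed per-stream overhead is a large fraction of a low bitrate and a small fraction of a high one. A short sketch using the 180p figures above; the 125 kbps value is just the midpoint of the quoted WebM range, and the 3000 kbps layer is hypothetical.

```python
from typing import Tuple


def container_overhead(payload_kbps: float, packaged_kbps: float) -> Tuple[float, float]:
    """Return (overhead in kbps, overhead as % of the elementary-stream bitrate)."""
    extra = packaged_kbps - payload_kbps
    return extra, 100.0 * extra / payload_kbps


# 180p example: ~100 kbps of video + audio payload.
print(container_overhead(100, 102))  # fMP4: (2, 2.0)
print(container_overhead(100, 125))  # WebM, midpoint of the 120-130 kbps range
# Hypothetical 3000 kbps 1080p layer carrying the same ~25 kbps of overhead.
print(container_overhead(3000, 3025))
```

At 180p, WebM adds about 20–30% to the delivered bitrate, while the same absolute overhead on a 3000 kbps layer is under 1%.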
Another advantage of using the fMP4 container with VP9 + AAC is easier maintenance across multiple video codecs. We can reuse the audio tracks from our H264 manifests and copy or link them into our VP9 manifests without re-encoding the audio. Whenever we receive audio in a new language for a title, we only need to process it once (AAC in fMP4) and add the same track to the manifests for every video codec. We no longer need to generate a second audio codec/container.
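To make the sharing concrete, here is a minimal sketch that emits a skeletal MPD per video codec, with both manifests referencing the same audio Representation. It is illustrative only: segment indexing and required timing attributes are omitted, so the output is not a playable manifest, and all IDs, URLs, and bandwidth values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"

# Hypothetical shared audio track: encoded and packaged once as AAC in fMP4.
AUDIO_EN = {"id": "audio_en", "lang": "en", "codecs": "mp4a.40.2",
            "bandwidth": "128000", "url": "audio/en_aac.mp4"}


def build_mpd_skeleton(video_reps: List[Dict[str, str]],
                       audio_tracks: List[Dict[str, str]]) -> str:
    """Return a skeletal MPD: one video AdaptationSet plus the shared audio tracks."""
    mpd = ET.Element("MPD", {"xmlns": MPD_NS, "type": "static"})
    period = ET.SubElement(mpd, "Period")
    video_set = ET.SubElement(period, "AdaptationSet",
                              {"contentType": "video", "mimeType": "video/mp4"})
    for rep in video_reps:
        r = ET.SubElement(video_set, "Representation",
                          {k: rep[k] for k in ("id", "codecs", "bandwidth")})
        ET.SubElement(r, "BaseURL").text = rep["url"]
    for track in audio_tracks:
        audio_set = ET.SubElement(period, "AdaptationSet",
                                  {"contentType": "audio", "mimeType": "audio/mp4",
                                   "lang": track["lang"]})
        r = ET.SubElement(audio_set, "Representation",
                          {k: track[k] for k in ("id", "codecs", "bandwidth")})
        ET.SubElement(r, "BaseURL").text = track["url"]
    return ET.tostring(mpd, encoding="unicode")


h264_mpd = build_mpd_skeleton(
    [{"id": "h264_720p", "codecs": "avc1.64001f", "bandwidth": "2500000",
      "url": "h264/720p.mp4"}], [AUDIO_EN])
vp9_mpd = build_mpd_skeleton(
    [{"id": "vp9_720p", "codecs": "vp09.00.31.08", "bandwidth": "1800000",
      "url": "vp9/720p.mp4"}], [AUDIO_EN])
print(vp9_mpd)
```

Adding a new language then means packaging one AAC file and appending one entry to the shared audio list, regardless of how many video codecs are served.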
Going back to the earlier problem: in some of our manifests, the bandwidth value (maximum segment bitrate) of the 360p layer was lower than that of the 240p layer. This confuses the player and leads to sub-optimal bitrate switching and a worse playback experience. We also found that our VP9 content showed degraded image quality in some complex scenes. After a few experiments, we found that 2pass CRF performed much better than 1pass CRF in both cases.
The following figure compares 2pass CRF and 1pass CRF on the same video, using the same CRF values and bitrate caps. The manifest bandwidth values of the 1pass CRF outputs are around 350, 500, 450, and 650, with 240p as the abnormal point. The 2pass CRF outputs, by contrast, have monotonically increasing manifest bandwidth values, as expected.
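A ladder like this can be sanity-checked automatically before publishing. The sketch below computes a layer's bandwidth as its maximum per-segment bitrate (the definition used above; a packager's exact @bandwidth computation may differ) and flags adjacent layers whose bandwidth does not increase. The segment sizes are hypothetical, and the ladder check assumes the quoted 1pass values are listed from the lowest layer up.

```python
from typing import List, Sequence, Tuple


def peak_segment_bitrate_bps(segment_bytes: Sequence[int],
                             segment_seconds: Sequence[float]) -> float:
    """Maximum per-segment bitrate of one layer, in bits per second."""
    if not segment_bytes or len(segment_bytes) != len(segment_seconds):
        raise ValueError("need one duration per segment")
    return max(8 * size / dur for size, dur in zip(segment_bytes, segment_seconds))


def ladder_inversions(bandwidths: Sequence[float]) -> List[Tuple[int, int]]:
    """Index pairs (i, i + 1) where a higher layer does not exceed the one below it.

    `bandwidths` must be ordered from the lowest to the highest resolution.
    """
    return [(i, i + 1) for i in range(len(bandwidths) - 1)
            if bandwidths[i + 1] <= bandwidths[i]]


# Hypothetical 4-second segments of one layer: the 900 kB segment sets the peak.
print(peak_segment_bitrate_bps([500_000, 900_000, 600_000], [4.0, 4.0, 4.0]))  # 1800000.0
# The 1pass CRF values quoted above, assumed to be listed lowest layer first.
print(ladder_inversions([350, 500, 450, 650]))  # [(1, 2)]
```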
We also calculated the VMAF values of 2pass CRF and 1pass CRF outputs. Here are the comparison results:
From the above data, we can see that 2pass CRF produces more stable quality and performs better on difficult-to-encode frames.
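Stability and worst-case behaviour show up in per-frame statistics rather than in the mean alone. A minimal sketch, assuming per-frame VMAF scores have already been extracted: the first helper builds an ffmpeg command using the `libvmaf` filter (distorted input first, reference second, with the lower layer upscaled to the reference resolution), and the second summarizes scores with mean, minimum, a nearest-rank low percentile, and standard deviation. Helper names, file names, and scores are hypothetical; check the libvmaf filter options against your ffmpeg version.

```python
import math
import statistics
from typing import Dict, List, Sequence


def vmaf_command(distorted: str, reference: str, width: int, height: int,
                 log_path: str) -> List[str]:
    """ffmpeg command scoring `distorted` against `reference` with libvmaf."""
    graph = (f"[0:v]scale={width}:{height}:flags=bicubic[dist];"
             f"[dist][1:v]libvmaf=log_fmt=json:log_path={log_path}")
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", graph, "-f", "null", "-"]


def summarize_vmaf(scores: Sequence[float], low_pct: float = 5.0) -> Dict[str, float]:
    """Mean plus stability indicators (min, low percentile, std dev) of per-frame VMAF."""
    if not scores:
        raise ValueError("no scores")
    ordered = sorted(scores)
    rank = max(1, math.ceil(low_pct / 100 * len(ordered)))  # nearest-rank percentile
    return {
        "mean": statistics.fmean(ordered),
        "min": ordered[0],
        f"p{low_pct:g}": ordered[rank - 1],
        "stdev": statistics.pstdev(ordered),
    }


# Hypothetical per-frame scores with the same mean but different stability.
steady = [95.0, 95.0, 95.0, 95.0]
spiky = [99.0, 99.0, 99.0, 83.0]
print(summarize_vmaf(steady))
print(summarize_vmaf(spiky))
```

Both hypothetical encodes average 95, but the second one's minimum and low percentile expose the kind of difficult-frame drop that a mean hides.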
One might ask: we already have HEVC and AV1, so why do we need VP9? Besides cost savings, VP9 has at least the following advantages.
First, HEVC was not supported by Chrome/Chromium browsers at the time of writing, while VP9 content can be played with hardware acceleration on popular devices.
Second, HEVC and AV1 content does not play well on low-end Android devices. And for 1080p+ or grainy content, VP9's compression performance is close to that of HEVC; under certain conditions, VP9 even outperforms HEVC.
Finally, VP9 is currently much cheaper to encode than AV1, saving a great deal of encoding time and compute cost. For videos with low watch time, the added cost of multi-bitrate AV1 encoding may exceed the resulting savings in overall traffic. VP9 could prove to be a better choice in such scenarios.
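The trade-off can be framed as a break-even calculation: the extra encoding cost of an additional codec is repaid only after enough watch hours at the lower bitrate. A minimal sketch with entirely hypothetical numbers (bitrate saving, CDN price per decimal GB, and encoding cost):

```python
def saving_usd_per_watch_hour(bitrate_saving_kbps: float, cdn_usd_per_gb: float) -> float:
    """CDN cost saved per watch hour when the delivered bitrate drops by the given amount."""
    gb_per_hour = bitrate_saving_kbps * 1000 * 3600 / 8 / 1e9  # kbit/s -> GB per hour
    return gb_per_hour * cdn_usd_per_gb


def break_even_watch_hours(extra_encode_usd: float, bitrate_saving_kbps: float,
                           cdn_usd_per_gb: float) -> float:
    """Watch hours needed before traffic savings repay the extra encoding cost."""
    return extra_encode_usd / saving_usd_per_watch_hour(bitrate_saving_kbps, cdn_usd_per_gb)


# Hypothetical: AV1 saves 200 kbps over VP9 on average, CDN egress costs
# $0.02/GB, and the extra multi-bitrate AV1 encode costs $5 for this title.
hours = break_even_watch_hours(5.0, 200, 0.02)
print(f"break-even after about {hours:,.0f} watch hours")
```

A title expected to stay well below the break-even point is cheaper to serve with VP9 alone.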
In this post, we have taken you through our journey of launching VP9, the challenges we faced, and the solutions we developed to improve the end-user experience. As with everything we do at Hotstar, these learnings are being applied across other codecs, platforms and use cases.
Our team is always exploring new and innovative ways to continuously improve our performance and efficiency in all aspects of audio and video processing and delivery.
We’re always hiring smart engineers to work with. Check out open roles at tech.hotstar.com.