If you missed it, read the first part here. Now, onwards.
Hotstar has always put a huge premium on building for Bharat. Building for Bharat starts with making content accessible in each customer’s language of choice. The platform has always been able to auto-start content in the language best suited to a customer, based on various factors, provided the content supports multiple language dubs.
As part of our launch strategy for Disney+Hotstar, we brought the beloved Disney catalog to our customers in the languages that resonate with them. Content is more than just the video; it is the sum of the video, subtitles, metadata, dubbed languages, and the artwork (posters). All of these should come together in uniformity to deliver a seamless experience to the user. Language hence became a primary engagement driver for Disney content in Bharat. This meant we had to:
- Validate that the dubbed language, artwork, video, and metadata of a given content are uniform per language (in simple terms: a Tamil consumer should see Tamil text on the poster, nudging them to click and watch)
- Avoid user pain points such as:
  - I see the poster in my language, but the language dub is missing
  - Why am I shown a poster in a language I don’t understand?
  - Does this content have my language?
  - Why am I offered languages I don’t speak?
We had to verify this for the massive Disney content library for India, which had thousands of titles. With humans doing the checking, sampling was the only option, and even then only a small percentage of all content could be verified. That would lead to user pain: a Telugu user might see a poster with a Telugu callout on it, but disappointment follows when the content doesn’t have a Telugu audio dub.
For better understanding, have a look at the poster of Avengers: Endgame below.
Observe the poster and the “Available in” section below the image. Two things become clear:
- It is impossible for a Design Operations team to “ensure” that every piece of content has the required artwork attached when we are talking about thousands of titles.
- It is equally impossible to rule out that a misconfiguration happened before the launch, mapping the wrong content and artwork together.
The artwork is essentially an image with text embedded in it. To get anywhere close to validating whether the correct language text appears in the artwork, we first had to extract that text, compare it with our base set of training data, and get a response from our Image Validator service on the confidence level for a match.
We used the Image Validator service’s text extraction and image matching APIs (a minimal sketch of this first approach follows the list below). But it was not a straightforward solve, for the following reasons:
- Some posters had the language callout in the middle, some on the left, and some on the right of the poster
- Some language callouts were camouflaged by the background behind the text (e.g. bright or dark backgrounds, jazzy or colourful pictures)
- Almost all language callout texts were in different colours (to help them stand out from their background colour)
- And of course, the language callout itself was in an Indian language like Tamil or Telugu; English-to-English text pattern matching is considerably easier
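The actual Image Validator service is internal to Hotstar, but to illustrate the idea, here is a minimal sketch of that first approach using pytesseract as a stand-in OCR engine. The language callouts, file names, and fuzzy-matching logic are all illustrative assumptions, not our production code:

```python
import pytesseract
from PIL import Image
from difflib import SequenceMatcher

# Hypothetical set of expected language callouts, keyed by
# Tesseract language codes (Tamil, Telugu, Hindi).
EXPECTED_CALLOUTS = {"tam": "தமிழ்", "tel": "తెలుగు", "hin": "हिन्दी"}

def callout_confidence(poster_path: str, lang_code: str) -> float:
    """Extract text from the poster and return a 0-1 confidence
    that the expected language callout is present."""
    image = Image.open(poster_path)
    # OCR with the script of the expected language enabled.
    extracted = pytesseract.image_to_string(image, lang=lang_code)
    expected = EXPECTED_CALLOUTS[lang_code]
    # Fuzzy-match each extracted line against the expected callout;
    # noisy backgrounds often garble a few characters.
    return max(
        (SequenceMatcher(None, line.strip(), expected).ratio()
         for line in extracted.splitlines() if line.strip()),
        default=0.0,
    )

print(callout_confidence("endgame_tamil.jpg", "tam"))  # hypothetical file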
Even our best optimisations resulted in an overall success rate below 50%, which meant we had no guarantee that we weren’t throwing false positives, much less false negatives, in our test results. Clearly, we weren’t succeeding.
We then decided to change our strategy: combine two AI features, text extraction & matching and pattern recognition, and build a hybrid custom model of our own. To avoid over-engineering it, we created two parallel threads that would converge at the point of verification; the combination of the confidence level from each thread would be the overall confidence percentage.
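As a sketch of how the two threads could converge: `callout_confidence` is from the snippet above, `pattern_confidence` is sketched under Step 3 below, and the equal weighting here is an assumption for illustration, not our actual model:

```python
from concurrent.futures import ThreadPoolExecutor

def hybrid_confidence(poster_path: str, lang_code: str) -> float:
    """Run the text extraction & matching thread and the pattern
    recognition thread in parallel, then combine their scores."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_thread = pool.submit(callout_confidence, poster_path, lang_code)
        pattern_thread = pool.submit(pattern_confidence, poster_path, lang_code)
        text_conf = text_thread.result()
        pattern_conf = pattern_thread.result()
    # Illustrative combination: an equal-weight average of the two
    # confidence levels becomes the overall confidence percentage.
    return 0.5 * text_conf + 0.5 * pattern_conf
```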
This was great, but we still hadn’t defeated our primary enemy: noise.
Reduce the noise please!
Step 1: We used an image processing feature to remove all the unwanted noise from the source image. We removed the edges, cropped the pictures, and resized them to the same size as the baseline image to ease comparison.
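A minimal OpenCV sketch of Step 1; the 5% border crop and the 300×100 baseline size are illustrative values, not our actual configuration:

```python
import cv2
import numpy as np

def preprocess_crop(poster_path: str) -> np.ndarray:
    """Step 1: crop away edges and resize to the baseline dimensions."""
    image = cv2.imread(poster_path)
    h, w = image.shape[:2]
    # Illustrative crop: drop a 5% border on all sides to remove
    # edge noise, keeping the region carrying the callout text.
    dy, dx = int(0.05 * h), int(0.05 * w)
    cropped = image[dy:h - dy, dx:w - dx]
    # Resize to the same size as the baseline image to ease comparison.
    return cv2.resize(cropped, (300, 100))
```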
Step 2: Removed colour and made the image black and white. This was surprisingly useful: once images go black and white they are sharper, and the text stands out more clearly. The hard contrast between black and white ensures the bots are not confused by soft colour transitions and overlaps bleeding into the text we were interested in.
Image generated after step 1 & 2
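Continuing the sketch above, Step 2 could look like the following; Otsu’s thresholding is an assumption here, since any high-contrast binarisation would serve the same purpose:

```python
def to_black_and_white(image: np.ndarray) -> np.ndarray:
    """Step 2: drop colour so the callout text stands out sharply."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the black/white cut-off automatically, turning
    # soft colour transitions into hard contrast around the text.
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return bw
```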
Step 3: Gathered different patterns and fed them in as training images. We ran tests and found various patterns that were not recognised directly by our text extraction logic, e.g.:
A bot would be able to make a match to the above image only 50% of the time. The weight of the font is different enough for the bot to be unsure of its confidence.
We hence bundled these patterns and fed them in as training images for pattern recognition to work with.
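Continuing the sketch, the pattern-recognition thread could use OpenCV template matching against the bundled training patterns; the patterns/ directory layout and file names are hypothetical, and the real framework used the Image Validator service’s pattern recognition rather than raw template matching:

```python
import glob

def pattern_confidence(poster_path: str, lang_code: str) -> float:
    """Match the preprocessed poster against every trained pattern
    for the language and return the best match score (0-1)."""
    bw_poster = to_black_and_white(preprocess_crop(poster_path))
    best = 0.0
    # One training image per font weight/style that confused the bot,
    # e.g. patterns/tam_bold.png, patterns/tam_light.png (hypothetical),
    # each stored at the same 300x100 baseline size as the poster crop.
    for pattern_path in glob.glob(f"patterns/{lang_code}_*.png"):
        pattern = cv2.imread(pattern_path, cv2.IMREAD_GRAYSCALE)
        result = cv2.matchTemplate(bw_poster, pattern, cv2.TM_CCOEFF_NORMED)
        _, score, _, _ = cv2.minMaxLoc(result)
        best = max(best, score)
    return best
```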
Voila!
We were able to improve accuracy and bump the success rate to ~99–100%. We tested and retested dozens of times, including human verification of the results, to ensure there were no false positives or false negatives. To our great excitement, we had cracked it: for the first time at Hotstar, we had a working framework to trawl through thousands of titles and ensure the artwork is what it is supposed to be.
This whole exercise gave us greater confidence to solve bigger problems in our content validation capabilities. In further blogs, my colleagues will articulate how poster validation has become a core part of our automated Content Factory validation, a journey which started with a completely different objective, LiveAds validation, and has now taken on a life of its own!
If you want to solve problems like these, do check out the open roles in our automation team at https://tech.hotstar.com.