Google shows off a video-based conversational Gemini prototype

Google I/O 2024 takes place tomorrow, but the company is already eager to show the public some of its advances in AI. Ahead of the event, Google showed off a conversational Gemini prototype that responds to video in real time.

AI-powered chatbots started out responding to written prompts. Later, they gained the ability to recognize images, which lets them answer questions or comment on a particular image or an element within it. They can even generate new pictures from existing ones. Now, the next big step seems to be video.

Google teases a conversational Gemini prototype using video before I/O 2024

Ahead of I/O 2024, Google is showing a short video of an interaction between Gemini and a user. The striking thing is that the entire interaction is based on video captured in real time. The “teaser” shows how Gemini is able to recognize what is happening in the scene. It can also focus on specific elements of the scene, such as the Google I/O logo. The AI-powered chatbot then answers the user’s questions and even proposes new questions to keep the conversation going.

The combination of real-time video recognition and natural conversation is quite impressive. However, it should be noted that what is shown is a prototype, albeit one that seems functional. So, although the company will provide more details about it tomorrow, a final version for mass use may take a little longer to become available.

The teaser could be a direct response to OpenAI, the team behind ChatGPT. A few hours ago, OpenAI held an event to announce new advances and features. One of the announcements was GPT-4o, a faster version of the GPT-4 model that is also capable of responding to live video. So, the timing Google chose to release the teaser does not seem like a coincidence.

2024-05-14 15:06:49