There has been a lot of buzz around Google’s answer to OpenAI’s ChatGPT and Microsoft’s Copilot. Google Gemini has been the talk of AI in business, a long-promised, next-gen GenAI model available in three versions – Ultra, Pro, and Nano.
All Gemini models are natively multimodal, able to work with and use more than just words. These models are trained on multiple audio, image, and video formats, as well as a large set of codebases and text in different languages.
These capabilities set Gemini apart from models like Google’s LaMDA, which was trained exclusively on text data and cannot generate or understand anything else. Because Gemini is multimodal, it can perform many multimodal tasks, from transcribing speech to captioning images and videos to generating artwork.
Past Google Issues
That is the promise from Google itself. But after the misfire of the original Bard launch, and the heavily doctored Gemini capabilities video that ruffled a lot of feathers and was dismissed as a mere aspirational fluff piece, can we take Google at its word?
Assuming Gemini lives up to its hype, each tier has interesting implications for AI in business.
Gemini Ultra can aid in areas like physics homework, solving problems step-by-step on a worksheet, and highlighting potential mistakes in filled-in answers. Ultra can be applied to tasks like identifying scientific papers relevant to specific problems and extracting information from papers.
Ultra technically supports image generation, but that capability has yet to make its way into the productized version, potentially because it is more complex. Rather than feeding prompts into a separate image generator, Gemini outputs images natively.
Gemini Pro is an improvement over LaMDA in reasoning, planning, and understanding capabilities. Its initial version proved better than OpenAI’s GPT-3.5 on more complex and longer reasoning chains, yet it struggled with math problems involving several digits and showed some instances of bad reasoning and obvious mistakes.
Improvements
Gemini 1.5 Pro, which arrived as a drop-in replacement, improves on its predecessor in many areas, such as the amount of data it can process. Gemini Nano is a substantially smaller version of the Pro and Ultra models and is efficient enough to run directly on some phones instead of connecting to a server.
Nano currently powers several features on the Pixel 8 Pro, Pixel 8, and Samsung Galaxy S24. It is also in Gboard, Google’s keyboard app, powering a small feature called Smart Reply, which suggests the next thing you’ll want to say when messaging. The feature initially works with WhatsApp but will come to more apps over time.
So far, Google Gemini’s benchmark scores are marginally better than those of OpenAI’s corresponding models, yet the models still have problems holding them back, along with lingering impressions from the video launch and Bard. With early impressions marred by basic factual errors, translation struggles, and poor coding suggestions, it is too early to tell if Google Gemini will become Google’s most substantial product for AI in business.
As AI for the enterprise becomes the biggest talking point for 2024, many experts are taking part in the Enterprise Bi-Annual AI & Big Data London event on June 19. At this premier UK event, special guest speakers will explore best practices and strategies, and open up on new technologies, how they benefit business growth now and in the future, and the ethics of AI in business.