Google strikes back at OpenAI with “Project Astra” AI agent prototype

0
63


Enlarge / A video nonetheless of Venture Astra demo on the Google I/O convention keynote in Mountain View on Might 14, 2024.

Google

Simply at some point after OpenAI revealed GPT-4o, which it payments as having the ability to perceive what’s going down in a video feed and converse about it, Google introduced Project Astra, a analysis prototype that options comparable video comprehension capabilities. It was introduced by Google DeepMind CEO Demis Hassabis on Tuesday on the Google I/O convention keynote in Mountain View.

Hassabis referred to as Astra “a common agent useful in on a regular basis life.” Throughout an indication, the analysis mannequin showcased its capabilities by figuring out sound-producing objects, offering artistic alliterations, explaining code on a monitor, and finding misplaced gadgets. The AI assistant additionally exhibited its potential in wearable gadgets, reminiscent of sensible glasses, the place it may analyze diagrams, counsel enhancements, and generate witty responses to visible prompts.

Google says that Astra makes use of the digicam and microphone on a consumer’s machine to supply help in on a regular basis life. By constantly processing and encoding video frames and speech enter, Astra creates a timeline of occasions and caches the data for fast recall. The corporate says that this permits the AI to determine objects, reply questions, and bear in mind issues it has seen which can be not within the digicam’s body.

Venture Astra: Google’s imaginative and prescient for the way forward for AI assistants.

Whereas Venture Astra stays an early-stage function with no particular launch plans, Google has hinted that a few of these capabilities could also be built-in into merchandise just like the Gemini app later this yr (in a function referred to as “Gemini Dwell”), marking a big step ahead within the improvement of useful AI assistants. It is a stab at creating an agent with “company” that may “assume forward, cause and plan in your behalf,” within the phrases of Google CEO Sundar Pichai.

Elsewhere in Google AI: 2 million tokens

Throughout Google I/O, the corporate unveiled numerous AI-related bulletins, a few of which we might cowl in separate posts sooner or later. However for now, this is a fast overview.

On the high of the keynote, Pichai talked about an “improved” model of February’s Gemini 1.5 Pro (identical model quantity, oddly) that’s coming quickly. It should function a 2 million-token context window, which implies it may course of giant numbers of paperwork or lengthy stretches of encoded movies without delay. Tokens are fragments of information that AI language fashions use to course of data, and the context window determines the utmost variety of tokens an AI mannequin can course of without delay. At present, 1.5 Professional tops out at 1 million tokens (OpenAI’s GPT-4 Turbo has a 128,000 token window for comparability).

We requested AI researcher Simon Willison—who doesn’t work for Google however was featured in a promo video in the course of the keynote—what he considered the context window announcement. “Two million tokens is thrilling,” he replied through textual content whereas sitting within the keynote viewers. “Nevertheless it’s value protecting value in thoughts that $7 per million tokens means a single immediate may price you $14!” Google charges $7 per million enter tokens for 1.5 on prompts longer than 150K tokens by its API.

During the Google I/O 2024 keynote, Google said Gemini Advanced has the
Enlarge / In the course of the Google I/O 2024 keynote, Google stated Gemini Superior has the “longest context window on the earth” at 1 million tokens—quickly to be 2 million.

Google

And talking of tokens, Google introduced that its previously-announced 1-million token context window for Gemini 1.5 Professional is lastly coming to Gemini Advanced subscribers. Beforehand it was solely out there within the API.

Google additionally introduced a brand new AI mannequin referred to as Gemini 1.5 Flash, which it billed as a light-weight, sooner, and cheaper model of Gemini 1.5. “1.5 Flash is the most recent addition to the Gemini mannequin household and the quickest Gemini mannequin served within the API. It’s optimized for high-volume, high-frequency duties at scale,” says Google.

Willison had a touch upon Flash as nicely: “The brand new Gemini Flash mannequin is promising there, it is meant to supply as much as 2m tokens at a lower cost.” Flash prices $0.35 per million tokens on prompts as much as 128K tokens and $0.70 per million tokens for prompts longer than 128K. It is one-tenth the value of 1.5 Professional.

“35 cents per million tokens! That is the largest information of the day, IMO,” Willison advised us.

Google additionally introduced Gems, which seems to be its tackle OpenAI’s GPTs. Gems are customized roles for the Google Gemini chatbot that can play an element that you simply outline, permitting you to personalize Gemini in several methods. Google lists examples of potential Gems as “a fitness center buddy, sous chef, coding companion or artistic writing information.”

New Generative AI fashions

A screenshot of the Google Imagen 3 website.
Enlarge / A screenshot of the Google Imagen 3 web site.

Google

Additionally on the Google I/O keynote on Tuesday, Google introduced a number of new generative AI fashions for creating photographs, audio, and video. Imagen 3 is the most recent in its line of picture synthesis fashions, which Google says is its “highest high quality text-to-image mannequin, able to producing photographs with even higher element, richer lighting and fewer distracting artifacts than our earlier fashions.”

Google additionally confirmed off its Music AI Sandbox, which Google payments as “a set of AI instruments to remodel how music may be created.” It combines its YouTube music project with its Lyria AI music generator into instruments for musicians.

The corporate additionally introduced Google Veo, which is a text-to-video generator that creates 1080P movies from prompts in a top quality that appears to match OpenAI’s Sora. Google says it’s working with actor Donald Glover to create an AI-generated demonstration movie that can debut quickly. It is from Google’s first AI video generator, however it appears to be its most succesful to this point.

The pattern video above, offered by Google, used the immediate, “A lone cowboy rides his horse throughout an open plain at lovely sundown, delicate gentle, heat colours.”

Google says beginning at present, its new AI artistic instruments can be found to pick out creators in a non-public preview solely however that wait lists are open now for signups.



Source link