Forget chatbots. AI agents are the future

This week, a startup called Cognition AI launched a demo of an artificial intelligence program called Devin that was shown performing work typically done by highly paid software engineers. Chatbots like ChatGPT and Gemini can generate code, but Devin goes a step further: it plans how to solve a problem, writes the code, and then tests and implements it.

Devin’s creators call it an “artificial intelligence software developer.” When asked to test the performance of Meta’s open source language model Llama 2 as offered by the different companies hosting it, Devin developed a step-by-step plan for the project, generated the code needed to access the APIs and run the benchmarks, and created a website summarizing the results.

It’s always hard to judge demos, but Cognition has shown Devin handling a variety of impressive tasks. The program wowed investors and engineers on X, drawing a large number of endorsements and even inspiring some memes, including a few predicting that Devin will soon be responsible for a wave of layoffs in the technology industry.

Devin is just the latest, most polished example of a trend I’ve been tracking for some time: the emergence of artificial intelligence agents that no longer just provide answers or suggestions in response to questions posed by humans but can take action to solve problems. A few months ago I tested Auto-GPT, an open source program that attempts to do useful work by performing actions on personal computers and the web. Recently I tested another program called vimGPT to see how new AI models’ vision capabilities could help these agents browse the web more efficiently.

I came away impressed by my experiments with these agents. For now, however, like the language models that power them, they make a lot of mistakes. And when a piece of software is taking action rather than just generating text, a mistake can mean complete failure, with potentially costly or dangerous consequences. Narrowing the range of tasks an agent can perform to a specific set of software engineering chores seems like a smart way to reduce error rates, but there are still many potential ways to fail.
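The basic pattern these agents share can be sketched as a plan-act-observe loop. The sketch below is purely illustrative, not code from Devin, Auto-GPT, or vimGPT; `call_model` is a hypothetical stand-in for a real language-model API, stubbed out here so the example runs on its own. It shows why an action-taking agent differs from a chatbot: one failed step can halt the entire task rather than merely producing a bad answer.

```python
# Illustrative agent loop: plan, then execute each step and observe the result.
# `call_model` is a stub standing in for a real LLM API call.

def call_model(prompt):
    # A real agent would query a language model here; this stub returns
    # a canned plan for the "plan" prompt and "ok" for everything else.
    canned = {"plan": ["write_code", "run_benchmarks", "build_report"]}
    return canned.get(prompt, "ok")

def run_agent(task, max_steps=10):
    """Plan a task, execute each step, and stop on the first failure."""
    steps = call_model("plan")          # 1. ask the model for a step-by-step plan
    log = []
    for step in steps[:max_steps]:
        result = call_model(step)       # 2. take the action and observe the outcome
        log.append((step, result))
        if result == "error":
            # 3. unlike a chatbot's wrong answer, a failed *action*
            #    aborts the whole task, so errors compound.
            return log, False
    return log, True

log, succeeded = run_agent("benchmark Llama 2 across hosting providers")
```

Because each step's outcome feeds the next, a single early mistake can derail everything that follows, which is why limiting the action space matters so much in practice.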

It’s not just startups that are building AI agents. Earlier this week, I wrote about an agent called SIMA developed by Google DeepMind that can play video games, including the truly wild Goat Simulator 3. SIMA learned by watching how human players completed more than 600 fairly complex tasks, such as chopping down a tree or shooting an asteroid. Best of all, it can perform many of them successfully, even in unfamiliar games. Google DeepMind calls it a “generalist.”

I suspect Google hopes these agents will eventually work outside of video games, perhaps browsing the web on behalf of users or operating software for them. But video games make a good sandbox for developing agents, offering complex environments in which they can be tested and improved. “We’re actively working on making them more accurate,” Tim Harley, a research scientist at Google DeepMind, told me. “We have all kinds of ideas.”

You can expect more news about artificial intelligence agents in the coming months. Google DeepMind CEO Demis Hassabis recently told me that he plans to combine large language models with his company’s earlier work on training AI programs to play video games to develop stronger, more reliable agents. “This is definitely a big area. We are investing heavily in this direction, and I think others are as well,” Hassabis said. “When these types of systems start to become more agent-like, that’s going to be a step change in their capabilities.”


