AI models still struggle to debug software, Microsoft study shows

AI models from OpenAI, Anthropic, and other top AI labs are increasingly being used to assist with programming tasks. Google CEO Sundar Pichai said in October that 25% of new code at the company is generated by AI, and Meta CEO Mark Zuckerberg has expressed ambitions to widely deploy AI coding models within the social media giant.

Yet even some of the best models today struggle to resolve software bugs that wouldn’t trip up experienced devs.

A new study from Microsoft Research, Microsoft’s R&D division, shows that models including Anthropic’s Claude 3.7 Sonnet and OpenAI’s o3-mini fail to debug many issues in a software development benchmark called SWE-bench Lite. The results are a sobering reminder that, despite bold pronouncements from companies such as OpenAI, AI is still no match for human experts in domains such as coding.

The study’s co-authors tested nine different models as the backbone for a “single prompt-based agent” that had access to a number of debugging tools, including a Python debugger. The agent was tasked with solving a set of 300 software debugging tasks from SWE-bench Lite.

According to the co-authors, even when equipped with stronger and more recent models, their agent rarely completed more than half of the debugging tasks successfully. Claude 3.7 Sonnet had the highest average success rate (48.4%), followed by OpenAI’s o1 (30.2%) and o3-mini (22.1%).

A chart from the study. The “relative increase” refers to the boost models got from being equipped with debugging tools. Image credits: Microsoft

Why the underwhelming performance? Some models struggled to use the debugging tools available to them and to understand how different tools might help with different issues. The bigger problem, though, was data scarcity, according to the co-authors. They speculate that there is not enough data representing “sequential decision-making processes” (that is, human debugging traces) in current models’ training data.

“We strongly believe that training or fine-tuning [models] can make them better interactive debuggers,” the co-authors wrote in their study. “However, this will require specialized data to fulfill such model training, for example, trajectory data that records agents interacting with a debugger to collect necessary information before suggesting a bug fix.”

The findings aren’t exactly shocking. Many studies have shown that code-generating AI tends to introduce security vulnerabilities and errors, owing to weaknesses in areas such as the ability to understand programming logic. One recent evaluation of Devin, a popular AI coding tool, found that it could complete only three out of 20 programming tests.

However, the Microsoft work is one of the more detailed looks yet at a persistent problem area for models. It likely won’t dampen investor enthusiasm for AI-powered assistive coding tools, but with any luck, it will make developers (and their higher-ups) think twice about letting AI run the coding show.

A growing number of tech leaders have disputed the notion that AI will automate away coding jobs. Microsoft co-founder Bill Gates has said he thinks programming as a profession is here to stay. So have Replit CEO Amjad Masad, Okta CEO Todd McKinnon, and IBM CEO Arvind Krishna.
