AutoCoder LLM: The New Open-Source AI Coding Assistant Outperforming GPT-4o

AutoCoder: The New Open-Source LLM for Coding Outshines GPT-4 Turbo and GPT-4o (Omni)

AutoCoder, a groundbreaking open-source language model for coding, has astonished the AI community by outperforming OpenAI’s GPT-4 Turbo and GPT-4o (Omni) in coding benchmarks.




Key Highlights:

  • AutoCoder, the latest AI-based coding assistant LLM, has surpassed GPT-4o in coding benchmarks.

  • It is trained with the AIEV-Instruct method and can install coding dependencies on its own, without additional instructions.

  • The model is open-source and can be run locally.

Just days after the launch of GPT-4o, which impressed the generative AI world with its natural language processing and advanced vision capabilities, AutoCoder has emerged as a formidable competitor. GPT-4o gained popularity among developers for its ability to build AI systems, run and debug complex code, and more.

Now, AutoCoder has taken the spotlight by surpassing GPT-4o in coding benchmarks, causing a sensation in the rapidly evolving generative AI industry.

In this article, we will delve into AutoCoder’s innovative open-source technology and thoroughly examine this cutting-edge coding assistant. Let's dive in!



Introducing AutoCoder: The Cutting-Edge AI Coding Assistant


AutoCoder, a cutting-edge large language model, has outperformed GPT-4 Turbo and GPT-4o Omni in benchmark tests. Its versatility is enhanced by an adaptable code interpreter that allows for the seamless installation of external packages.



AutoCoder is available in two sizes: 6.7 billion and 33 billion parameters. The 33-billion-parameter version has excelled in the HumanEval benchmark, surpassing both GPT-4 Turbo and GPT-4o.

With its ability to automatically install necessary packages and enable selective interpreter use, AutoCoder outshines other models. This feature is highly sought after by developers who need a coding assistant that can handle all dependencies without additional instructions, significantly saving time on complex coding tasks and development projects.

AutoCoder employs a unique training methodology called AIEV-Instruct (Instruction Tuning with Agent-Interaction and Execution-Verified). This approach combines agent interactions with external code execution and verification, pushing the boundaries of coding potential.

This allows AutoCoder to automatically install the necessary packages and execute the code, ensuring there are no issues. Its code interpreter offers flexibility in code verification, running the generated Python code without requiring user input.

Overall, AutoCoder exemplifies an AI agent that autonomously handles coding tasks, requiring no user assistance.



How Can You Access It?

AutoCoder is fully open-source and available to everyone. You can download its model weights from Hugging Face and run it locally using LM Studio.
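If you prefer a script to LM Studio, a minimal sketch using the Hugging Face transformers library might look like the following. The repository id and prompt below are assumptions for illustration; check the official model card for the exact name, prompt format, and hardware requirements.

```python
# Minimal sketch: loading AutoCoder weights with Hugging Face Transformers.
# The repository id below is an assumption; verify it on the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Bin12345/AutoCoder_S_6.7B"  # assumed id for the 6.7B variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use a smaller dtype/quantization if VRAM is tight
    device_map="auto",
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```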

Watch the video below to learn how to set up AutoCoder on your local machine!



AIEV-Instruct: The Foundation of AutoCoder's Architecture


AutoCoder's architecture is founded on AIEV-Instruct, an innovative approach for generating high-quality large code datasets. By simulating programmers through agent interactions, it develops code, runs unit tests, and ensures accurate documentation, all while integrating with an external code executor.

"AIEV-Instruct stands out from previous large-scale code dataset generation methods by reducing reliance on proprietary large models and offering execution-validated code datasets."

The architecture includes a Teaching Stage and a Self-Learning Stage, which minimize the reliance on costly closed-source models for annotation. Let's explore these two stages in detail.



The Teaching Stage

During the Teaching Stage, GPT-4 Turbo acts as the teacher model, enhancing and correcting the open-source code snippets used to build the dataset. This stage consists of four phases:


1. Initialization
   
GPT-4 Turbo takes on two roles: programmer and questioner. The goal is to keep the generated data diverse, preventing the dialogues from converging on a specific discussion template and keeping the data distribution close to uniform. The dialogue messages start as an empty list that holds data throughout the process. Multiple rounds of dialogue are recorded in this list, which eventually forms a single data entry in the final dataset.


2. Proposing the Question
   
GPT-4 Turbo is used to run OSS-Instruct. It generates a problem description, a targeted solution, and a code snippet based on the open-source code fragment. GPT-4 Turbo also provides unit tests to ensure the code's accuracy. These elements are appended to the dialogue messages initialized in the first phase.


3. Execution Feedback
   
Multiple rounds of execution feedback are employed to verify the generated code, improving the dataset’s quality. Researchers input the code snippet from the second phase into the Code Interpreter. Any execution errors (Stderr output) are appended to the dialogue messages. The questioner then uses this Stderr data to create a natural language description, prompting the programmer to refine the code further. This iterative process continues, with new code and feedback being added to the dialogue messages.


4. Termination
   
Finally, the Code Interpreter runs the programmer's final code. If the program executes successfully, the Stdout is appended to the dialogue messages, completing one data entry.
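Taken together, the four phases amount to a generate, execute, and refine loop. The sketch below only illustrates that loop as described above and is not the authors' implementation; `teacher_propose` and `teacher_refine` are hypothetical stand-ins for calls to the teacher model.

```python
# Illustrative sketch of the Teaching Stage loop (not the authors' code).
import subprocess
import sys


def run_code(code: str) -> subprocess.CompletedProcess:
    """Execute a generated snippet in a fresh interpreter, capturing stdout/stderr."""
    return subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=60)


def generate_entry(oss_snippet: str, teacher_propose, teacher_refine,
                   max_rounds: int = 5) -> list:
    # 1. Initialization: the dialogue starts as an empty message list.
    messages = []

    # 2. Proposing the question: the teacher turns the open-source snippet into
    #    a problem description plus code with unit tests (hypothetical helper).
    problem, code = teacher_propose(oss_snippet)
    messages.append({"role": "questioner", "content": problem})
    messages.append({"role": "programmer", "content": code})

    # 3. Execution feedback: run the code, feed stderr back, refine, repeat.
    for _ in range(max_rounds):
        result = run_code(code)
        if result.returncode == 0:
            # 4. Termination: on a successful run, record stdout and finish.
            messages.append({"role": "executor", "content": result.stdout})
            break
        messages.append({"role": "executor", "content": result.stderr})
        code = teacher_refine(messages)  # teacher fixes the code (hypothetical)
        messages.append({"role": "programmer", "content": code})

    return messages  # one data entry for the final dataset
```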



The Self-Learning Stage

In the Self-Learning Stage, AutoCoder hones its comprehension through ongoing interaction and feedback.

Distinguishing itself from the Teaching Stage, the Self-Learning Stage sees the original teacher model replaced by a student model. Here, the entire execution feedback process is overseen by the student model, acting as both questioner and coder.

This approach helps AutoCoder produce precise and dependable code, distinguishing it from other models available.



Remarkable Benchmark Performance

AutoCoder has stunned the world with its exceptional benchmark results.

Trials have revealed AutoCoder's outstanding performance in Java, C++, and Rust, achieving Pass@1 scores of 61.4%, 68.9%, and 60.8%, respectively. While it was outperformed by a few models, such as CodeQwen1.5-Chat, in three other languages, AutoCoder's multilingual code generation remains impressive.

In the HumanEval benchmark, AutoCoder surpasses OpenAI's GPT-4 Turbo and GPT-4o (Omni), delivering exceptional results in coding tasks. Its Pass@1 accuracy on the HumanEval base dataset reaches 90.9%, edging out GPT-4 Turbo (April 2024) at 90.2%.

Thanks to this performance, AutoCoder stands as an invaluable tool for developers seeking an open-source, reliable solution for their coding needs, coupled with advanced code interpretation capabilities.



What Sets AutoCoder Apart?

The Code Interpreter serves as a crucial component of modern coding assistants, enabling them to debug and execute code seamlessly. It plays a vital role in automating tasks related to scientific computations, complex coding, and similar duties.

However, identifying the relevant code blocks for execution remains a challenge for most AI models, with only a select few, like GPT-4o and InternLM-Chat, currently supporting such functionality.

These interpreters face limitations in communicating with external systems and operate within closed environments, hindering their ability to run code requiring external package installation.

Here's where AutoCoder shines: its capability to install dependencies without explicit instructions. This is achieved through training the model to execute commands using bash scripts when necessary.

For instance, in a basic feedback loop, a single data entry comprises three components: the user's natural-language request; the assistant's response, which mixes natural language with a bash command and a code block; and the execution outcome returned by the code interpreter.
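As a rough illustration, one such entry might look like the sketch below; the field names and contents are invented for clarity and do not reflect AutoCoder's actual dataset schema.

```python
# Hypothetical example of a three-component feedback-loop entry.
entry = {
    # 1. The user's natural-language request.
    "user": "Plot a sine wave and save it as sine.png.",
    # 2. The assistant's response: natural language, a bash command that
    #    installs the missing dependency, a code block, and a closing remark.
    "assistant": {
        "natural_language": "matplotlib is not installed, so install it first, "
                            "then generate and save the plot.",
        "bash": "pip install matplotlib",
        "code": (
            "import numpy as np\n"
            "import matplotlib.pyplot as plt\n"
            "x = np.linspace(0, 2 * np.pi, 200)\n"
            "plt.plot(x, np.sin(x))\n"
            "plt.savefig('sine.png')\n"
        ),
        "closing": "The figure has been saved to sine.png.",
    },
    # 3. The execution outcome returned by the code interpreter.
    "execution_result": "exit code 0, file sine.png created",
}
```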



Conclusion

AutoCoder is revolutionizing the landscape of AI coding assistants. Its groundbreaking features, unique training methodology, and outstanding performance in benchmark tests make it a standout choice for developers and businesses seeking a powerful open-source solution. Offering the ability to install external packages, generate accurate code, and adapt through ongoing feedback and interaction, AutoCoder has the potential to reshape our approach to coding tasks fundamentally.