How AI learned a complex coding language nobody taught it

Artificial intelligence (AI) usually learns programming languages by studying massive amounts of data. It reads millions of examples, slowly recognizing patterns the way a student studies textbooks. But what happens when the language is rare and there are very few examples to learn from?

A group of researchers tried something simple but surprisingly powerful. They kept giving the AI its own compiler error messages and asked it to fix its mistakes again and again. Slowly, the model began to understand where it went wrong.

At first, the AI solved only 39 percent of the problems. But after repeating this correction loop many times, the success rate jumped to 96 percent.

The result was exciting. It suggested that AI can go far beyond what it originally learned during training—especially when it receives clear feedback about exactly what went wrong.


Where the system stumbled

The experiment used programming exercises written in Idris, a language known for being strict and unforgiving.

Right away, the gap between the AI’s answers and what Idris required became obvious. The programs looked close to correct, but the compiler kept complaining.

Minda Li from the University of Southern California (USC) Viterbi School of Engineering decided to capture every error message the compiler produced. Instead of ignoring those messages, she sent them back to the AI and asked it to revise its code.

Each time the AI tried again, it faced the exact rule it had broken before. Slowly, failed attempts turned into working programs.

Watching this process unfold was fascinating. It showed that the real breakthrough didn’t come from feeding the AI more data. Instead, progress came from giving it clear, precise signals about its mistakes.


Picking a difficult language

Bhaskar Krishnamachari, a professor of electrical engineering and computer science at USC, deliberately chose a language that would challenge the AI.

Idris was perfect for this test. It uses a system called dependent types, where types can carry extra rules. If a programmer makes even a small mistake, Idris catches it immediately.

And unlike popular languages, Idris doesn’t have a huge amount of code available online. The study found only about 2,275 public Idris repositories, while Python has more than 24 million.

Another interesting detail: neither researcher knew Idris. That meant they weren't coaching the AI step by step. They simply watched to see whether the system could improve using feedback alone.


The model hits a wall

When the team tested the AI on shared coding exercises, the results told a clear story.

The model solved 90 percent of Python problems and 74 percent of Erlang problems. But in Idris, it managed only 39 percent.

At first, that looked disappointing. But the reason soon became clear.

Idris is extremely strict. Small issues—like missing variable names, incomplete cases, or mismatched types—can cause the program to fail immediately. The AI often got stuck on these early rules before it could even reach the deeper logic of the problem.

Li first tried softer methods to help the model. She gave it platform feedback, created a custom error guide, and even provided official Idris documentation.

These efforts helped, lifting the success rate from 39 to 61 percent, but the AI still struggled.

The lesson was surprisingly clear: general advice wasn’t enough. The AI needed specific feedback tied directly to the exact mistake in each program.


AI fixed mistakes with feedback

The breakthrough came with a simple loop.

Li ran the AI’s code locally so the compiler—the software that checks and translates code—could point out the exact lines that failed. She then sent those error messages back to GPT-5 and asked it to fix the problems.

Sometimes this cycle repeated up to 20 times.
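The control flow of that cycle is straightforward. Below is a minimal sketch of it in Python; the real pipeline invokes the Idris compiler and GPT-5, but here both are replaced by hypothetical stand-ins (`check_code` and `ask_model_to_fix` are illustrative, not the study's actual code) so that only the loop itself is shown.

```python
MAX_ROUNDS = 20  # the study allowed up to 20 correction rounds


def check_code(code: str) -> list[str]:
    # Stand-in for the Idris compiler: return error messages, or [] if clean.
    errors = []
    if "total" not in code:
        errors.append("warning: function may not be total")
    if "Nil" not in code:
        errors.append("error: missing case for Nil")
    return errors


def ask_model_to_fix(code: str, errors: list[str]) -> str:
    # Stand-in for the language model: patch whatever the errors mention.
    for err in errors:
        if "total" in err:
            code = "total\n" + code
        if "Nil" in err:
            code += "\nlen Nil = 0"
    return code


def repair_loop(code: str) -> tuple[str, int]:
    # Feed compiler errors back to the model until the code checks
    # or the round budget runs out. Returns (final code, rounds used).
    for round_no in range(MAX_ROUNDS):
        errors = check_code(code)
        if not errors:
            return code, round_no
        code = ask_model_to_fix(code, errors)
    return code, MAX_ROUNDS
```

The key property is that each retry sees the exact error text from the previous attempt, not generic advice.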

Before running the full experiment, Li expected only a small improvement. Instead, the result shocked her.

The success rate jumped to 96 percent.

“I was surprised that just that alone—something so simple like recompiling and trying again—was able to reach 96 percent,” Li said.

The reason made sense. Compiler errors don’t give vague advice; they show the precise problem. In early tests, missing names appeared 123 times, and Idris often couldn’t tell what the programmer intended.

Manuals can warn about common mistakes. But only the compiler can say exactly what failed in that specific program.

“Our AI tools are now able to transcend their initial training,” Krishnamachari said.


Beyond software alone

This kind of feedback loop isn’t limited to programming.

Many fields work in similar ways. In mathematics, legal reasoning, or other rule-heavy areas, mistakes can often be clearly identified.

If AI systems receive that kind of precise feedback, they could fix their work before a human even sees the first draft.

Researchers believe the method could also help with low-resource human languages, where there simply isn’t enough written material to train large AI models.

But one thing is crucial: the feedback must be clear, accurate, and connected to the exact mistake.


Not every problem was solved

Even with this powerful method, the system wasn’t perfect.

Two Idris problems remained unsolved even after 20 rounds of corrections. Sometimes fixing one issue created another, showing that repeated adjustments can still lead to confusion.

The researchers also pointed out another possibility. Some Idris examples might already exist somewhere in the AI’s training data.

If that’s true, part of the model’s success might come from remembering patterns rather than learning entirely from feedback.

To test this properly, future experiments would need brand-new problems created after the model’s training.


Teaching AI to remember

Looking ahead, Li hopes to make the system smarter about learning from its past mistakes.

Right now, the AI often starts each problem from scratch, repeating the same kinds of errors again and again.

A model that remembers previous fixes could avoid those traps. It would need fewer retries and use less computing power.
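One way to sketch that idea: cache each compiler error pattern alongside the edit that resolved it, and replay known fixes before spending another model call. This is a hypothetical extension, not something the study implemented; the class and method names below are illustrative.

```python
from typing import Optional


class FixMemory:
    # Hypothetical store of error-pattern -> fix-hint pairs,
    # consulted before asking the model again.
    def __init__(self) -> None:
        self.known_fixes = {}  # error signature -> fix hint
        self.hits = 0          # retries saved by the memory

    def signature(self, error: str) -> str:
        # Normalize an error message so similar mistakes match:
        # keep only the part before the first colon.
        return error.split(":")[0].strip().lower()

    def recall(self, error: str) -> Optional[str]:
        hint = self.known_fixes.get(self.signature(error))
        if hint is not None:
            self.hits += 1
        return hint

    def remember(self, error: str, fix_hint: str) -> None:
        self.known_fixes[self.signature(error)] = fix_hint
```

Having solved one "missing name" error, the system could then recognize the same class of error in a different program and apply the remembered repair without a fresh round trip.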

That shift would turn a clever correction loop into something closer to real learning.

In the end, the research hints at an encouraging idea. AI might not always be limited by missing data. Sometimes, what it really needs is clear feedback and the chance to try again. And with the right signals, it can improve far more than we might expect.