AI - August 3, 2025

Exploring OpenAI’s Innovative Journey: Pushing the Boundaries of AI Capabilities to Serve You Better

After joining OpenAI as a researcher in 2022, Hunter Lightman watched his colleagues unveil ChatGPT, a product that grew at a remarkable pace. At the same time, Lightman was quietly working with a team, now known as MathGen, to improve OpenAI’s models at solving high school math competition problems.

Today, MathGen is recognized as instrumental in OpenAI’s leading effort to create AI reasoning models, the core technology behind AI agents that can carry out tasks on a computer much as a person would. Lightman explained that the team’s early work aimed to improve the models’ mathematical reasoning abilities, which were initially weak.

Although OpenAI’s current AI systems still hallucinate and struggle with complex tasks, their mathematical reasoning has improved significantly. One of the company’s models recently achieved a gold medal-level score at the International Math Olympiad, a prestigious competition for high school students. OpenAI anticipates that these capabilities will extend to other subjects, ultimately leading to general-purpose agents, a long-held aspiration for the company.

While ChatGPT was an accidental success – a low-key research preview turned viral consumer business – OpenAI’s agents are the result of years of deliberate in-house development. Sam Altman, CEO of OpenAI, expressed this vision at the company’s first developer conference in 2023: “Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you.”

Whether this vision will materialize remains to be seen, but OpenAI made a significant stride with the release of its first AI reasoning model, o1, in the fall of 2024. The researchers behind that breakthrough have since become highly sought-after talent in Silicon Valley. Mark Zuckerberg recruited five of these researchers for Meta’s new superintelligence-focused unit, offering compensation packages exceeding $100 million.

The advancements in OpenAI’s reasoning models and agents are tied to a machine learning training technique known as reinforcement learning (RL), which gives an AI model feedback on how good its choices were in simulated environments. The technique has been used for decades; a notable application was AlphaGo, the Google DeepMind system trained with RL that beat a world champion Go player in 2016.
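To make that idea concrete, here is a minimal reinforcement learning sketch in Python: a toy agent in a simulated corridor learns, purely from reward feedback on its choices, to walk to the goal. The environment, rewards, and hyperparameters are illustrative assumptions for this article, not anything drawn from OpenAI’s or DeepMind’s systems.

```python
import random

# Toy simulated environment: a corridor of N cells. The agent starts in cell 0
# and receives a reward of +1 only when it reaches the last cell.
N_CELLS = 6
ACTIONS = [-1, +1]                      # step left or step right
EPISODES = 500
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration

# Q-table: the agent's estimate of how good each action is in each cell.
Q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}

for _ in range(EPISODES):
    state = 0
    while state != N_CELLS - 1:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])

        next_state = min(max(state + action, 0), N_CELLS - 1)
        reward = 1.0 if next_state == N_CELLS - 1 else 0.0

        # The feedback step: nudge the estimate toward reward + discounted future value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the greedy policy in every non-terminal cell should be "step right" (+1).
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_CELLS - 1)])
```

Systems like AlphaGo apply the same feedback principle at vastly greater scale, with neural networks in place of the tiny table above.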

Andrej Karpathy, a founding member of OpenAI, began exploring the use of RL to create an AI agent capable of using a computer around that time, but it took years for OpenAI to develop the necessary models and training techniques. By 2018, OpenAI had introduced the first large language model in its GPT series, which excelled at processing and generating text and eventually led to ChatGPT. However, these models struggled with basic math.

It wasn’t until 2023 that OpenAI achieved a breakthrough, initially named “Q*” and later “Strawberry,” by combining large language models (LLMs), reinforcement learning (RL), and test-time computation. The extra time and computing power let a model plan and work through problems before giving an answer, reasoning step by step in what is known as a “chain-of-thought” (CoT).
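As a rough sketch of how those pieces might fit together, the Python below samples several chain-of-thought candidates and keeps the answer that a scoring function rates highest, spending extra test-time compute on a single question. The functions `generate_chain_of_thought` and `verifier_score` are hypothetical placeholders invented for this illustration, not OpenAI APIs; in a real system they would be a language model and an RL-trained verifier or reward model.

```python
import random

# Hypothetical placeholders so the sketch runs end to end; neither is a real API.
def generate_chain_of_thought(question, seed):
    """Sample one step-by-step reasoning trace plus a final answer (stand-in for an LLM)."""
    rng = random.Random(seed)
    answer = str(rng.choice([40, 41, 42]))       # pretend the model is unsure
    trace = f"Step 1: parse {question!r}. Step 2: compute. Answer: {answer}"
    return trace, answer

def verifier_score(question, trace, answer):
    """Rate how trustworthy a reasoning trace looks (stand-in for a trained verifier)."""
    return 1.0 if answer == "42" else 0.1

def answer_with_test_time_compute(question, n_samples=8):
    # Spend extra inference compute: sample many chains of thought,
    # then keep the answer whose reasoning the verifier rates highest.
    candidates = [generate_chain_of_thought(question, seed=i) for i in range(n_samples)]
    best_trace, best_answer = max(
        candidates, key=lambda c: verifier_score(question, c[0], c[1])
    )
    return best_answer

print(answer_with_test_time_compute("What is 6 * 7?"))
```

Raising `n_samples` is one simple way to trade inference compute for accuracy; production systems use far more sophisticated generation, search, and verification.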

This approach improved the AI’s performance on math questions it hadn’t encountered before. El Kishky, a researcher involved in this work, described watching the model “start to reason”: it would notice mistakes, backtrack, and at times seem to express frustration. To him, it felt like reading a person’s thoughts.

Although individually these techniques were not novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly recognized the planning and fact-checking abilities of AI reasoning models as valuable for powering AI agents.

“We had solved a problem that I had been banging my head against for a couple of years,” said Lightman. “It was one of the most exciting moments of my research career.”

With AI reasoning models, OpenAI identified two new axes along which to improve its models: applying more computational power during post-training, the reinforcement learning phase that follows pre-training, and giving a model extra time and processing power while it answers a question, often called test-time compute.
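A back-of-the-envelope sketch can show why both axes matter. Suppose, purely as a simplifying assumption, that post-training determines the probability p that any single attempt solves a problem, and test-time compute determines how many independent attempts n the model gets, with a perfect check on the final answer. None of these numbers come from OpenAI; they only illustrate how the two knobs interact.

```python
# Toy illustration of the two scaling axes (illustrative numbers, not OpenAI data).
def solve_rate(p, n):
    """Chance that at least one of n independent attempts succeeds: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

for p in (0.2, 0.4, 0.6):                                            # more post-training compute -> higher p
    cells = [f"n={n}: {solve_rate(p, n):.2f}" for n in (1, 4, 16)]    # more test-time compute -> higher n
    print(f"p={p}:", "  ".join(cells))
```

Under these toy assumptions, improving either axis helps, and improving both together helps most.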

“OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,” said Lightman.