On the last day of OpenAI’s 12 days of shipmas, the ChatGPT maker announced OpenAI o3. The new model is OpenAI o1’s successor, but as you might have noticed, the AI firm skipped o2, which would have been the more obvious moniker for the flagship reasoning model’s successor.
Day 12: Early evals for OpenAI o3 (yes, we skipped a number) https://t.co/iWXg9IGuZM (December 20, 2024)
A report by The Information suggests that the decision to skip o2 is tied to trademark issues, as the name could create a conflict with British telecom provider O2. Alongside OpenAI o3, the AI firm announced o3-mini, a smaller version of the next-gen model designed for specific tasks.
While the company shipped OpenAI o1 to broad availability this month, the preview version of o3 will be limited to safety researchers, with sign-ups opening later today. This could be part of OpenAI’s plan to fine-tune the model’s user experience and performance before shipping it to general availability.
Interestingly, OpenAI o3 features “incredible” coding capabilities, per the benchmarks OpenAI shared. OpenAI o1 also features impressive coding capabilities, to the extent that it aced OpenAI’s research engineer hiring interview for coding at a 90-100% rate. The new model is also up to three times better at handling tasks and answering complex queries, according to ARC-AGI (a sophisticated benchmark used to determine a model’s capability to reason through and solve complex tasks it is encountering for the first time).
According to OpenAI CEO Sam Altman:
“We view this as the beginning of the next phase of AI. Where you can use these models to do increasingly complex tasks that require a lot of reasoning.”
Similarly, Google is trying to keep pace with its own reasoning model, dubbed Gemini 2.0 Flash Thinking. Google CEO Sundar Pichai refers to the new model as the “most thoughtful model yet.” Reasoning models are becoming increasingly important as more organizations hop onto the AI train and incorporate the technology into their workflows, because these models are better equipped to handle complex tasks and queries.