3.1 Evaluation of o3 and o4-mini

Figure 5: Case study of OpenAI o3’s long multimodal chain-of-thought, reaching the correct answer after 8 minutes and 13 seconds of reasoning.