We evaluate DeepCode on the PaperBench benchmark (released by OpenAI), a rigorous testbed requiring AI agents to independently reproduce 20 ICML 2024 papers from scratch. The benchmark comprises 8,316 ...
On Tuesday, French AI startup Mistral AI released Devstral 2, a 123 billion parameter open-weights coding model designed to work as part of an autonomous software engineering agent. The model achieves ...
French AI startup Mistral today launched Devstral 2, a new generation of its AI model designed for coding, as the company seeks to catch up to bigger AI labs like Anthropic and other coding-focused ...
A maximum-severity security flaw has been disclosed in React Server Components (RSC) that, if successfully exploited, could result in remote code execution. The vulnerability, tracked as ...
Physicist Paul Davies’s Quantum 2.0: The past, present and future of quantum physics ends on a beautiful note. “To be aware of the quantum world is to glimpse something of the majesty and elegance of ...
OpenAI has introduced GPT‑5.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software ...
Cursor’s new Composer model, built for low-latency agentic coding, completes most iterations in under 30 seconds, according to Anysphere. Anysphere has introduced Cursor 2.0, an update to the AI ...
The vibe coding tool Cursor, from startup Anysphere, has introduced Composer, its first in-house, proprietary coding large language model (LLM) as part of its Cursor 2.0 platform update. Composer is ...
Diligent Robotics, which deploys mobile manipulation robots in hospitals, today unveiled plans for Moxi 2.0, the latest generation of its platform. The company said the launch builds on three years of ...
Have you ever felt bogged down by the complexity of modern software development, juggling sprawling codebases, tackling security vulnerabilities, or endlessly refining workflows? For developers, these ...
What if your code could think for itself, anticipating your next move, debugging with precision, and even automating entire workflows? With the release of Claude Code 2.0, this isn’t just a futuristic ...