Trading Inference-Time Compute for Adversarial Robustness
We’ve simplified, stabilized, and scaled continuous-time consistency models, achieving sample quality comparable to leading diffusion models while using only two sampling steps.
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering.
Advancing cost-efficient reasoning
We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues.
Consistency models are a nascent family of generative models that can sample high-quality data in one step without the need for adversarial training.
We’ve trained and are open-sourcing a neural net called Whisper that approaches human-level robustness and accuracy on English speech recognition.