Trading Inference-Time Compute for Adversarial Robustness
We’ve simplified, stabilized, and scaled continuous-time consistency models, achieving sample quality comparable to leading diffusion models while using only two sampling steps.
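For intuition, two-step sampling with a consistency model needs only two network evaluations end to end. The sketch below is a minimal, hypothetical rendering of multistep consistency sampling; `consistency_fn`, the noise levels, and the re-noising rule are illustrative assumptions, not the released implementation.

```python
import torch

def two_step_sample(consistency_fn, shape, sigma_max=80.0, sigma_mid=0.8, sigma_min=0.002):
    """Two-step consistency sampling (illustrative sketch).

    `consistency_fn(x, sigma)` is assumed to map a noisy sample at noise
    level `sigma` directly to an estimate of the clean sample.
    """
    # Evaluation 1: map pure noise at the highest noise level straight
    # to a clean-sample estimate.
    x = sigma_max * torch.randn(shape)
    x0 = consistency_fn(x, sigma_max)

    # Evaluation 2: re-noise the estimate to an intermediate level and
    # map it to a refined clean sample.
    z = torch.randn(shape)
    x = x0 + (sigma_mid**2 - sigma_min**2) ** 0.5 * z
    return consistency_fn(x, sigma_mid)
```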
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering.
Advancing cost-efficient reasoning
We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues.
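For readers who want to inspect the subset, the sketch below loads it with the Hugging Face `datasets` library; the dataset ID and field names are assumptions based on the public release, so verify them against the dataset card.

```python
from datasets import load_dataset

# Assumed Hugging Face ID for the human-validated subset; check the
# dataset card before relying on it.
ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

example = ds[0]
print(example["instance_id"])        # identifier of the repo/issue pair
print(example["problem_statement"])  # the real-world issue to be solved
```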
Diffusion models have significantly advanced image, audio, and video generation, but they depend on an iterative sampling process that makes generation slow.
Consistency models are a nascent family of generative models that can sample high-quality data in a single step, without the need for adversarial training.
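To make the speed gap concrete, the sketch below contrasts the two sampling loops: an iterative sampler costs one network evaluation per step, while a consistency model produces a sample with a single evaluation. `score_step` and `consistency_fn` are hypothetical stand-ins for trained networks, and the Euler-style loop is a simplification rather than any particular model’s sampler.

```python
import torch

def diffusion_sample(score_step, shape, num_steps=1000):
    """Iterative diffusion sampling: cost grows linearly with num_steps."""
    x = torch.randn(shape)
    for t in reversed(range(num_steps)):
        x = score_step(x, t)  # each step removes a little noise
    return x

def consistency_sample(consistency_fn, shape, sigma_max=80.0):
    """One-step consistency sampling: a single network evaluation maps
    noise directly to data."""
    x = sigma_max * torch.randn(shape)
    return consistency_fn(x, sigma_max)
```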
We’ve trained and are open-sourcing a neural net called Whisper that approaches human-level robustness and accuracy on English speech recognition.
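Because Whisper is open source, transcription takes a few lines with the `openai-whisper` Python package; the checkpoint name and audio filename below are placeholders.

```python
import whisper

# Load one of the released checkpoints (sizes range from "tiny" to "large").
model = whisper.load_model("base")

# Transcribe an audio file; Whisper detects the spoken language automatically.
result = model.transcribe("audio.mp3")
print(result["text"])
```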