Abstract
Over the past year, language models have transitioned from chat interfaces to agentic systems like Claude Code or Codex. In this talk, I will give an overview of two projects to build and understand agentic models in the open. OpenThoughts is a dataset for training reasoning models via supervised fine-tuning. Based on our pipeline with over 1,000 experiments, we assemble a training set of 1M examples that yields state-of-the-art performance. Next, I will cover Terminal-Bench, a benchmark for measuring agent performance in terminal environments, which has become an industry standard and illustrates the growing capabilities of agentic models.
Bio
Ludwig Schmidt is an assistant professor at Stanford University and a member of the technical staff at Anthropic. Ludwig’s research interests revolve around the empirical foundations of machine learning, often with a focus on datasets, reliable generalization, and language models. Ludwig’s research group contributed to open source machine learning by creating OpenCLIP, LAION-5B, DCLM, OpenThoughts, and Terminal-Bench. Ludwig completed his PhD at MIT and was a postdoc at UC Berkeley. Ludwig’s research received best paper awards at ICML and NeurIPS, was a best paper finalist at CVPR, and earned the Sprowls dissertation award from MIT.
Orchard View Room
Ludwig Schmidt, Stanford University