SILO: Data for training and evaluating agents: OpenThoughts and Terminal-Bench
Abstract: Over the past year, language models have transitioned from chat interfaces to agentic systems such as Claude Code and Codex. In this talk, I will give an overview of two projects for building and understanding agentic models in the open. OpenThoughts is a dataset for training reasoning models via supervised …