Abstract: In the most basic machine learning training scenario, the learning algorithm operates on a fixed batch of data provided in its entirety before training. However, in many applications there is a choice of which data points to select for labeling, and this choice can be made "on the fly" after each selected data point is labeled. In such interactive machine learning (IML) systems, it is possible to train a model with far fewer labels than would be required with random sampling. In this work, we identify and model query structures in IML to develop direct information maximization solutions as well as approximations that allow for computationally efficient query selection. To do so, we frame IML as a feedback communications problem and directly apply principles and tools from coding theory to design and analyze new interaction selection algorithms. First, we directly apply a recently developed feedback coding scheme to sequential human-computer interaction systems. We then identify simplifying query structures to develop approximate methods for efficient, informative query selection in interactive ordinal embedding construction and preference learning systems. Finally, we combine the direct application of feedback coding with approximate information maximization to design and analyze a general active learning algorithm, which we study in detail for logistic regression.
Orchard View Room, Virtual