When there is insufficient data to learn a globally accurate model, successful learning can still be possible if we take into account the particular task for which the learned model will be employed. Learning a possibly incorrect or incomplete model that performs well in the subsequent prediction task (classification or decision making) requires far fewer training examples than learning a complete model. My work applies this idea to several rich and complex data-driven systems, including learning graphical models, online collaborative filtering, and active learning. In this talk, I dive into the problem of learning graphical models with this framework in mind. In the first half of the talk, I look into learning tree-structured Ising models in which the learned model is subsequently used for prediction based on partial observations (given the realization of a subset of variables, predict the values of the remaining ones). The vast majority of previous work on learning graphical models aims to correctly recover the underlying graph structure, an impossible task in the data-constrained regime. I show that it is possible to efficiently learn a tree model that gives accurate predictions even when there is insufficient data to learn the correct structure. The second half of the talk is about speciation rate estimation in phylogenetic trees. This problem is essentially one of inferring features of the model (in this case, the speciation or extinction rate) from partial observations (the sequences at the leaves of the tree) of a latent tree model (the phylogeny). I show that an incomplete and partially incorrect summary of the tree structure is enough to estimate the speciation rate with the minimax optimal dependence on the length of the observed DNA sequences.
Orchard View Room