Systems | Information | Learning | Optimization
Backward Feature Correction: How can Deep Learning perform Deep Learning

How does a 110-layer ResNet learn a high-complexity classifier from relatively few training examples in a short amount of training time? We present a theory that explains this learning process in terms of hierarchical learning. By hierarchical learning we mean that the learner represents a complicated target function by decomposing it into a …
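
A minimal sketch of the decomposition idea in NumPy (the toy target, architecture, and hyperparameters below are my assumptions for illustration, not the construction from the talk): a two-layer network is trained end to end on a composed target g(h(x)), so gradients flowing back from the upper layer keep correcting the lower layer's features as training proceeds.

# Toy illustration of hierarchical learning: a two-stage model fits a
# composed target g(h(x)) jointly, so errors in the lower-level features
# are corrected by gradients from the layer above.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    h = np.tanh(x)        # simple "first-level" feature
    return np.sin(3 * h)  # higher-level function composed on top

X = rng.uniform(-2, 2, size=(256, 1))
y = target(X)

W1 = rng.normal(scale=0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(2000):
    H = np.tanh(X @ W1 + b1)   # learned first-level features
    pred = H @ W2 + b2
    err = pred - y
    # Backpropagation: the gradient reaching W1 carries information from the
    # layer above, refining the low-level features rather than freezing them.
    gW2 = H.T @ err / len(X); gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H ** 2)
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print("final mse:", float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2)))

Trained greedily one layer at a time instead, the lower layer would have to commit to its features before ever seeing the residual left for the layer above.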

Advances in Gradient Descent Methods for Non-Convex Optimization

Thanks to a flurry of recent research motivated by applications in machine learning, the convergence of gradient descent methods for smooth, non-convex, unconstrained optimization is now well understood in the centralized setting. In this talk I will discuss our progress towards understanding how the convergence of gradient descent methods (including SGD and acceleration) is …
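
For reference, the centralized baseline alluded to above is a textbook fact rather than a result from this talk: if f is L-smooth and bounded below by f^*, then gradient descent with step size 1/L,

    x_{t+1} = x_t - \tfrac{1}{L} \nabla f(x_t),

satisfies

    \min_{0 \le t < T} \|\nabla f(x_t)\|^2 \le \frac{2L \left( f(x_0) - f^* \right)}{T},

so an \epsilon-approximate stationary point is reached within O(1/\epsilon^2) iterations.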

Biologically interpretable machine learning modeling for understanding functional genomics

Robust phenotype-genotype associations have been established for a number of human diseases, including brain disorders (e.g., schizophrenia, bipolar disorder). However, the cellular and molecular mechanisms linking genotype to phenotype remain elusive. To address this, recent scientific projects have generated large multi-omic datasets; for example, the PsychENCODE consortium generated ~5,500 genotype, …

Learning to do Structured Inference in Natural Language Processing

Many tasks in natural language processing, computer vision, and computational biology involve predicting structured outputs. Researchers are increasingly applying deep representation learning to these problems, but the structured component of these approaches is usually quite simplistic. For example, neural machine translation systems use unstructured training of local factors followed by …
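
To make the "structured component" concrete, here is a minimal example of exact structured inference (my illustration in NumPy, not the speaker's system): Viterbi decoding over a label chain, where local emission scores and pairwise transition scores are combined by dynamic programming instead of each output position being predicted independently.

# Viterbi decoding over a chain of T positions with K labels: the output is
# a whole sequence scored by local (emission) and pairwise (transition)
# factors. Scores here are random placeholders for, e.g., neural-net outputs.
import numpy as np

rng = np.random.default_rng(1)
T, K = 6, 4
emission = rng.normal(size=(T, K))    # score of label k at position t
transition = rng.normal(size=(K, K))  # score of label j following label i

best = np.zeros((T, K))               # best[t, k]: best prefix ending in k
back = np.zeros((T, K), dtype=int)    # backpointers for path recovery
best[0] = emission[0]
for t in range(1, T):
    cand = best[t - 1][:, None] + transition  # cand[i, j]: extend i with j
    back[t] = cand.argmax(axis=0)
    best[t] = cand.max(axis=0) + emission[t]

labels = [int(best[-1].argmax())]     # backtrack the highest-scoring path
for t in range(T - 1, 0, -1):
    labels.append(int(back[t, labels[-1]]))
labels.reverse()
print("best label sequence:", labels)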

A function space view of overparameterized neural networks

Contrary to classical bias/variance trade-offs, deep learning practitioners have observed that vastly overparameterized neural networks with the capacity to fit virtually any labels nevertheless generalize well when trained on real data. One possible explanation of this phenomenon is that complexity control is being achieved by implicitly or explicitly controlling the magnitude of …
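
One standard way to formalize "controlling the magnitude" in function space (stated here as background; the talk's precise results are truncated above) is min-norm interpolation:

    \min_{f \in \mathcal{F}} \|f\| \quad \text{subject to} \quad f(x_i) = y_i, \quad i = 1, \dots, n,

where \|f\| is a complexity measure such as an RKHS norm or a norm induced by the magnitudes of the network's weights; generalization then depends on the norm of the learned interpolant rather than on the raw parameter count.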

Statistics meets computation: Trade-offs between interpretability and flexibility

Modeling and tractable computation form two fundamental but competing pillars of data science; indeed, fitting models to data is often computationally challenging in modern applications. At the same time, a “good” model is one that imposes the right kind of structure on the underlying data-generating process, and this involves trading …