Systems | Information | Learning | Optimization
 

Backward Feature Correction: How Deep Learning Performs Deep Learning

How does a 110-layer ResNet learn a high-complexity classifier using relatively few training examples and a short training time? We present a theory towards explaining this learning process in terms of hierarchical learning. By hierarchical learning, we mean that the learner learns to represent a complicated target function by decomposing it into a …

Biologically interpretable machine learning modeling for understanding functional genomics

Robust phenotype-genotype associations have been established for a number of human diseases including brain disorders (e.g., schizophrenia, bipolar disorder). However, understanding the cellular and molecular causes from genotype to phenotype remains elusive. To address this, recent scientific projects have generated large multi-omic datasets — e.g., the PsychENCODE consortium generated ~5,500 genotype, …

Learning to do Structured Inference in Natural Language Processing

Many tasks in natural language processing, computer vision, and computational biology involve predicting structured outputs. Researchers are increasingly applying deep representation learning to these problems, but the structured component of these approaches is usually quite simplistic. For example, neural machine translation systems use unstructured training of local factors followed by …

A function space view of overparameterized neural networks

Contrary to classical bias/variance trade-offs, deep learning practitioners have observed that vastly overparameterized neural networks with the capacity to fit virtually any labels nevertheless generalize well when trained on real data. One possible explanation of this phenomenon is that complexity control is being achieved by implicitly or explicitly controlling the magnitude of …

Statistics meets computation: Trade-offs between interpretability and flexibility

Modeling and tractable computation form two fundamental but competing pillars of data science; indeed, fitting models to data is often computationally challenging in modern applications. At the same time, a “good” model is one that imposes the right kind of structure on the underlying data-generating process, and this involves trading …

The surprising reasonableness of the earth mover’s distance in high dimensions

The earth mover’s distance (EMD) is a scalar measure of dissimilarity between histograms. Introduced over 200 years ago, the EMD has played a central role in linear programming and information retrieval, and is emerging as a useful objective in machine learning. During the 1990s, the EMD was generalized from a functional that acts …