location: Discovery Building
Backward Feature Correction: How Deep Learning Performs Deep Learning
How does a 110-layer ResNet learn a high-complexity classifier from relatively few training examples in a short training time? We present a theory that explains this learning process in terms of hierarchical learning. By hierarchical learning we mean that the learner learns to represent a complicated target function by decomposing it into a …
Why are some robust estimators efficiently computable?
Recent advances in computational robust statistics have produced efficient estimators with provable near-optimal statistical guarantees for a variety of problems. These estimators often involve non-convex optimization, and it is not clear why these particular non-convex problems are efficiently solvable while many classical non-convex formulations are not. We make an attempt to …
Biologically interpretable machine learning modeling for understanding functional genomics
Robust phenotype-genotype associations have been established for a number of human diseases, including brain disorders (e.g., schizophrenia, bipolar disorder). However, the cellular and molecular mechanisms linking genotype to phenotype remain elusive. To address this, recent scientific projects have generated large multi-omic datasets — e.g., the PsychENCODE consortium generated ~5,500 genotype, …
Learning to do Structured Inference in Natural Language Processing
Many tasks in natural language processing, computer vision, and computational biology involve predicting structured outputs. Researchers are increasingly applying deep representation learning to these problems, but the structured component of these approaches is usually quite simplistic. For example, neural machine translation systems use unstructured training of local factors followed by …
A function space view of overparameterized neural networks
Contrary to classical bias/variance trade-offs, deep learning practitioners have observed that vastly overparameterized neural networks with the capacity to fit virtually any labels nevertheless generalize well when trained on real data. One possible explanation of this phenomenon is that complexity control is being achieved by implicitly or explicitly controlling the magnitude of …
Multi-armed bandit problems with history-dependent rewards
The multi-armed bandit problem is a common sequential decision-making framework in which, at each time step, a player selects an action and receives a reward for that action. The aim is to select actions so as to maximize the total reward. Commonly it is assumed that the (expected) reward of each action …
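The standard bandit protocol described above can be made concrete with a small simulation. The following is a minimal sketch (not from the talk) of the classic epsilon-greedy strategy on Bernoulli arms with fixed success probabilities; the function name and parameters are illustrative, and the history-dependent rewards that are the talk's subject are deliberately not modeled here.

```python
import random

def epsilon_greedy(arm_probs, horizon=10000, eps=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms.

    arm_probs: fixed success probability of each arm (the classical
    assumption the talk relaxes); returns total reward and pull counts.
    """
    rng = random.Random(seed)
    n = len(arm_probs)
    counts = [0] * n      # number of pulls per arm
    values = [0.0] * n    # running mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < eps:
            a = rng.randrange(n)                         # explore
        else:
            a = max(range(n), key=lambda i: values[i])   # exploit current best
        r = 1.0 if rng.random() < arm_probs[a] else 0.0  # Bernoulli reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]         # incremental mean
        total += r
    return total, counts
```

With two arms of success probability 0.2 and 0.8, the better arm accumulates the vast majority of pulls once its empirical mean separates from the other's.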
Statistics meets computation: Trade-offs between interpretability and flexibility
Modeling and tractable computation form two fundamental but competing pillars of data science; indeed, fitting models to data is often computationally challenging in modern applications. At the same time, a “good” model is one that imposes the right kind of structure on the underlying data-generating process, and this involves trading …
Learning from Societal Data: Theory and Practice
Machine learning algorithms for policy and decision making are becoming ubiquitous. In many societal applications, the inferences we can draw are often severely limited not by the number of subjects in the data but rather by limited observations available for each subject. My research focuses on tackling these limitations both …
The surprising reasonableness of the earth mover's distance in high dimensions
The earth mover’s distance (EMD) is a scalar measure of dissimilarity between histograms. Introduced over 200 years ago, the EMD has played a central role in linear programming and information retrieval, and is emerging as a useful objective in machine learning. During the 1990s, the EMD was generalized from a functional that acts …
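For histograms over one-dimensional ordered bins, the EMD between two distributions of equal total mass has a well-known closed form: the sum of absolute differences of their cumulative sums. A minimal sketch (illustrative, not from the talk; the function name is hypothetical):

```python
def emd_1d(p, q):
    """EMD between two histograms over the same 1-D ordered bins,
    assuming equal total mass and unit distance between adjacent bins.
    Equals the sum of |cumulative difference| across bins."""
    assert len(p) == len(q), "histograms must share the same bins"
    cum, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi        # net mass that must flow past this bin
        total += abs(cum)     # cost of moving that mass one bin over
    return total
```

For example, moving one unit of mass two bins to the right costs 2: `emd_1d([1, 0, 0], [0, 0, 1])` is `2.0`. In higher dimensions no such closed form exists, which is where the linear-programming view of the EMD becomes essential.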