Biologically interpretable machine learning modeling for understanding functional genomics

Robust phenotype-genotype associations have been established for a number of human diseases, including brain disorders such as schizophrenia and bipolar disorder. However, the cellular and molecular mechanisms linking genotype to phenotype remain elusive. To address this, recent scientific projects have generated large multi-omic datasets; for example, the PsychENCODE consortium generated ~5,500 genotype, transcriptome, chromatin, and single-cell datasets from 1,866 individual brains. Integrating these large-scale multi-omics data and extracting functional insights from them remains challenging, and machine learning has been broadly applied to analyze and interpret such data. In this talk, I will first introduce multiview learning, an emerging machine learning field, and outline its potential applications for understanding functional multi-omics. In particular, we have proposed a framework called multiview empirical risk minimization (MV-ERM) for learning the heterogeneity of multi-omics data and revealing cross-omics patterns. Second, I will introduce our recent applications of multiview learning to functional genomics. For example, we developed an interpretable deep neural network model that embeds multi-omics data and biological networks to predict brain disorders from genotype. Applied to the PsychENCODE multi-omics data, our model improves disorder prediction (6-fold compared to additive polygenic risk scores), highlights key genes for disorders, and allows imputation of missing omic information from genotype data alone.

April 22, 2020

1:50 pm (1h)

Discovery Building, Orchard View Room

Daifeng Wang