Orchard View Room

Quantifying Ad Fraud – Contamination Estimation via Convex Relaxations

Identifying contamination in datasets is important in a wide variety of settings, including view and click fraud in online advertising. After a brief overview of digital ad fraud, I’ll describe a technique for estimating contamination in large, categorical datasets. The technique involves solving a series of convex programs, resulting in …

Covariance thresholding, kernel random matrices and sparse PCA

In Sparse Principal Component Analysis (PCA) we wish to reconstruct a low-rank matrix from noisy observations, under sparsity assumptions on the factors recovered. Johnstone and Lu (2004) formalized these assumptions in the ‘spiked covariance model’, wherein we observe $n$ i.i.d. samples from a $p$-dimensional Gaussian distribution $N(0, I + …
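The abstract is cut off before the covariance is fully specified. A common single-spike version of the model, assumed here only for illustration, is $\Sigma = I + \beta v v^\top$ with a $k$-sparse unit vector $v$. The sketch below simulates that model and applies a simple form of covariance thresholding: soft-threshold the entries of the empirical covariance minus the identity at level $\tau/\sqrt{n}$, then take the leading eigenvector. The function names, the constant `tau`, and the choice to threshold every entry (diagonal included) are assumptions of this sketch, not details from the talk.

```python
import numpy as np

def sample_spiked(n, p, k, beta, rng):
    """Draw n samples from N(0, I + beta * v v^T), v a k-sparse unit vector (assumed model)."""
    v = np.zeros(p)
    v[:k] = 1.0 / np.sqrt(k)                 # support on the first k coordinates
    z = rng.standard_normal((n, p))          # isotropic noise, covariance I
    u = rng.standard_normal((n, 1))          # one scalar latent factor per sample
    x = z + np.sqrt(beta) * u * v            # resulting covariance: I + beta * v v^T
    return x, v

def covariance_thresholding(x, tau):
    """Soft-threshold (empirical covariance - I) at tau/sqrt(n); return its top eigenvector."""
    n, p = x.shape
    s = x.T @ x / n - np.eye(p)              # the model is mean zero, so no centering
    level = tau / np.sqrt(n)
    t = np.sign(s) * np.maximum(np.abs(s) - level, 0.0)  # entrywise soft thresholding
    _, eigvecs = np.linalg.eigh(t)           # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]

rng = np.random.default_rng(0)
k = 10
x, v = sample_spiked(n=500, p=200, k=k, beta=3.0, rng=rng)
v_hat = covariance_thresholding(x, tau=3.0)
overlap = abs(v_hat @ v)                     # |<v_hat, v>|; the sign of v_hat is arbitrary
support_hat = set(np.argsort(-np.abs(v_hat))[:k].tolist())
print(f"overlap = {overlap:.3f}, support recovered: {support_hat == set(range(k))}")
```

Thresholding zeroes most of the small noise entries while the $k \times k$ signal block survives, which is why the leading eigenvector of the thresholded matrix concentrates on the true support.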

Internet Device Graphs

Digital advertising is arguably the largest and most ubiquitous application of machine learning. Learning algorithms pick the ads we see by inferring information about who we are and what we might buy. Graph datasets, due to their simplicity, play a central role in facilitating this inference. Internet Device Graphs are …

An Active Learning System with applications to Psychology Research

Today, machine learning is responsible for most of what we perceive as the personalization of the web: automatic recommendations for movies (Netflix) or music (Spotify, Last.fm), personalized search results based on your recent searches or email (Google), automatic credit card fraud detection (Chase), social network friend identification (Facebook, LinkedIn), and, of course, …

Integrated Staffing and Scheduling for Service Systems via Stochastic Integer Programming

We consider the problem of determining server schedules in multi-class service systems under uncertainty in the customer volumes. Common practice in such systems is to first identify server staffing levels that meet the quality of service targets, and then determine schedules for the servers that cover these staffing requirements. We …

TBD

To $e$ or not to $e$ in Poisson Image Reconstruction and Minimax Rates for Poisson Inverse Problems with Physical Constraints

In photon-limited image reconstruction, observations can be modeled as $y \sim \mathrm{Poisson}(f)$, where $f := \exp(g)$ is the intensity of interest and $g$ is the log-intensity. Previous work in this area has considered applying regularizers such as the total variation semi-norm either to $f$ or to $g := \log(f)$. The former is less stable at very …
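To make the second option concrete, here is a minimal sketch that regularizes the log-intensity $g$ directly. It assumes a 1-D signal and a smoothed total variation penalty $\lambda \sum_i \sqrt{(g_{i+1}-g_i)^2 + \epsilon^2}$; the smoothing, the values of `lam` and `eps`, and the gradient-descent solver with backtracking are all choices made to keep the example short, not the speakers' method. The data term is the Poisson negative log-likelihood in terms of $g$, $\sum_i (e^{g_i} - y_i g_i)$ up to a constant, and since $f = e^{g}$ is positive for every real $g$, no positivity constraint is needed.

```python
import numpy as np

def objective_and_grad(g, y, lam, eps):
    """Poisson NLL in g = log f plus lam * smoothed 1-D TV of g, and its gradient."""
    f = np.exp(g)                                  # intensity, positive for any real g
    d = np.diff(g)                                 # d[i] = g[i+1] - g[i]
    root = np.sqrt(d * d + eps * eps)
    obj = np.sum(f - y * g) + lam * np.sum(root)   # constant sum(log y_i!) omitted
    w = d / root                                   # derivative of root with respect to d
    grad_tv = np.zeros_like(g)
    grad_tv[:-1] -= w                              # d[i] depends on g[i] with sign -1
    grad_tv[1:] += w                               # and on g[i+1] with sign +1
    return obj, f - y + lam * grad_tv

def reconstruct_log_intensity(y, lam=1.0, eps=0.1, tol=1e-4, max_iter=20000):
    """Gradient descent with Armijo backtracking; returns the estimated log-intensity g."""
    g = np.log(np.maximum(y, 0.5))                 # start near log(y), avoiding log(0)
    obj, grad = objective_and_grad(g, y, lam, eps)
    for _ in range(max_iter):
        gg = grad @ grad
        if np.sqrt(gg) < tol:
            break
        step = 1.0
        for _ in range(60):                        # halve the step until sufficient decrease
            g_new = g - step * grad
            obj_new, grad_new = objective_and_grad(g_new, y, lam, eps)
            if obj_new <= obj - 0.5 * step * gg:
                break
            step *= 0.5
        else:
            break                                  # no decrease found: keep current g
        g, obj, grad = g_new, obj_new, grad_new
    return g

rng = np.random.default_rng(0)
f_true = np.concatenate([np.full(50, 5.0), np.full(50, 20.0)])  # piecewise-constant intensity
y = rng.poisson(f_true)
f_hat = np.exp(reconstruct_log_intensity(y))
print("segment means:", f_hat[:50].mean(), f_hat[50:].mean())
print("total counts:", y.sum(), "reconstructed total:", f_hat.sum())
```

In this sketch the gradient of the penalty sums to zero over all pixels, so at a stationary point $\sum_i e^{g_i} = \sum_i y_i$: the reconstruction preserves the total photon count.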

Estimation with Norm Regularization, with Applications to Climate Science

The talk will discuss recent advances in the analysis of non-asymptotic estimation error and structured statistical recovery based on norm regularized regression, such as Lasso, as well as application of such estimation to climate science. Analysis of estimation error for regularized problems needs to consider four aspects: the norm, the …
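As one concrete instance of norm-regularized regression, the Lasso solves $\min_\beta \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$. The sketch below uses proximal gradient descent (ISTA), one standard solver among several; the solver, the $1/(2n)$ scaling, and the simulated sparse data are assumptions for illustration and are not taken from the talk.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1, applied entrywise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize (1/(2n)) ||y - X beta||^2 + lam * ||beta||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    L = np.linalg.eigvalsh(X.T @ X / n)[-1]   # Lipschitz constant of the smooth part's gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n       # gradient of the least-squares term
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

rng = np.random.default_rng(1)
n, p, s, lam = 100, 50, 5, 0.1
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 3.0                           # s-sparse ground truth
y = X @ beta_true + 0.5 * rng.standard_normal(n)
beta_hat = lasso_ista(X, y, lam)
kkt = X.T @ (y - X @ beta_hat) / n
print("max |X^T r / n|:", np.max(np.abs(kkt)), "error:", np.linalg.norm(beta_hat - beta_true))
```

The check on `kkt` is the Lasso optimality condition: at a solution, $|X_j^\top(y - X\hat\beta)|/n \le \lambda$ for every $j$, with equality wherever $\hat\beta_j \neq 0$.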

Fitting high-dimensional linear models by M-estimation: some surprising asymptotic phenomena

This talk reviews some recent work on (unpenalized) linear regression M-estimators in high dimensions. Extending the seminal work of Peter Huber, Steve Portnoy and others to the setting where $n$, the number of observations, is large and comparable to $p$, the number of predictors, we obtain updated results for …
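For readers new to these estimators, here is a minimal sketch of one unpenalized regression M-estimator, Huber's, computed by iteratively reweighted least squares (IRLS) with $p/n = 0.25$ and heavy-tailed noise. It fixes the residual scale at 1 and uses the conventional tuning constant $k = 1.345$; both are assumptions, since Huber's proposal also allows the scale to be estimated jointly. The sketch only computes the estimator and does not reproduce the asymptotic results described in the talk.

```python
import numpy as np

def huber_irls(X, y, k=1.345, max_iter=1000, tol=1e-12):
    """Huber regression M-estimator via IRLS, residual scale fixed at 1 (assumption)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]              # start from ordinary least squares
    for _ in range(max_iter):
        r = y - X @ beta
        w = np.minimum(1.0, k / np.maximum(np.abs(r), 1e-12))  # w_i = psi(r_i) / r_i
        Xw = X * w[:, None]                                  # rows of X scaled by w_i
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)       # weighted least-squares step
        if np.linalg.norm(beta_new - beta) <= tol * (1.0 + np.linalg.norm(beta)):
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(2)
n, p, k = 400, 100, 1.345
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + rng.standard_t(df=2, size=n)             # heavy-tailed noise
beta_hat = huber_irls(X, y, k=k)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
score = X.T @ np.clip(y - X @ beta_hat, -k, k)               # X^T psi(residuals)
print("max |score|:", np.max(np.abs(score)))
print("Huber error:", np.linalg.norm(beta_hat - beta_true),
      "OLS error:", np.linalg.norm(beta_ols - beta_true))
```

At a fixed point of the iteration $w_i r_i = \psi(r_i)$, so $\hat\beta$ solves the estimating equation $X^\top \psi(y - X\hat\beta) = 0$, which is what `score` measures.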

Connecting the Dots in Protein Interaction Networks

Proteins can be thought of as the machines inside cells, participating in nearly all biological processes. By physically binding to and interacting with one another, proteins form vast networks and pathways dedicated to specific functions. Mapping these pathways is critical for understanding how their disruption causes disease, but biological experiments …