Systems | Information | Learning | Optimization
 

Quantifying Ad Fraud – Contamination Estimation via Convex Relaxations

Identifying contamination in datasets is important in a wide variety of settings, including view and click fraud in online advertising. After a brief overview of digital ad fraud, I’ll describe a technique for estimating contamination in large, categorical datasets. The technique involves solving a series of convex programs, resulting in …

Internet Device Graphs

Digital advertising is arguably the largest and most ubiquitous application of machine learning. Learning algorithms pick the ads we see by inferring information about who we are and what we might buy. Graph datasets, due to their simplicity, play a central role in facilitating this inference. Internet Device Graphs are …

An Active Learning System with applications to Psychology Research

Today, machine learning is responsible for most of what we perceive as the personalization of the web: automatic recommendations for movies (Netflix) or music (Spotify, Last.fm), personalized search results based on your recent searches or email (Google), automatic credit card fraud detection (Chase), social network friend identification (Facebook, LinkedIn), and, of course, …

Integrated Staffing and Scheduling for Service Systems via Stochastic Integer Programming

We consider the problem of determining server schedules in multi-class service systems under uncertainty in the customer volumes. Common practice in such systems is to first identify server staffing levels that meet the quality of service targets, and then determine schedules for the servers that cover these staffing requirements. We …

To $e$ or not to $e$ in Poisson Image Reconstruction and Minimax Rates for Poisson Inverse Problems with Physical Constraints

In photon-limited image reconstruction, observations can be modeled as $y \sim \mathrm{Poisson}(f)$, where $f := \exp(g)$ is the intensity of interest and $g$ is the log-intensity. Previous work in this area has considered applying regularizers such as the total variation semi-norm to either $f$ or $g := \log(f)$. The former is less stable at very …

Estimation with Norm Regularization, with Applications to Climate Science

The talk will discuss recent advances in the analysis of non-asymptotic estimation error and structured statistical recovery based on norm regularized regression, such as Lasso, as well as application of such estimation to climate science. Analysis of estimation error for regularized problems needs to consider four aspects: the norm, the …

Fitting high-dimensional linear models by M-estimation: some surprising asymptotic phenomena

This talk reviews some recent work on (unpenalized) linear regression M-estimators in high dimensions. Extending the seminal work of Peter Huber, Steve Portnoy and others to the setting where n, the number of observations, is large and comparable to p, the number of predictors, we obtain updated results for …