Spatio-Temporal Signal Recovery from Social Media

Many real-world phenomena can be represented by a spatio-temporal signal: where, when, and how much. Social media is a tantalizing data source for those who wish to monitor such signals. Unlike most prior work, we assume that the target phenomenon is known and we are given a method to count its occurrences in social media. However, counting is plagued by sample bias, incomplete data, and, paradoxically, data scarcity, all issues inadequately addressed by prior work. We formulate signal recovery as a Poisson point process estimation problem. To address the noisy counts, we explicitly incorporate human population bias, time delays and spatial distortions, and spatio-temporal regularization into the model. We present an efficient optimization algorithm and discuss its theoretical properties. We show that our model is more accurate than commonly used baselines. Finally, we present a case study on wildlife roadkill monitoring, where our model produces qualitatively convincing results.
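
As a rough illustration of the kind of estimation problem described above, the following Python sketch fits per-capita intensities to counts in a chain of time bins by minimizing a Poisson negative log-likelihood plus a smoothness penalty between neighboring bins. It is not the speakers' model or algorithm: the chain graph, the quadratic penalty, the damped Newton solver, and all names (recover_intensity, population, lam) are assumptions made for this example, and time delays and spatial distortions are ignored.

```python
import math


def penalized_nll(theta, counts, population, lam):
    """Poisson negative log-likelihood (up to a constant) plus the penalty
    lam * sum_i (theta[i+1] - theta[i])**2 over a chain of bins.
    The rate in bin i is assumed to be population[i] * exp(theta[i])."""
    fit = sum(g * math.exp(th) - x * th
              for th, x, g in zip(theta, counts, population))
    rough = sum((b - a) ** 2 for a, b in zip(theta, theta[1:]))
    return fit + lam * rough


def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a symmetric tridiagonal system whose
    off-diagonal entries all equal `off`."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def recover_intensity(counts, population, lam=1.0, iters=50):
    """Hypothetical helper: damped Newton iterations on penalized_nll.
    Returns the estimated per-capita intensity exp(theta) for each bin."""
    n = len(counts)
    # Smoothed starting point so that bins with zero counts stay finite.
    theta = [math.log((x + 1.0) / g) for x, g in zip(counts, population)]
    for _ in range(iters):
        rate = [g * math.exp(th) for th, g in zip(theta, population)]
        grad, diag = [], []
        for i in range(n):
            left = theta[i] - theta[i - 1] if i > 0 else 0.0
            right = theta[i] - theta[i + 1] if i < n - 1 else 0.0
            degree = (i > 0) + (i < n - 1)
            grad.append(rate[i] - counts[i] + 2.0 * lam * (left + right))
            diag.append(rate[i] + 2.0 * lam * degree)
        # Newton direction: (diag(rate) + 2 * lam * Laplacian) step = grad.
        step = solve_tridiagonal(diag, -2.0 * lam, grad)
        # Backtracking: halve the step until the objective does not increase.
        base = penalized_nll(theta, counts, population, lam)
        scale = 1.0
        while scale > 1e-8:
            trial = [th - scale * s for th, s in zip(theta, step)]
            if penalized_nll(trial, counts, population, lam) <= base:
                theta = trial
                break
            scale *= 0.5
    return [math.exp(th) for th in theta]
```

In this sketch, setting lam to 0 reduces the estimate to counts divided by population in each bin with nonzero counts, while a larger lam pulls neighboring bins toward a common value, which is one way regularization can compensate for scarce counts.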

Part 1 (Aniruddha): An introduction to Twitter-related problems and previous work in this area, such as earthquake detection, followed by our mathematical model of how tweets are produced. We then propose a more formal model, which leads to a graph-regularized optimization problem. Finally, we discuss some new mathematical issues that arise from our initial work.

Part 2 (Junming): The roadkill case study and synthetic data. Data collection from Twitter: collecting tweets automatically through the Twitter API, and applying natural language processing to tweets for our particular problem. Results and some future questions.

May 2 @ 12:30 pm (1h)

Discovery Building, Orchard View Room

Aniruddha Bhargava, Junming Xu