Systems | Information | Learning | Optimization
 

SILO: American Family Funding Initiative short talks

On the Effectiveness of Dataset Alignment for Fake Image Detection

Anirudh Sundara Rajan, MS student, Computer Sciences

Abstract:
As latent diffusion models (LDMs) democratize image generation capabilities, there is a growing need to detect fake images. A good detector should focus on the generative model’s fingerprints while ignoring image properties such as semantic content, resolution, and file format. Fake image detectors are usually built in a data-driven way, where a model is trained to separate real from fake images. Existing works primarily investigate network architecture choices and training recipes. In this work, we argue that in addition to these algorithmic choices, we also require a well-aligned dataset of real/fake images to train a robust detector. For the family of LDMs, we propose a very simple way to achieve this: we reconstruct all the real images using the LDM’s autoencoder, without any denoising operation. We then train a model to separate these real images from their reconstructions. The fakes created this way are extremely similar to the real ones in almost every aspect (e.g., size, aspect ratio, semantic content), which forces the model to look for the LDM decoder’s artifacts. We empirically show that this way of creating aligned real/fake datasets, which also sidesteps the computationally expensive denoising process, helps build a detector that relies less on spurious correlations, a weakness of a very popular existing method. Finally, to demonstrate just how effective dataset alignment can be, we build a detector using images that are not natural objects, and present promising results. Overall, our work identifies the subtle but significant issues that arise when training a fake image detector and proposes a simple, inexpensive solution to address them.
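
The reconstruction step can be pictured with a minimal sketch (not the authors' code), assuming the Hugging Face diffusers AutoencoderKL interface and the public stabilityai/sd-vae-ft-mse checkpoint:

```python
# Minimal sketch: build an aligned (real, fake) pair by round-tripping a real
# image through an LDM autoencoder, with no denoising step.
# Assumptions: diffusers' AutoencoderKL API, the "stabilityai/sd-vae-ft-mse"
# checkpoint, and image height/width that are multiples of 8.
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

to_tensor = transforms.Compose([
    transforms.ToTensor(),                       # scale to [0, 1]
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # scale to [-1, 1], as the VAE expects
])

@torch.no_grad()
def reconstruct(path: str) -> tuple[torch.Tensor, torch.Tensor]:
    """Return (real, fake), where `fake` is the VAE reconstruction of `real`."""
    real = to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)
    latents = vae.encode(real).latent_dist.sample()  # encode only; no denoising
    fake = vae.decode(latents).sample                # decoder artifacts live here
    return real, fake

# Each (real, fake) pair matches in size, aspect ratio, and semantic content,
# so a binary classifier trained on such pairs has little to key on besides
# the decoder's artifacts.
```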

 

Towards Reliable Offline Evaluation of Reinforcement Learning Agents through Abstraction

Brahma S. Pavse, PhD student, Computer Sciences

Abstract:
As practitioners seek to deploy reinforcement learning (RL) agents to real-world tasks, it is increasingly important that they can do so with confidence. One line of research that tackles this problem is offline policy evaluation (OPE). OPE algorithms aim to evaluate the performance of an RL agent without actually deploying it, using offline datasets generated by other agents. Such an evaluation enables a practitioner to decide whether to take the risk of deploying the agent. Unfortunately, current OPE algorithms are notorious for exhibiting high variance and instability in practice. In this presentation, I will describe our work on leveraging abstractions and representation learning to improve the accuracy of OPE algorithms in simulated robotic environments. Our results indicate that abstractions hold promise for scaling OPE algorithms up to real-world applications.
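
For intuition about where the variance comes from, here is a minimal sketch of one classic OPE estimator, per-trajectory importance sampling (not the speaker's method); the dataset format and policy interface below are illustrative assumptions:

```python
import numpy as np

def importance_sampling_ope(trajectories, eval_policy, behavior_policy, gamma=0.99):
    """Estimate the expected return of `eval_policy` from offline data.

    `trajectories` is assumed to be a list of lists of (state, action, reward)
    tuples collected by `behavior_policy`; both policies are assumed to expose
    prob(state, action).
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Reweight by how much more (or less) likely the evaluation policy
            # is to take the logged action than the behavior policy.
            weight *= eval_policy.prob(s, a) / behavior_policy.prob(s, a)
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    # High variance: a few trajectories with huge weights can dominate the
    # average, which is the instability that abstraction and representation
    # learning aim to tame.
    return float(np.mean(estimates))
```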

 

Pearls from Pebbles: Improved Confidence Functions for Auto-labeling

Harit Vishwakarma, PhD student, Computer Sciences

Abstract:
Auto-labeling data using model confidence scores is a cost-effective way to obtain labeled datasets. However, standard training methods often produce overconfident or miscalibrated scores, resulting in labeling errors or too few auto-labeled points. We propose Colander, a principled framework that optimizes confidence scores specifically for auto-labeling. Extensive experiments demonstrate that Colander significantly outperforms several training-time and post-hoc calibration methods in auto-labeling.
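
For a concrete picture of the setting, here is a generic threshold-based auto-labeling loop (a sketch of the general workflow, not the Colander method; all names below are illustrative assumptions):

```python
import numpy as np

def auto_label(confidences, predictions, threshold=0.95):
    """Generic threshold-based auto-labeling.

    `confidences[i]` is the model's confidence score for `predictions[i]`;
    points whose score clears `threshold` receive their predicted label.
    """
    confidences = np.asarray(confidences)
    predictions = np.asarray(predictions)
    mask = confidences >= threshold
    return predictions[mask], mask

# If the scores are overconfident, many wrong predictions clear the threshold
# (labeling errors); if they are miscalibrated low, few points are labeled
# (poor coverage). Optimizing the confidence function for this trade-off is
# the auto-labeling objective that Colander targets.
```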

November 27, 2024
12:30 pm (1h)

Discovery Building, Orchard View Room

Anirudh Sundara Rajan, Brahma S. Pavse, Harit Vishwakarma
