Systems | Information | Learning | Optimization

Learning to do Structured Inference in Natural Language Processing

Many tasks in natural language processing, computer vision, and computational biology involve predicting structured outputs. Researchers are increasingly applying deep representation learning to these problems, but the structured component of these approaches is usually quite simplistic. For example, neural machine translation systems use unstructured training of local factors followed by beam search for test-time inference. There have been several proposals for deep energy-based structured modeling, but they pose difficulties for learning and inference, preventing their widespread adoption. In this talk, we focus on structured prediction energy networks (SPENs; Belanger & McCallum, 2016), which use neural network architectures to define energy functions that can capture arbitrary dependencies among parts of structured outputs.
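To make the idea concrete, here is a minimal toy sketch (a hypothetical hand-built energy, not the neural energies from the talk): an energy over a relaxed binary output y in [0,1]^n, combining local scores with one pairwise interaction, minimized by projected gradient descent as in prior SPEN inference.

```python
import numpy as np

# Toy energy over a relaxed output y in [0,1]^n (hypothetical parameters):
#   E(y) = -s.y + y^T W y
# s holds local label scores; W encodes pairwise interactions among labels.

def energy(y, s, W):
    return -s @ y + y @ W @ y

def grad_energy(y, s, W):
    return -s + (W + W.T) @ y

def gd_inference(s, W, steps=100, lr=0.1):
    """Projected gradient descent on the relaxed output."""
    y = np.full_like(s, 0.5)          # start from the uniform relaxation
    for _ in range(steps):
        y -= lr * grad_energy(y, s, W)
        y = np.clip(y, 0.0, 1.0)      # project back onto [0,1]^n
    return y

s = np.array([2.0, -1.0, 0.5])
W = np.array([[0.0, 1.0, 0.0],        # penalize labels 0 and 1 co-occurring
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
y = gd_inference(s, W)
print(np.round(y), energy(y, s, W))   # low-energy relaxed output, rounded
```

In SPENs the analogue of `energy` is a deep network, so the gradient step requires backpropagation through that network at every iteration of test-time inference, which is the cost the inference-network approach below avoids.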

Prior work with SPENs used gradient descent for inference, relaxing the structured output to a set of continuous variables and then optimizing the energy with respect to them. We replace this use of gradient descent with a neural network trained to approximate structured argmax inference. This “inference network” outputs continuous values that we treat as the output structure. We develop large-margin training objectives to jointly train deep energy functions and inference networks. The objectives resemble the alternating optimization framework of generative adversarial networks (GANs; Goodfellow et al. 2014): the inference network is analogous to the generator and the energy function is analogous to the discriminator. We present experimental results on several NLP tasks, including part-of-speech tagging, named entity recognition, and machine translation. Inference networks achieve a better speed/accuracy/search error trade-off than gradient descent, while also being faster than exact inference at similar accuracy levels. This increased efficiency allows us to experiment with deep, global energy terms, which further improve results.
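One schematic form of the large-margin alternation described above (the notation here is ours, a sketch rather than the exact objective from the talk) trains the energy parameters Θ and the inference-network parameters Φ adversarially:

```latex
\min_{\Theta} \max_{\Phi}\;
  \Big[\, \Delta\big(A_{\Phi}(x),\, y\big)
        \;-\; E_{\Theta}\big(x,\, A_{\Phi}(x)\big)
        \;+\; E_{\Theta}(x, y) \,\Big]_{+}
```

Here A_Φ is the inference network (playing the role of the GAN generator), E_Θ the energy function (the discriminator), Δ a structured cost between the predicted and gold outputs, and [·]_+ = max(0, ·) the hinge: the inference network seeks margin violations while the energy is trained to eliminate them.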

April 15 @ 12:30 pm (1h)

Discovery Building, Orchard View Room

Kevin Gimpel