Prior work with structured prediction energy networks (SPENs) used gradient descent for inference, relaxing the structured output to a set of continuous variables and then optimizing the energy with respect to them. We replace this use of gradient descent with a neural network trained to approximate structured argmax inference. This “inference network” outputs continuous values that we treat as the output structure. We develop large-margin training objectives to jointly train deep energy functions and inference networks. The objectives resemble the alternating optimization framework of generative adversarial networks (GANs; Goodfellow et al. 2014): the inference network is analogous to the generator and the energy function is analogous to the discriminator. We present experimental results on several NLP tasks, including part-of-speech tagging, named entity recognition, and machine translation. Inference networks achieve a better trade-off among speed, accuracy, and search error than gradient descent, while also being faster than exact inference at similar accuracy levels. This increased efficiency allows us to experiment with deep, global energy terms, which further improve results.
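To make the adversarial analogy concrete, here is a minimal sketch of one large-margin objective of the kind described above. The abstract does not spell out the exact loss, so the margin-rescaled hinge form, the L1 cost, and the names `toy_energy`, `l1_cost`, and `margin_rescaled_hinge` are all assumptions made for illustration. In this reading, the energy function is updated to decrease the hinge value while the inference network is updated to increase it, much as a discriminator and generator pull in opposite directions.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def l1_cost(y_pred: Vector, y_true: Vector) -> float:
    """Hypothetical structured cost: L1 distance between relaxed outputs."""
    return sum(abs(p - t) for p, t in zip(y_pred, y_true))


def toy_energy(x: Vector, y: Vector) -> float:
    """Hypothetical stand-in for a deep energy function (lower is better)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def margin_rescaled_hinge(
    energy: Callable[[Vector, Vector], float],
    x: Vector,
    y_true: Vector,
    y_pred: Vector,
    cost: Callable[[Vector, Vector], float] = l1_cost,
) -> float:
    """Assumed objective: max(0, cost(y_pred, y_true) - E(x, y_pred) + E(x, y_true)).

    y_pred plays the role of the inference network's continuous output.
    The energy function would be trained to minimize this value and the
    inference network to maximize it, in alternating steps.
    """
    return max(0.0, cost(y_pred, y_true) - energy(x, y_pred) + energy(x, y_true))


x = [1.0, 0.0]
y_true = [1.0, 0.0]

# A prediction the energy already separates from the gold output by its cost.
far_loss = margin_rescaled_hinge(toy_energy, x, y_true, [0.0, 1.0])

# A prediction whose energy is too low relative to its cost: margin violated.
near_loss = margin_rescaled_hinge(toy_energy, x, y_true, [0.5, 0.5])
```

With this toy energy, the second prediction yields a positive loss, which is the kind of output an inference network trained against the energy function would be pushed to find.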

Discovery Building, Orchard View Room

Kevin Gimpel