Bias and Generalization in Deep Generative Models: An Empirical Study
Despite recent empirical success, the inductive bias of deep generative models is not well understood. In this paper we propose a framework to systematically investigate bias and generalization in deep generative models of images by probing the learning algorithm with carefully designed training datasets. We exactly characterize the learned distribution to study whether and how the model generates novel features and novel combinations of existing features.
Shengjia Zhao*, Hongyu Ren*, Arianna Yuan, Jiaming Song, Noah Goodman, Stefano Ermon (NIPS'2018 spotlight)
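The probing setup can be made concrete with a toy example (a minimal sketch of the idea, not the paper's actual pipeline; the dataset parameters and helper names below are illustrative): train any image generator only on images whose value of one controlled feature, here the number of dots, comes from a fixed set, then histogram that feature in the model's samples to see whether novel values appear.

```python
import numpy as np

def make_probe_dataset(counts=(2, 4, 8), n_per_count=1000, size=32, seed=0):
    """Toy probe set: blank images containing an exactly controlled number
    of bright pixels ("dots"). Only the counts listed here ever appear in
    training, so any other count in generated samples is a novel feature."""
    rng = np.random.default_rng(seed)
    images = []
    for k in counts:
        for _ in range(n_per_count):
            img = np.zeros(size * size, dtype=np.float32)
            img[rng.choice(size * size, k, replace=False)] = 1.0
            images.append(img.reshape(size, size))
    return np.stack(images)

def count_histogram(samples, threshold=0.5):
    """Characterize the learned distribution along the probed axis:
    count bright pixels per generated image and histogram the counts."""
    counts = [int((s > threshold).sum()) for s in samples]
    return np.bincount(counts)

# The generative model itself is a placeholder -- any GAN or VAE works:
#   model = train_model(make_probe_dataset())
#   print(count_histogram(model.sample(10_000)))
```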
The Information Autoencoding Family: A Lagrangian Perspective on Latent Variable Generative Models
We show that a variety of latent variable generative modeling objectives, including InfoGAN, ALI/BiGAN, ALICE, CycleGAN, VAE, beta-VAE, adversarial autoencoders, AVB, AS-VAE and InfoVAE, are Lagrangian duals of the same primal optimization problem for fixed values of the Lagrange multipliers. We provide an exhaustive characterization of all the objectives that belong to this class, and propose a dual optimization method that optimizes the model parameters as well as the Lagrange multipliers. This method achieves near Pareto-optimal solutions in terms of optimizing the primal objectives and modeling the correct distributions.
Shengjia Zhao, Jiaming Song, Stefano Ermon (UAI'2018 oral) [arXiv]
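Schematically, the shared primal problem and its Lagrangian look as follows (an illustrative rendering consistent with the abstract; the symbols and the mutual-information term follow the paper's framework only loosely):

```latex
\begin{align*}
\text{primal:}\quad & \min_{\theta,\phi}\;\; \alpha\, I_q(x;z)
  \quad \text{s.t.}\quad D_i\big(q_\phi \,\|\, p_\theta\big) \le \epsilon_i,
  \qquad i = 1, \dots, k \\
\text{Lagrangian:}\quad & \mathcal{L}(\theta,\phi;\lambda)
  = \alpha\, I_q(x;z) + \sum_{i=1}^{k} \lambda_i\, D_i\big(q_\phi \,\|\, p_\theta\big)
\end{align*}
```

Fixing the multipliers recovers individual objectives in the family, whereas the proposed dual method optimizes the model parameters and the multipliers jointly.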
InfoVAE: Information Maximizing Variational Autoencoders
It has been previously observed that variational autoencoders tend to ignore the latent code when combined with a decoding distribution that is too flexible. We analyze the cause of this problem and propose a novel solution.
Shengjia Zhao, Jiaming Song, Stefano Ermon [arXiv]
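A common instantiation of this fix regularizes the aggregate posterior q(z) toward the prior with a sample-based divergence such as MMD. The sketch below is a hedged illustration of that variant rather than the paper's reference implementation; the weights alpha and lam are illustrative values.

```python
import numpy as np

def rbf_mmd(z_q, z_p, sigma=1.0):
    """MMD estimate between latent codes drawn from q(z) and from the
    prior p(z), using an RBF kernel (bandwidth is illustrative)."""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return gram(z_q, z_q).mean() + gram(z_p, z_p).mean() - 2.0 * gram(z_q, z_p).mean()

def info_vae_loss(recon_nll, kl_qzx_pz, z_q, z_prior, alpha=0.0, lam=10.0):
    """InfoVAE-style loss (to be minimized), following the commonly cited
    weighting
        recon + (1 - alpha) * KL(q(z|x) || p(z))
              + (alpha + lam - 1) * D(q(z), p(z)),
    with MMD as the divergence D."""
    return (recon_nll
            + (1.0 - alpha) * kl_qzx_pz
            + (alpha + lam - 1.0) * rbf_mmd(z_q, z_prior))
```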
A-NICE-MC: Adversarial Training for MCMC
We propose A-NICE-MC, a novel method to train flexible parametric Markov chain kernels to produce samples with desired properties. We leverage flexible volume preserving flows to obtain parametric kernels for MCMC. Using a bootstrap approach, we show how to train efficient Markov chains to sample from a prescribed posterior distribution by iteratively improving the quality of both the model and the samples.
Jiaming Song, Shengjia Zhao, Stefano Ermon (NIPS'2017) [arXiv]
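A hedged sketch of the sampling side: a deterministic, volume-preserving map proposes a new state and a Metropolis-Hastings test accepts or rejects it. The trained flow (here the placeholder `flow`) and the exact reversibility construction come from the paper and are not reproduced.

```python
import numpy as np

def nice_mc_step(x, log_p, flow, rng):
    """One step of an MCMC kernel driven by a deterministic,
    volume-preserving map (x, v) -> (x', v'). For exact detailed balance
    the map must additionally be made reversible (e.g. by negating or
    resampling the auxiliary variable), which is omitted here."""
    v = rng.standard_normal(x.shape)        # fresh auxiliary variable
    x_new, v_new = flow(x, v)               # volume-preserving, invertible map
    # Volume preservation means no Jacobian term appears in the MH ratio.
    log_ratio = (log_p(x_new) - 0.5 * np.sum(v_new ** 2)) \
              - (log_p(x) - 0.5 * np.sum(v ** 2))
    if np.log(rng.uniform()) < log_ratio:
        return x_new
    return x
```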
Learning Hierarchical Features from Generative Models
We prove that certain classes of hierarchical latent variable models do not take advantage of the hierarchical structure when trained with existing variational methods, and we characterize limitations on the kinds of features such models can learn. Finally, we propose an alternative flat architecture that learns meaningful and disentangled features on natural images.
Shengjia Zhao, Jiaming Song, Stefano Ermon (ICML'2017) [arXiv] [code]
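For reference, a standard two-layer instance of the hierarchical latent variable models in question, together with its variational bound (generic notation, not the paper's):

```latex
\begin{align*}
p_\theta(x, z_1, z_2) &= p_\theta(x \mid z_1)\, p_\theta(z_1 \mid z_2)\, p(z_2),
\qquad
q_\phi(z_1, z_2 \mid x) = q_\phi(z_1 \mid x)\, q_\phi(z_2 \mid z_1), \\
\log p_\theta(x) &\ge \mathbb{E}_{q_\phi(z_1, z_2 \mid x)}
  \big[ \log p_\theta(x \mid z_1) + \log p_\theta(z_1 \mid z_2) + \log p(z_2)
        - \log q_\phi(z_1, z_2 \mid x) \big]
\end{align*}
```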
Adaptive Concentration Inequalities for Sequential Decision Problems
We introduce Hoeffding-like concentration inequalities that hold for a random, adaptively chosen number of samples. Our inequalities are tight under natural assumptions and can greatly simplify the analysis of common sequential decision problems. In particular, we apply them to sequential hypothesis testing, best arm identification, and sorting. The resulting algorithms rival or exceed the state of the art both theoretically and empirically.
Shengjia Zhao, Enze Zhou, Ashish Sabharwal, Stefano Ermon (NIPS'2016) [pdf]
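As an illustration of how such inequalities are used, the sketch below runs a successive-elimination style best arm identification loop with an anytime confidence bound. The log-log form and the constant in `anytime_bound`, as well as the helper names, are assumptions standing in for the paper's exact inequality.

```python
import math

def anytime_bound(n, delta, c=0.6):
    """Hoeffding-like deviation bound intended to stay valid when the
    sample size n is chosen adaptively; the constant c and the log-log
    form are illustrative, not the paper's exact statement."""
    n = max(n, 2)
    return math.sqrt(c * math.log(math.log2(n) / delta) / n)

def best_arm(pull, n_arms, delta=0.05, budget=100_000):
    """Successive elimination: keep sampling surviving arms until the
    confidence intervals separate the empirically best arm from the rest."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    alive = set(range(n_arms))
    best = 0
    for _ in range(budget):
        for a in list(alive):
            x = pull(a)                       # pull(a) returns a reward in [0, 1]
            counts[a] += 1
            means[a] += (x - means[a]) / counts[a]
        bounds = {a: anytime_bound(counts[a], delta / n_arms) for a in alive}
        best = max(alive, key=lambda a: means[a])
        alive = {a for a in alive
                 if means[a] + bounds[a] >= means[best] - bounds[best]}
        if len(alive) == 1:
            return alive.pop()
    return best
```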
Closing the Gap Between Short and Long XORs for Model Counting
We provide matching necessary and sufficient conditions on the asymptotic length of the parity constraints required to solve model counting problems. Further, we provide a new family of lower bounds and the first non-trivial upper bounds on the model count that are valid for arbitrarily short XORs, and empirically demonstrate their effectiveness.
Shengjia Zhao, Sorathan Chaturapruek, Ashish Sabharwal, Stefano Ermon (AAAI'2016) [arXiv]
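The flavor of the underlying hashing approach can be sketched as follows (the `is_satisfiable` oracle is a placeholder for a SAT solver on the original formula conjoined with the XORs, and the crude 2**m estimate stands in for the paper's actual bounds, which are what make short XORs usable):

```python
import random

def random_xor(n_vars, length):
    """A random parity constraint over `length` of the n_vars variables,
    together with a random target parity bit."""
    return random.sample(range(n_vars), length), random.randint(0, 1)

def xor_count_estimate(is_satisfiable, n_vars, length, max_m, trials=10):
    """Hashing-based sketch: keep adding random XOR constraints of the
    given length while the instance stays satisfiable; surviving m
    constraints suggests a model count on the order of 2**m."""
    best_m = 0
    for _ in range(trials):
        m, xors = 0, []
        while m < max_m:
            xors.append(random_xor(n_vars, length))
            if not is_satisfiable(xors):
                break
            m += 1
        best_m = max(best_m, m)
    return 2 ** best_m
```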