Bridged Variational Autoencoders for Joint Modeling of Images and Attributes

Published in IEEE Winter Conference on Applications of Computer Vision (WACV), 2020

Recommended citation: Ravindra Yadav, Ashish Sardana, Vinay P Namboodiri, Rajesh M Hegde. "Bridged Variational Autoencoders for Joint Modeling of Images and Attributes." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 1479-1487. https://ieeexplore.ieee.org/abstract/document/9093565


Generative models have recently shown the ability to realistically generate data and model the distribution accurately. However, joint modeling of an image with the attribute that it is labeled with requires learning a cross-modal correspondence between image and attribute data. Though the information present in a set of images and their attributes possesses completely different statistical properties, there exists an inherent correspondence that is challenging to capture. Various models have aimed at capturing this correspondence either through joint modeling of a variational autoencoder or through separate encoder networks that are then concatenated. We present an alternative by proposing a bridged variational autoencoder that allows for learning cross-modal correspondence by incorporating cross-modal hallucination losses in the latent space. In comparison to existing methods, we have found that by using a bridge connection in latent space we not only obtain better generation results, but also obtain a highly parameter-efficient model that provides a 40% reduction in training parameters for a bimodal dataset and a nearly 70% reduction for a trimodal dataset. We validate the proposed method through comparison with state-of-the-art methods and benchmarking on standard datasets.
