VAEs for multimodal disentanglement


The Japanese Study Group on Computer Vision is a semiannual conference at which researchers from all over Japan present recent topics in computer vision. In my talk, I introduce variational autoencoders (VAEs) as a way to learn latent representations of input data, particularly multimodal data (audio, video, text, etc.).