Image-to-image translation aims to learn the mapping between two visual domains. Unsupervised image-to-image translation is an important and challenging problem in computer vision: given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any pairs of corresponding images. There are two main challenges for this task: (1) the lack of aligned training pairs and (2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for generating diverse outputs without paired training images. To synthesize diverse outputs, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the encoded content features extracted from a given input and attribute vectors sampled from the attribute space to synthesize diverse outputs at test time. To handle unpaired training data, we introduce a cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative evaluations, we measure realism with a user study and Fréchet inception distance, and measure diversity with the perceptual distance metric, Jensen–Shannon divergence, and the number of statistically-different bins.
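As a rough illustration of the idea (not the authors' implementation), the sketch below shows how a content encoder, an attribute encoder, and a generator could be wired so that a single input image combined with different sampled attribute vectors yields diverse outputs. All architectures, names, and dimensions (ContentEncoder, AttributeEncoder, attr_dim, etc.) are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Maps an image to the domain-invariant content space (assumed architecture)."""
    def __init__(self, in_ch=3, content_ch=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 7, stride=1, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, content_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class AttributeEncoder(nn.Module):
    """Maps an image to a low-dimensional, domain-specific attribute vector (assumed)."""
    def __init__(self, in_ch=3, attr_dim=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, attr_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return self.fc(h)

class Generator(nn.Module):
    """Decodes a content feature map plus an attribute vector into an image (assumed)."""
    def __init__(self, content_ch=256, attr_dim=8, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(content_ch + attr_dim, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, out_ch, 7, stride=1, padding=3), nn.Tanh(),
        )

    def forward(self, content, attr):
        # Broadcast the attribute vector over the spatial grid and concatenate with content.
        b, _, h, w = content.shape
        attr_map = attr.view(b, -1, 1, 1).expand(b, attr.size(1), h, w)
        return self.net(torch.cat([content, attr_map], dim=1))

# Hypothetical usage: translate one source-domain image with diverse outputs.
E_c = ContentEncoder()        # content encoder (shared across domains in this sketch)
E_a_y = AttributeEncoder()    # attribute encoder for the target domain Y
G_y = Generator()             # generator for the target domain Y

x = torch.randn(1, 3, 64, 64)      # an image from the source domain X
content = E_c(x)
for _ in range(3):
    z_attr = torch.randn(1, 8)     # sample from the attribute space -> diverse outputs
    y_fake = G_y(content, z_attr)
    print(y_fake.shape)            # torch.Size([1, 3, 64, 64])

# The attribute vector can also be extracted from a target-domain exemplar,
# giving example-guided translation from the same content features.
y_ref = torch.randn(1, 3, 64, 64)
y_guided = G_y(content, E_a_y(y_ref))
```

During training, swapping the attribute vectors between the two domains and requiring the original images to be reconstructed after a second swap gives the cross-cycle consistency constraint in spirit; the sketch above only covers the inference path.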