论文标题
文本和图像指导3D化身的生成和操纵
Text and Image Guided 3D Avatar Generation and Manipulation
论文作者
论文摘要
最近,潜在空间的操纵已成为生成模型领域的热门话题。近期研究表明,潜在方向可用于将图像朝特定属性方向操纵。然而,控制3D生成模型的生成过程仍然是一个挑战。在这项工作中,我们提出了一种新颖的3D操纵方法,可以使用基于文本或图像的提示(例如"a young face"或"a surprised face")来操纵模型的形状和纹理。我们利用对比语言-图像预训练(CLIP)模型和为生成面部化身而预训练的3D GAN模型的能力,并创建了完全可微分的渲染管线来操纵网格。更具体地说,我们的方法接收一个输入潜在编码并对其进行修改,使文本或图像提示所指定的目标属性出现或得到增强,同时使其他属性基本不受影响。我们的方法每次操纵仅需5分钟,我们通过大量的结果和比较证明了该方法的有效性。
The manipulation of latent space has recently become an interesting topic in the field of generative models. Recent research shows that latent directions can be used to manipulate images towards certain attributes. However, controlling the generation process of 3D generative models remains a challenge. In this work, we propose a novel 3D manipulation method that can manipulate both the shape and texture of the model using text- or image-based prompts such as 'a young face' or 'a surprised face'. We leverage the power of the Contrastive Language-Image Pre-training (CLIP) model and a pre-trained 3D GAN model designed to generate face avatars, and we create a fully differentiable rendering pipeline to manipulate meshes. More specifically, our method takes an input latent code and modifies it such that the target attribute specified by a text or image prompt is present or enhanced, while leaving other attributes largely unaffected. Our method requires only 5 minutes per manipulation, and we demonstrate the effectiveness of our approach with extensive results and comparisons.
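The abstract describes optimizing an input latent code so a CLIP-scored target attribute is enhanced while other attributes stay largely unchanged. The toy sketch below illustrates that optimization pattern only; the function names, the quadratic stand-in for the CLIP similarity, and the identity-regularizer weight are all illustrative assumptions, not the paper's actual pipeline (which renders a mesh differentiably and scores it with the real CLIP model).

```python
# Toy sketch of CLIP-guided latent manipulation (hypothetical names; the
# real method renders a 3D avatar from z and scores the image with CLIP).

# Stand-in for the CLIP embedding of a prompt such as "a surprised face".
TARGET = [0.8, -0.2, 0.5]

def clip_loss(z):
    # Stub for 1 - cosine_similarity(CLIP(render(z)), CLIP(prompt)):
    # here simply the squared distance to the target embedding.
    return sum((zi - ti) ** 2 for zi, ti in zip(z, TARGET))

def manipulate_latent(z0, steps=200, lr=0.05, id_weight=0.1):
    """Gradient descent on z: push toward the prompt while an identity
    term ||z - z0||^2 keeps other attributes largely unaffected."""
    z = list(z0)
    for _ in range(steps):
        # Analytic gradient of clip_loss(z) + id_weight * ||z - z0||^2.
        grad = [2 * (zi - ti) + 2 * id_weight * (zi - z0i)
                for zi, ti, z0i in zip(z, TARGET, z0)]
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return z

z0 = [0.0, 0.0, 0.0]          # input latent code
z_edit = manipulate_latent(z0)
print(clip_loss(z_edit) < clip_loss(z0))  # → True: edit scores better on the prompt
```

In the actual method the gradient flows from the CLIP loss through the differentiable renderer and the 3D GAN back to the latent code; the stub above keeps only the loop structure and the identity-preservation trade-off so it runs end to end.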