Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

Xuanchen Li1*, Yuhao Cheng1*, Xingyu Ren1, Haozhe Jia2, Di Xu2, Wenhan Zhu1, Yichao Yan1†
1MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University
2Huawei Cloud Computing Technologies Co., Ltd
(*Equal contribution, †Corresponding author)
Teaser

Example results of Topo4D, our proposed 4D head capture framework. Our method produces head meshes with temporally consistent topology and high-fidelity 8K textures from calibrated multi-view videos. It can also be applied to digital character retargeting and relighting.

Abstract

4D head capture aims to generate topologically consistent dynamic meshes and corresponding texture maps from videos. It is widely used in movies and games for its ability to simulate facial muscle movements and recover dynamic textures such as pore squeezing. Industry pipelines typically combine multi-view stereo with non-rigid alignment. However, this approach is prone to errors and relies heavily on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology, in which the Gaussian centers are bound to the mesh vertices. Afterward, we alternate between geometry and texture optimization frame by frame, learning high-quality geometry and texture while maintaining temporal topology stability. Finally, we extract dynamic facial meshes with a regular wiring arrangement and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method outperforms current SOTA face reconstruction methods in both mesh and texture quality.
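The idea of binding Gaussian centers to mesh vertices under a fixed topology can be pictured with a small data structure. The following is only an illustrative sketch, not the authors' implementation: the class name `GaussianMesh`, its fields, and the `step` and `extract_mesh` methods are hypothetical, and the per-frame offsets stand in for whatever the optimizer would produce.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


@dataclass
class GaussianMesh:
    """Hypothetical container: one Gaussian per mesh vertex, shared fixed faces."""
    centers: List[Vec3]   # Gaussian centers, identical to the vertex positions
    faces: List[Face]     # triangle indices; never modified (the fixed topology)
    scales: List[float] = field(default_factory=list)  # placeholder per-Gaussian attribute

    def __post_init__(self) -> None:
        if not self.scales:
            self.scales = [1.0] * len(self.centers)

    def step(self, offsets: List[Vec3]) -> "GaussianMesh":
        """Build the next frame: centers move, the face list is reused as-is."""
        if len(offsets) != len(self.centers):
            raise ValueError("need exactly one offset per Gaussian/vertex")
        moved = [
            tuple(c + d for c, d in zip(center, offset))
            for center, offset in zip(self.centers, offsets)
        ]
        return GaussianMesh(moved, self.faces, list(self.scales))

    def extract_mesh(self) -> Tuple[List[Vec3], List[Face]]:
        """Reading the mesh back is just reading the centers and the fixed faces."""
        return list(self.centers), list(self.faces)
```

Because every frame shares the same `faces` list and vertex ordering, vertex `i` in one frame corresponds to vertex `i` in every other frame, which is what makes the extracted sequence densely aligned in this sketch.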

Video

Pipeline

Overall pipeline of our framework. (a) We initialize the Gaussian attributes and establish topological correspondence with the startup mesh. (b) Taking one frame as an example, the geometry-related attributes in the Gaussian Mesh of the previous frame are optimized on the current frame under a set of topology-aware loss terms. (c) We align the Gaussian surface with the rendering surface by Gaussian normal expansion to extract more precise meshes. (d) To learn pore-level color details and generate ultra-high-resolution textures, we build a dense mesh by densifying Gaussians in UV space.
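Step (d) can be illustrated with a minimal sketch of one plausible way to densify samples in UV space: place one sample at each texel center of a UV grid and interpolate its 3D position barycentrically from the triangle that covers it. This is an assumption made for illustration, not the paper's procedure; the function names `barycentric` and `densify_in_uv`, the texel-center sampling, and the brute-force triangle search are all hypothetical choices.

```python
from typing import List, Optional, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def barycentric(p: Vec2, a: Vec2, b: Vec2, c: Vec2) -> Optional[Tuple[float, float, float]]:
    """Barycentric weights of 2D point p in triangle (a, b, c); None if degenerate."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if abs(det) < 1e-12:
        return None
    w0 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w1 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return w0, w1, 1.0 - w0 - w1


def densify_in_uv(
    vertices: List[Vec3],
    uvs: List[Vec2],
    faces: List[Face],
    resolution: int,
    eps: float = 1e-9,
) -> List[Tuple[Vec2, Vec3, int]]:
    """Return (uv, interpolated 3D position, face index) for each covered texel center."""
    samples = []
    for i in range(resolution):
        for j in range(resolution):
            uv = ((i + 0.5) / resolution, (j + 0.5) / resolution)
            # Brute-force search over faces; a real pipeline would rasterize instead.
            for f_idx, (i0, i1, i2) in enumerate(faces):
                w = barycentric(uv, uvs[i0], uvs[i1], uvs[i2])
                if w is None or min(w) < -eps:
                    continue  # degenerate triangle, or texel center outside it
                pos = tuple(
                    w[0] * vertices[i0][k] + w[1] * vertices[i1][k] + w[2] * vertices[i2][k]
                    for k in range(3)
                )
                samples.append((uv, pos, f_idx))
                break  # one sample per texel
    return samples
```

Since each dense sample is defined relative to a face of the base mesh, moving the base vertices moves the dense samples with them, so in this sketch the dense layer inherits the base topology rather than introducing a new one.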

Results

Reconstruction Results

Topo4D generates temporally consistent dynamic meshes and corresponding 8K texture maps with pore-level details from calibrated multi-view videos.

Topo4D captures both subtle facial changes and extreme expressions, reproducing muscle tremors and dynamic wrinkles.

Comparisons

Geometry Comparison

Texture Comparison

Comparison on Multiface

BibTeX


@article{li2024topo4d,
  title={Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture},
  author={Li, Xuanchen and Cheng, Yuhao and Ren, Xingyu and Jia, Haozhe and Xu, Di and Zhu, Wenhan and Yan, Yichao},
  journal={arXiv preprint arXiv:2406.00440},
  year={2024}
}