Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

Abstract

4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos. It is widely used in movies and games for its ability to simulate facial muscle movements and to recover dynamic textures under pore squeezing. The industry often adopts a pipeline that combines multi-view stereo with non-rigid alignment. However, this approach is prone to errors and relies heavily on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology, in which the Gaussian centers are bound to the mesh vertices. Afterward, we alternate geometry and texture optimization frame by frame to learn high-quality geometry and texture while maintaining temporal topology stability. Finally, we extract dynamic facial meshes with regular wiring and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method outperforms current SOTA face reconstruction methods in the quality of both meshes and textures.
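The idea of binding Gaussian centers to mesh vertices and tracking them frame by frame can be illustrated with a minimal sketch. Everything named here (`GaussianMesh`, `fit_frame`, `track`, the per-vertex `targets`) is hypothetical and not taken from the Topo4D code. In particular, the squared-distance objective is only a toy stand-in for the actual image-based optimization. The sketch shows just two properties stated above: the connectivity is fixed across frames, and each frame is initialized from the previous one.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class GaussianMesh:
    """Hypothetical container: one Gaussian center per mesh vertex."""
    centers: list[Vec3]
    faces: tuple[tuple[int, int, int], ...]  # fixed connectivity for all frames


def fit_frame(prev: GaussianMesh, targets: list[Vec3],
              steps: int = 50, lr: float = 0.2) -> GaussianMesh:
    """Toy per-frame fit: gradient descent on squared distance to targets.

    The targets are an assumption standing in for image supervision.
    Only the centers move; the faces are passed through unchanged.
    """
    centers = [list(c) for c in prev.centers]  # warm start from previous frame
    for _ in range(steps):
        for c, t in zip(centers, targets):
            for axis in range(3):
                # Gradient of (c - t)^2 with respect to c is 2 * (c - t).
                c[axis] -= lr * 2.0 * (c[axis] - t[axis])
    return GaussianMesh([tuple(c) for c in centers], prev.faces)


def track(startup: GaussianMesh,
          per_frame_targets: list[list[Vec3]]) -> list[GaussianMesh]:
    """Process frames in order, each one initialized from the last result."""
    meshes = []
    current = startup
    for targets in per_frame_targets:
        current = fit_frame(current, targets)
        meshes.append(current)
    return meshes
```

Because every returned mesh shares the same `faces`, vertex `i` in one frame corresponds to vertex `i` in every other frame, which is what makes the output densely aligned.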

Video

Pipeline

Overall pipeline of our framework. (a) We initialize the Gaussian attributes and establish topological correspondence with the startup mesh. (b) Taking one frame as an example, the geometry-related attributes of the previous frame's Gaussian Mesh are optimized against the current frame under a set of topology-aware loss terms. (c) We align the Gaussian surface with the rendering surface by Gaussian normal expansion to extract more precise meshes. (d) To learn pore-level color details and generate ultra-high-resolution textures, we build a dense mesh by densifying Gaussians in UV space.
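Step (d) can be pictured with a small sketch of one plausible way to densify in UV space: place a sample at every texel center that falls inside a triangle's UV footprint and interpolate its 3D position with barycentric weights. This is an assumption for illustration, not the authors' implementation. The function names and the `resolution` parameter are hypothetical, and the triangle is assumed to be non-degenerate in UV space.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p in triangle (a, b, c).

    Assumes the triangle is non-degenerate (non-zero area).
    """
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w0 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w1 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w0, w1, 1.0 - w0 - w1


def densify_in_uv(uv_tri, xyz_tri, resolution):
    """Return ((col, row), xyz) for each texel center inside the UV triangle.

    uv_tri:  three (u, v) coordinates in [0, 1]
    xyz_tri: the three matching 3D vertex positions
    resolution: side length of the (square) texture in texels
    """
    samples = []
    for row in range(resolution):
        for col in range(resolution):
            texel_center = ((col + 0.5) / resolution, (row + 0.5) / resolution)
            weights = barycentric(texel_center, *uv_tri)
            if min(weights) < 0.0:
                continue  # texel center lies outside this triangle
            # Interpolate the 3D position with the same weights.
            position = tuple(
                sum(weights[k] * xyz_tri[k][axis] for k in range(3))
                for axis in range(3)
            )
            samples.append(((col, row), position))
    return samples
```

Under this scheme the number of samples grows with the texture resolution while each sample stays tied to a face of the base mesh, so the dense set inherits the base topology.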

Results

Reconstruction Results

Topo4D generates temporally consistent dynamic meshes and corresponding 8K texture maps with pore-level details from calibrated multi-view videos.

Topo4D captures subtle facial changes and a variety of extreme expressions, reproducing muscle tremors and dynamic wrinkles.

Comparisons

Geometry Comparison

Texture Comparison

Comparison on Multiface

BibTeX


@inproceedings{li2024topo4d,
  title={Topo4D: Topology-Preserving Gaussian Splatting for High-fidelity 4D Head Capture},
  author={Li, Xuanchen and Cheng, Yuhao and Ren, Xingyu and Jia, Haozhe and Xu, Di and Zhu, Wenhan and Yan, Yichao},
  booktitle={European Conference on Computer Vision},
  pages={128--145},
  year={2024},
  organization={Springer}
}