About TaoAvatar
TaoAvatar generates photorealistic, topology-consistent 3D full-body avatars from multi-view sequences, fully controllable in pose, gesture, and expression. It delivers high-quality, real-time rendering with low storage requirements and runs on a range of mobile and AR devices, such as the Apple Vision Pro.
What is TaoAvatar?
Realistic 3D full-body talking avatars hold great potential in AR, with applications ranging from e-commerce live streaming to holographic communication. Despite advances in 3D Gaussian Splatting (3DGS) for lifelike avatar creation, existing methods struggle with fine-grained control of facial expressions and body movements in full-body talking tasks. They also often lack sufficient detail and cannot run in real time on mobile devices.
We present TaoAvatar, a high-fidelity, lightweight, 3DGS-based full-body talking avatar driven by various signals. Our approach starts by creating a personalized clothed human parametric template that binds Gaussians to represent appearance. We then pre-train a StyleUnet-based network to handle complex pose-dependent non-rigid deformation; it captures high-frequency appearance details but is too resource-intensive for mobile devices. To overcome this, we "bake" the non-rigid deformations into a lightweight MLP-based network using a distillation technique and develop blend shapes to compensate for fine details. Extensive experiments show that TaoAvatar achieves state-of-the-art rendering quality while running in real time across various devices, maintaining 90 FPS on high-definition stereo devices such as the Apple Vision Pro.
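To make the "baking" step concrete, here is a minimal sketch of distilling a teacher's pose-dependent deformations into a small student MLP. Everything specific is an assumption for illustration: the pose dimensionality, Gaussian count, network widths, the L1 loss, and the random stand-in data; the actual method also learns blend shapes to recover fine detail and operates on a clothed parametric template.

```python
import torch
import torch.nn as nn

N_GAUSSIANS, POSE_DIM = 10_000, 75  # hypothetical sizes

class DeformationMLP(nn.Module):
    """Lightweight student: maps a pose vector to per-Gaussian xyz offsets."""
    def __init__(self, pose_dim: int, n_points: int, hidden: int = 256):
        super().__init__()
        self.n_points = n_points
        self.net = nn.Sequential(
            nn.Linear(pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_points * 3),
        )

    def forward(self, pose: torch.Tensor) -> torch.Tensor:
        return self.net(pose).view(-1, self.n_points, 3)

student = DeformationMLP(POSE_DIM, N_GAUSSIANS)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

# Stand-in for (pose, teacher offsets) pairs; in the real pipeline a frozen
# StyleUnet teacher would supply the target non-rigid deformations.
batches = [(torch.randn(1, POSE_DIM), torch.randn(1, N_GAUSSIANS, 3))
           for _ in range(8)]

for pose, teacher_offsets in batches:
    pred = student(pose)                              # student prediction
    loss = nn.functional.l1_loss(pred, teacher_offsets)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference only the small student runs on device, which is what makes the real-time budget on mobile hardware attainable.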
Demos on Apple Vision Pro
- AI Agent on the Vision Pro: We deployed a 3D digital human agent on the Apple Vision Pro that interacts with users through an ASR-LLM-TTS pipeline. Facial expressions and gestures are dynamically controlled by an Audio2BS model, allowing the agent to respond naturally with synchronized speech, expressions, and movements (a minimal sketch of one conversational turn follows this list).
- TaoAvatars on the Vision Pro: This demo shows TaoAvatar rendering in real time on the device, sustaining 90 FPS on its high-definition stereo display.
- Relighting on the Vision Pro: Our method recovers high-quality normals, which enable real-time image-based relighting (see the toy shading example below).
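As a rough illustration of the first demo's pipeline, the sketch below chains hypothetical ASR, LLM, TTS, and Audio2BS components. All of the interfaces here (`transcribe`, `generate`, `synthesize`, `predict`, `set_blendshapes`, `render`) are placeholders for illustration, not APIs from the project.

```python
def run_agent_turn(audio_in, asr, llm, tts, audio2bs, avatar):
    """One conversational turn of a hypothetical ASR -> LLM -> TTS agent."""
    text = asr.transcribe(audio_in)        # user speech -> text
    reply = llm.generate(text)             # text -> response text
    speech = tts.synthesize(reply)         # response text -> waveform
    frames = audio2bs.predict(speech)      # waveform -> per-frame blend-shape weights
    for weights in frames:                 # drive expression/gesture in sync
        avatar.set_blendshapes(weights)
        avatar.render()
    return speech                          # played back alongside the animation
```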
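And for the relighting demo, a toy diffuse-only (Lambertian) pass shows how per-pixel normals feed relighting; real image-based relighting would use an environment map and a richer shading model.

```python
import numpy as np

def lambertian_relight(albedo: np.ndarray, normals: np.ndarray,
                       light_dir) -> np.ndarray:
    """Toy diffuse relighting: albedo (H, W, 3) in [0, 1], unit normals (H, W, 3)."""
    l = np.asarray(light_dir, dtype=np.float32)
    l /= np.linalg.norm(l)                      # normalize light direction
    shading = np.clip(normals @ l, 0.0, None)   # per-pixel n . l, clamped at 0
    return albedo * shading[..., None]          # modulate color by shading
```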
Datasets
Our dataset, TalkBody4D, contains eight multi-view image sequences. They were captured with 59 well-calibrated RGB cameras at 20 fps, with a resolution of 3000×4000 and lengths ranging from 800 to 1000 frames. We use this data to evaluate our method for building animatable full-body avatars.
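For a rough sense of the raw data volume (our back-of-the-envelope arithmetic from the figures above, not an official size):

```python
# 59 cameras x ~1000 frames of 3000x4000 8-bit RGB, uncompressed.
cams, frames, width, height, bytes_per_px = 59, 1000, 3000, 4000, 3
raw_bytes = cams * frames * width * height * bytes_per_px
print(f"{raw_bytes / 1e12:.1f} TB")  # ~2.1 TB per sequence before compression
```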
To request the dataset, visit our HuggingFace repository, complete the required login information, and submit the request form. We will verify your identity and grant access permissions on HuggingFace after validation.
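Once access is granted, a gated HuggingFace dataset can typically be fetched with `huggingface_hub` as sketched below. The repository id is a placeholder, and you need to be authenticated (e.g. via `huggingface-cli login`) with an approved request.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- substitute the actual TalkBody4D repository.
local_dir = snapshot_download(
    repo_id="<org>/TalkBody4D",
    repo_type="dataset",
)
print("Downloaded to:", local_dir)
```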
Note: This is an unofficial about page for TaoAvatar. For the most accurate information, please refer to official documentation.