What is TaoAvatar?
TaoAvatar is an advanced technology that creates 3D full-body avatars for augmented reality. These avatars are generated from multiple camera views and can be fully controlled in terms of pose, gesture, and expression. TaoAvatar offers high-quality, real-time rendering with minimal storage needs, making it suitable for mobile and AR devices like the Apple Vision Pro. This technology is particularly useful in areas such as e-commerce live streaming and holographic communication, providing realistic avatars that can operate smoothly on various devices.
Overview of TaoAvatar
| Feature | Description |
|---|---|
| TaoAvatar | Real-Time Lifelike Full-Body Talking Avatars |
| Official Website | https://pixelai-team.github.io/TaoAvatar/ |
| Research Paper | Google Drive Link |
| arXiv | https://arxiv.org/abs/2503.17032 |
| Dataset | Hugging Face Link |
| Technology Used | 3D Gaussian Splatting (3DGS), StyleUnet, MLP-based network |
| Applications | E-commerce live streaming, holographic communication, AR devices |
| Compatibility | Mobile and AR devices, including Apple Vision Pro |
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars
Overview
TaoAvatar generates photorealistic, topology-consistent 3D full-body avatars from multi-view sequences. These avatars are fully controllable in pose, gesture, and expression, providing high-quality, real-time rendering with low storage requirements. They are compatible across various mobile and AR devices, including the Apple Vision Pro.
Applications
Realistic 3D full-body talking avatars have significant potential in augmented reality, with applications ranging from e-commerce live streaming to holographic communication. Despite recent advances, existing methods often struggle with fine-grained control of facial expressions and body movements, lack sufficient detail, and usually cannot run in real time on mobile devices.
Our Approach
We present TaoAvatar, a high-fidelity, lightweight, 3DGS-based full-body talking avatar driven by various signals. Our method starts by creating a personalized clothed human parametric template that binds Gaussians to represent appearances. We pre-train a StyleUnet-based network to handle complex pose-dependent non-rigid deformation, capturing high-frequency appearance details.
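To make the binding idea concrete, here is a minimal sketch of canonical Gaussians attached to a clothed template and posed with linear blend skinning; the tensor shapes, parameterization, and helper names are illustrative assumptions, not the released implementation.

```python
import torch

# Minimal sketch: canonical 3D Gaussians bound to a clothed template and posed
# with linear blend skinning (LBS). Shapes and attribute names are assumptions.

class BoundGaussians(torch.nn.Module):
    def __init__(self, template_vertices, lbs_weights):
        super().__init__()
        # One Gaussian per template vertex, defined in canonical (rest) space.
        self.register_buffer("canonical_xyz", template_vertices)  # (N, 3)
        self.register_buffer("lbs_weights", lbs_weights)          # (N, J), rows sum to 1
        n = template_vertices.shape[0]
        # Learnable per-Gaussian appearance: scale, rotation, opacity, color.
        self.log_scale = torch.nn.Parameter(torch.zeros(n, 3))
        self.rotation = torch.nn.Parameter(torch.tensor([[1.0, 0, 0, 0]]).repeat(n, 1))
        self.opacity = torch.nn.Parameter(torch.zeros(n, 1))
        self.color = torch.nn.Parameter(torch.zeros(n, 3))

    def pose(self, joint_transforms, non_rigid_offset=None):
        """Pose the canonical Gaussians.

        joint_transforms: (J, 4, 4) rigid transform per skeleton joint.
        non_rigid_offset: optional (N, 3) pose-dependent offsets (e.g. clothing
                          wrinkles predicted by a StyleUnet-like network).
        """
        xyz = self.canonical_xyz
        if non_rigid_offset is not None:
            xyz = xyz + non_rigid_offset                       # deform in canonical space
        # Blend the joint transforms per Gaussian: (N, 4, 4).
        blended = torch.einsum("nj,jab->nab", self.lbs_weights, joint_transforms)
        xyz_h = torch.cat([xyz, torch.ones_like(xyz[:, :1])], dim=-1)   # homogeneous
        return torch.einsum("nab,nb->na", blended, xyz_h)[:, :3]        # posed (N, 3)
```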

To make the model resource-efficient on mobile devices, we "bake" the non-rigid deformations into a lightweight MLP-based network using a distillation technique and develop blend shapes to compensate for fine details. Extensive experiments show that TaoAvatar achieves state-of-the-art rendering quality while running in real time across various devices, maintaining 90 FPS on high-definition stereo devices such as the Apple Vision Pro.
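A minimal sketch of what the distillation step might look like, assuming a pre-trained StyleUnet-style teacher that outputs per-Gaussian offsets and a small per-point MLP student; the dimensions, the loss choice, and the function names are assumptions for illustration, not the paper's training code.

```python
import torch

# Sketch: distill a heavy pose-conditioned deformation network (teacher) into a
# lightweight per-point MLP (student). Shapes and names are illustrative assumptions.

POSE_DIM = 69  # assumed size of the driving pose vector, not the paper's value

# Student: maps (canonical Gaussian position, pose code) -> non-rigid xyz offset.
student = torch.nn.Sequential(
    torch.nn.Linear(3 + POSE_DIM, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 3),
)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def distill_step(teacher_offsets, canonical_xyz, pose):
    """One distillation step: the student mimics the teacher's non-rigid offsets.

    teacher_offsets: (N, 3) offsets from the pre-trained StyleUnet-style teacher
    canonical_xyz:   (N, 3) canonical Gaussian positions
    pose:            (POSE_DIM,) driving pose parameters
    """
    pose_tiled = pose.expand(canonical_xyz.shape[0], -1)              # (N, POSE_DIM)
    pred = student(torch.cat([canonical_xyz, pose_tiled], dim=-1))    # (N, 3)
    loss = torch.nn.functional.l1_loss(pred, teacher_offsets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```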
Demos and Datasets
Demos on the Apple Vision Pro showcase a 3D digital human agent interacting with users through an ASR-LLM-TTS pipeline. Facial expressions and gestures are dynamically controlled by an Audio2BS model, allowing natural responses with synchronized speech, expressions, and movements.
Our dataset, TalkBody4D, contains eight multi-view image sequences captured with 59 well-calibrated RGB cameras at 20 fps, at a resolution of 3000×4000 and with lengths ranging from 800 to 1000 frames. To request the dataset, please visit our Hugging Face repository, log in, and submit the access request form. We will perform a dual verification of your identity and grant access permissions on Hugging Face once validated.
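For a rough sense of the capture scale, the sketch below simply encodes the stated parameters (59 cameras, 20 fps, 3000×4000, 800 to 1000 frames) and derives the approximate number of images per sequence; nothing else about the dataset layout is implied.

```python
from dataclasses import dataclass

@dataclass
class TalkBody4DCapture:
    """Capture parameters as stated in the TalkBody4D description."""
    num_cameras: int = 59
    fps: int = 20
    resolution: tuple = (3000, 4000)
    frames_per_sequence: range = range(800, 1001)  # 800 to 1000 frames

cap = TalkBody4DCapture()
# Approximate multi-view images per sequence across all cameras.
min_images = cap.num_cameras * min(cap.frames_per_sequence)   # 59 * 800  = 47,200
max_images = cap.num_cameras * max(cap.frames_per_sequence)   # 59 * 1000 = 59,000
print(f"Images per sequence: {min_images:,} to {max_images:,}")
```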
Key Features of TaoAvatar
Photorealistic 3D Avatars
TaoAvatar creates realistic, topology-consistent 3D full-body avatars from multi-view sequences, fully controllable in pose, gesture, and expression.
Real-Time Rendering
Delivers high-quality, real-time rendering with low storage needs, compatible with mobile and AR devices like the Apple Vision Pro.
Advanced 3D Gaussian Splatting
Utilizes 3D Gaussian Splatting for detailed avatar creation, overcoming challenges in facial expression and body movement control.
Efficient Mobile Performance
Employs a lightweight MLP-based network and blend shapes to ensure efficient performance on mobile devices.
Interactive Demos
Showcases interactive 3D digital human agents on devices like the Apple Vision Pro, with natural responses through synchronized speech and movements.
Comprehensive Dataset
Includes the TalkBody4D dataset with multi-view image sequences for evaluating and building animatable human body avatars.
Demos on Apple Vision Pro
1. AI Agent on the Vision Pro
We deployed a 3D digital human agent on the Apple Vision Pro, which interacts with users through an ASR-LLM-TTS pipeline. Facial expressions and gestures are dynamically controlled by an Audio2BS model, allowing the agent to respond naturally with synchronized speech, expressions, and movements.
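The demo loop can be sketched at a high level as follows; every object and method here (asr.recognize, llm.generate, tts.synthesize, audio2bs.predict, avatar.render) is a hypothetical placeholder for the corresponding component, not an actual TaoAvatar API.

```python
# High-level sketch of the demo loop: ASR -> LLM -> TTS, with an Audio2BS model
# driving facial blend shapes. All objects and methods below are placeholders.

def interaction_step(avatar, mic_audio, asr, llm, tts, audio2bs):
    text_in = asr.recognize(mic_audio)          # ASR: user speech -> text
    reply_text = llm.generate(text_in)          # LLM: text -> agent reply
    reply_audio = tts.synthesize(reply_text)    # TTS: reply text -> speech audio

    # Audio2BS: speech audio -> per-frame facial blend-shape weights,
    # keeping lips and expressions synchronized with the spoken reply.
    blendshape_frames = audio2bs.predict(reply_audio)

    frames = []
    for weights in blendshape_frames:
        frames.append(avatar.render(expression=weights))  # pose/gesture omitted here
    return reply_audio, frames
```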
2. TaoAvatars on the Vision Pro
TaoAvatar achieves state-of-the-art rendering quality while running in real time across various devices, maintaining 90 FPS on high-definition stereo devices such as the Apple Vision Pro.
3. Relighting on the Vision Pro
Our method produces high-quality normals, which can be used for real-time image-based relighting.
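To illustrate why good normals matter here, the sketch below applies a simple Lambertian-style relighting from per-pixel normals; it is a generic stand-in, since the text does not specify the actual relighting model.

```python
import numpy as np

def relight_diffuse(albedo, normals, light_dir, light_color):
    """Minimal Lambertian relighting from per-pixel normals (illustrative only).

    albedo:      (H, W, 3) base color in [0, 1]
    normals:     (H, W, 3) unit normals, e.g. rendered from the avatar
    light_dir:   (3,) unit direction toward the light
    light_color: (3,) RGB light intensity
    """
    n_dot_l = np.clip(normals @ light_dir, 0.0, None)   # (H, W) cosine term
    shading = n_dot_l[..., None] * light_color           # (H, W, 3)
    return np.clip(albedo * shading, 0.0, 1.0)
```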
Pros and Cons
Pros
- Photorealistic, topology-consistent avatars
- Fully controllable pose, gesture, and expression
- High-quality real-time rendering with low storage needs
- Compatible with a range of mobile and AR devices
- 90 FPS on high-definition stereo devices such as the Apple Vision Pro
Cons
- Requires a multi-view capture setup (the accompanying dataset uses 59 calibrated cameras)
- Multi-stage training pipeline: template creation, StyleUnet pre-training, then distillation
- The full StyleUnet-based model is too heavy for mobile devices without the distillation step
How to Use TaoAvatar?
Step 1: Create a Parametric Template
Begin by creating a personalized clothed human parametric template to bind Gaussians for appearance representation.
Step 2: Pre-train StyleUnet Network
Pre-train a StyleUnet-based network to manage complex pose-dependent non-rigid deformations and capture high-frequency appearance details.
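One common pattern for such networks is to predict a 2D offset map conditioned on a posed-position map and sample it at each Gaussian's UV coordinate on the template; the sketch below follows that general pattern and is an assumption for illustration, not the paper's exact architecture.

```python
import torch

# Sketch: a pose-conditioned convolutional network predicts a 2D offset map,
# sampled at each Gaussian's UV coordinate on the template. Illustrative only.

class OffsetNet(torch.nn.Module):
    def __init__(self, in_ch=3, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(in_ch, hidden, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(hidden, hidden, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(hidden, 3, 3, padding=1),      # xyz offset map
        )

    def forward(self, pose_map, gaussian_uv):
        """pose_map: (1, 3, H, W) posed-position map; gaussian_uv: (N, 2) in [-1, 1]."""
        offset_map = self.net(pose_map)                    # (1, 3, H, W)
        grid = gaussian_uv.view(1, -1, 1, 2)               # (1, N, 1, 2)
        sampled = torch.nn.functional.grid_sample(
            offset_map, grid, align_corners=True)          # (1, 3, N, 1)
        return sampled[0, :, :, 0].transpose(0, 1)         # (N, 3) per-Gaussian offsets
```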
Step 3: Distill into MLP-based Network
Use a distillation technique to bake non-rigid deformations into a lightweight MLP-based network for efficient processing.
Step 4: Develop Blend Shapes
Create blend shapes to compensate for details and enhance the realism of the avatar's expressions and movements.
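A minimal sketch of how compensating blend shapes can be applied: the corrected Gaussian positions are the base positions plus a weighted sum of learned blend-shape bases. The shapes and the idea of driving the weights from expression or pose inputs are illustrative assumptions.

```python
import torch

def apply_blend_shapes(base_xyz, blend_shapes, weights):
    """Add a weighted combination of blend shapes to the base Gaussian positions.

    base_xyz:     (N, 3)    positions after the lightweight MLP deformation
    blend_shapes: (K, N, 3) learned per-Gaussian correction bases
    weights:      (K,)      driving coefficients, e.g. from expression or pose
    """
    correction = torch.einsum("k,knc->nc", weights, blend_shapes)
    return base_xyz + correction
```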
Step 5: Deploy on Devices
Deploy TaoAvatar on various devices, ensuring real-time performance and high-fidelity rendering, especially on platforms like the Apple Vision Pro.
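When targeting real-time playback, a quick sanity check is the per-frame budget: 90 FPS leaves roughly 11.1 ms per frame. The snippet below is just that arithmetic; the measured frame time is a placeholder value, not output from a real profiler.

```python
TARGET_FPS = 90
FRAME_BUDGET_MS = 1000.0 / TARGET_FPS   # ~11.1 ms per frame at 90 FPS

def within_budget(frame_time_ms: float) -> bool:
    """Return True if a measured frame time fits the 90 FPS budget."""
    return frame_time_ms <= FRAME_BUDGET_MS

# Example: a hypothetical 9.5 ms frame leaves ~1.6 ms of headroom.
print(within_budget(9.5), FRAME_BUDGET_MS - 9.5)
```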
TaoAvatar Showcase
This section showcases a variety of high-fidelity full-body avatar synthesis results generated by our method, highlighting the versatility and accuracy of our approach.