What is TaoAvatar?

TaoAvatar is an advanced technology that creates 3D full-body avatars for augmented reality. These avatars are generated from multiple camera views and can be fully controlled in terms of pose, gesture, and expression. TaoAvatar offers high-quality, real-time rendering with minimal storage needs, making it suitable for mobile and AR devices like the Apple Vision Pro. This technology is particularly useful in areas such as e-commerce live streaming and holographic communication, providing realistic avatars that can operate smoothly on various devices.

Overview of TaoAvatar

Feature | Description
TaoAvatar | Real-Time Lifelike Full-Body Talking Avatars
Official Website | https://pixelai-team.github.io/TaoAvatar/
Research Paper | Google Drive Link
Arxiv | https://arxiv.org/abs/2503.17032
Dataset | Hugging Face Link
Technology Used | 3D Gaussian Splatting (3DGS), StyleUnet, MLP-based network
Applications | E-commerce live streaming, holographic communication, AR devices
Compatibility | Mobile and AR devices, including Apple Vision Pro

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars

Overview

TaoAvatar generates photorealistic, topology-consistent 3D full-body avatars from multi-view sequences. These avatars are fully controllable in pose, gesture, and expression, providing high-quality, real-time rendering with low storage requirements. They are compatible across various mobile and AR devices, including the Apple Vision Pro.

Applications

Realistic 3D full-body talking avatars have significant potential in augmented reality, with applications ranging from e-commerce live streaming to holographic communication. Despite recent advances, existing methods often struggle with fine-grained control of facial expressions and body movements, and they either lack fine detail or cannot run in real time on mobile devices.

Our Approach

We present TaoAvatar, a high-fidelity, lightweight, 3DGS-based full-body talking avatar driven by various signals. Our method starts by creating a personalized clothed human parametric template that binds Gaussians to represent appearances. We pre-train a StyleUnet-based network to handle complex pose-dependent non-rigid deformation, capturing high-frequency appearance details.
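Binding Gaussians to a posed template typically relies on linear blend skinning: each Gaussian is stored relative to a template vertex and follows that vertex as the skeleton articulates. The sketch below illustrates the idea in plain Python; the joint setup and the per-vertex offset are illustrative toy values, not the paper's actual parameterization.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def lbs(point, weights, joint_rots, joint_trans):
    """Linear blend skinning: blend per-joint rigid transforms of `point`."""
    out = [0.0, 0.0, 0.0]
    for w, R, t in zip(weights, joint_rots, joint_trans):
        p = mat_vec(R, point)
        for i in range(3):
            out[i] += w * (p[i] + t[i])
    return out

# A Gaussian "bound" to the template is stored as an offset from a template
# vertex, so its center moves with the vertex under articulation.
template_vertex = [0.0, 1.0, 0.0]
gaussian_offset = [0.05, 0.0, 0.0]
rest_center = [template_vertex[i] + gaussian_offset[i] for i in range(3)]

# Two toy joints: identity and a 90-degree z-rotation, blended 50/50.
rots = [rot_z(0.0), rot_z(math.pi / 2)]
trans = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
posed = lbs(rest_center, [0.5, 0.5], rots, trans)
```

Because the skinning is shared between the mesh and its attached Gaussians, the splats stay topology-consistent with the clothed template as the pose changes.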

TaoAvatar

To make this resource-efficient for mobile devices, we "bake" the non-rigid deformations into a lightweight MLP-based network using a distillation technique and develop blend shapes to compensate for details. Extensive experiments show that TaoAvatar achieves state-of-the-art rendering quality while running in real-time across various devices, maintaining 90 FPS on high-definition stereo devices such as the Apple Vision Pro.

Demos and Datasets

Demos on the Apple Vision Pro showcase a 3D digital human agent interacting with users through an ASR-LLM-TTS pipeline. Facial expressions and gestures are dynamically controlled by an Audio2BS model, allowing natural responses with synchronized speech, expressions, and movements.

Our dataset, TalkBody4D, contains eight multi-view image sequences captured with 59 well-calibrated RGB cameras at 20 fps, with a resolution of 3000×4000 and lengths ranging from 800 to 1000 frames. To request the dataset, please visit our HuggingFace repository, complete the required login information, and submit the corresponding request form. We will perform dual verification of your identity and grant access permissions on HuggingFace after validation.
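A quick back-of-envelope calculation from the stated capture specs (59 cameras, 20 fps, 3000×4000 resolution, 800–1000 frames per sequence) conveys the scale of one TalkBody4D sequence. The uncompressed-size figure assumes 8-bit RGB and is purely illustrative; the released data will of course be compressed.

```python
# Capture parameters as stated for TalkBody4D.
cameras = 59
fps = 20
width, height = 3000, 4000

def sequence_stats(frames):
    images = cameras * frames                    # total images across all views
    seconds = frames / fps                       # clip duration per view
    raw_gb = images * width * height * 3 / 1e9   # uncompressed 8-bit RGB, decimal GB
    return images, seconds, raw_gb

# A 1000-frame sequence: 59,000 images, 50 s of footage, ~2.1 TB uncompressed.
images, seconds, raw_gb = sequence_stats(1000)
```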

Key Features of TaoAvatar

  • Photorealistic 3D Avatars

    TaoAvatar creates realistic, topology-consistent 3D full-body avatars from multi-view sequences, fully controllable in pose, gesture, and expression.

  • Real-Time Rendering

    Delivers high-quality, real-time rendering with low storage needs, compatible with mobile and AR devices like the Apple Vision Pro.

  • Advanced 3D Gaussian Splatting

    Utilizes 3D Gaussian Splatting for detailed avatar creation, overcoming challenges in facial expression and body movement control.

  • Efficient Mobile Performance

    Employs a lightweight MLP-based network and blend shapes to ensure efficient performance on mobile devices.

  • Interactive Demos

    Showcases interactive 3D digital human agents on devices like the Apple Vision Pro, with natural responses through synchronized speech and movements.

  • Comprehensive Dataset

    Includes the TalkBody4D dataset with multi-view image sequences for evaluating and building animatable human body avatars.

Demos on Apple Vision Pro

1. AI Agent on the Vision Pro

We deployed a 3D digital human agent on the Apple Vision Pro, which interacts with users through an ASR-LLM-TTS pipeline. Facial expressions and gestures are dynamically controlled by an Audio2BS model, allowing the agent to respond naturally with synchronized speech, expressions, and movements.
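The driving loop described above can be sketched as a simple chain of components. Every function name below is a placeholder standing in for a real model (speech recognizer, language model, speech synthesizer, Audio2BS), not TaoAvatar's actual API.

```python
# Skeleton of an ASR -> LLM -> TTS -> Audio2BS driving loop.
def run_agent_turn(user_audio, asr, llm, tts, audio2bs):
    text_in = asr(user_audio)            # speech recognition
    reply = llm(text_in)                 # language-model response
    reply_audio = tts(reply)             # synthesized speech
    blendshapes = audio2bs(reply_audio)  # per-frame expression coefficients
    return reply, reply_audio, blendshapes

# Stub components, for illustration only.
reply, audio, bs = run_agent_turn(
    user_audio=b"...",
    asr=lambda a: "hello",
    llm=lambda t: f"echo: {t}",
    tts=lambda t: b"wav-bytes",
    audio2bs=lambda a: [[0.0] * 52],     # e.g. one frame of 52 blendshape weights
)
```

The blendshape stream produced by the last stage is what keeps the avatar's expressions and lip motion synchronized with the synthesized speech.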

2. TaoAvatars on the Vision Pro

TaoAvatar achieves state-of-the-art rendering quality while running in real-time across various devices, maintaining 90 FPS on high-definition stereo devices such as the Apple Vision Pro.

3. Relighting on the Vision Pro

Our method can recover high-quality surface normals, which can be used for real-time image-based relighting.

Pros and Cons

Pros

  • Photorealistic avatars
  • Fully controllable
  • High-quality rendering
  • Device compatibility
  • 90 FPS on stereo devices like the Apple Vision Pro

Cons

  • Fine-grained control of expressions and body motion remains challenging in this domain
  • Distillation into a lightweight network can sacrifice fine detail, requiring blend-shape compensation
  • Multi-view capture and training are resource-intensive

How to Use TaoAvatar?

Step 1: Create a Parametric Template

Begin by creating a personalized clothed human parametric template to bind Gaussians for appearance representation.

Step 2: Pre-train StyleUnet Network

Pre-train a StyleUnet-based network to manage complex pose-dependent non-rigid deformations and capture high-frequency appearance details.

Step 3: Distill into MLP-based Network

Use a distillation technique to bake non-rigid deformations into a lightweight MLP-based network for efficient processing.
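The distillation step can be pictured as regressing a small "student" onto the outputs of the expensive "teacher" deformation network. The toy example below uses a one-dimensional pose, a nonlinear teacher function, and a linear student fit by gradient descent; it is a minimal sketch of the idea, not the paper's actual networks or training setup.

```python
import math, random

random.seed(0)

# "Teacher": an expensive pose-dependent deformation (stand-in for StyleUnet).
def teacher(pose):
    return math.sin(pose) + 0.1 * pose

# "Student": a tiny model w*pose + b, fit to imitate the teacher (distillation).
w, b, lr = 0.0, 0.0, 0.01
for step in range(2000):
    pose = random.uniform(-1.0, 1.0)
    pred = w * pose + b
    err = pred - teacher(pose)       # regress student onto teacher outputs
    w -= lr * err * pose
    b -= lr * err

# After training, the student approximates the teacher on in-range poses.
gap = abs((w * 0.5 + b) - teacher(0.5))
```

At inference time only the cheap student runs, which is what makes real-time mobile rendering feasible; residual detail the student cannot capture is handled by the blend shapes of the next step.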

Step 4: Develop Blend Shapes

Create blend shapes to compensate for details and enhance the realism of the avatar's expressions and movements.
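Blend-shape compensation adds a weighted sum of delta shapes on top of a base geometry. The sketch below uses two tiny 2D "shapes" for illustration; real avatars use thousands of vertices and many shapes, but the arithmetic is the same.

```python
# Blend-shape compensation: base geometry plus weighted per-vertex deltas.
base = [[0.0, 0.0], [1.0, 0.0]]
deltas = [
    [[0.0, 0.2], [0.0, 0.0]],   # shape 0: raise vertex 0
    [[0.0, 0.0], [0.0, 0.1]],   # shape 1: raise vertex 1
]

def apply_blendshapes(base, deltas, weights):
    out = [v[:] for v in base]
    for w, delta in zip(weights, deltas):
        for vi, d in enumerate(delta):
            for ci, dc in enumerate(d):
                out[vi][ci] += w * dc
    return out

posed = apply_blendshapes(base, deltas, weights=[0.5, 1.0])
```

With all weights at zero the result is exactly the base shape, so the correction only activates where the lightweight network falls short.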

Step 5: Deploy on Devices

Deploy TaoAvatar on various devices, ensuring real-time performance and high-fidelity rendering, especially on platforms like the Apple Vision Pro.

TaoAvatar Case Showcase

The showcase presents a range of high-fidelity full-body avatar synthesis results generated by our method, highlighting its versatility and accuracy.

TaoAvatar FAQs