What is TaoAvatar?

TaoAvatar is an advanced technology that creates 3D full-body avatars for augmented reality. These avatars are generated from multiple camera views and can be fully controlled in terms of pose, gesture, and expression. TaoAvatar offers high-quality, real-time rendering with minimal storage needs, making it suitable for mobile and AR devices like the Apple Vision Pro. This technology is particularly useful in areas such as e-commerce live streaming and holographic communication, providing realistic avatars that can operate smoothly on various devices.

Overview of TaoAvatar

Feature | Description
TaoAvatar | Real-Time Lifelike Full-Body Talking Avatars
Official Website | https://pixelai-team.github.io/TaoAvatar/
Research Paper | Google Drive Link
Arxiv | https://arxiv.org/abs/2503.17032
Dataset | Hugging Face Link
Technology Used | 3D Gaussian Splatting (3DGS), StyleUnet, MLP-based network
Applications | E-commerce live streaming, holographic communication, AR devices
Compatibility | Mobile and AR devices, including Apple Vision Pro

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars

Overview

TaoAvatar generates photorealistic, topology-consistent 3D full-body avatars from multi-view sequences. These avatars are fully controllable in pose, gesture, and expression, providing high-quality, real-time rendering with low storage requirements. They are compatible across various mobile and AR devices, including the Apple Vision Pro.

Applications

Realistic 3D full-body talking avatars have significant potential in augmented reality, with applications ranging from e-commerce live streaming to holographic communication. Despite recent advances, existing methods often struggle with fine-grained control of facial expressions and body movements, and they either lack fine detail or cannot run in real time on mobile devices.

Our Approach

We present TaoAvatar, a high-fidelity, lightweight, 3DGS-based full-body talking avatar driven by various signals. Our method starts by creating a personalized clothed human parametric template that binds Gaussians to represent appearances. We pre-train a StyleUnet-based network to handle complex pose-dependent non-rigid deformation, capturing high-frequency appearance details.
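Binding Gaussians to a posed template typically relies on linear blend skinning: each Gaussian is stored relative to a template vertex and follows that vertex as the skeleton articulates. The sketch below illustrates the idea in plain Python; the joint setup and the per-vertex offset are illustrative toy values, not the paper's actual parameterization.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def lbs(point, weights, joint_rots, joint_trans):
    """Linear blend skinning: blend per-joint rigid transforms of `point`."""
    out = [0.0, 0.0, 0.0]
    for w, R, t in zip(weights, joint_rots, joint_trans):
        p = mat_vec(R, point)
        for i in range(3):
            out[i] += w * (p[i] + t[i])
    return out

# A Gaussian "bound" to the template is stored as an offset from a template
# vertex, so its center moves with the vertex under articulation.
template_vertex = [0.0, 1.0, 0.0]
gaussian_offset = [0.05, 0.0, 0.0]
rest_center = [template_vertex[i] + gaussian_offset[i] for i in range(3)]

# Two toy joints: identity and a 90-degree z-rotation, blended 50/50.
rots = [rot_z(0.0), rot_z(math.pi / 2)]
trans = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
posed = lbs(rest_center, [0.5, 0.5], rots, trans)
```

Because the skinning is shared between the mesh and its attached Gaussians, the splats stay topology-consistent with the clothed template as the pose changes.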

TaoAvatar

To make this resource-efficient for mobile devices, we "bake" the non-rigid deformations into a lightweight MLP-based network using a distillation technique and develop blend shapes to compensate for details. Extensive experiments show that TaoAvatar achieves state-of-the-art rendering quality while running in real-time across various devices, maintaining 90 FPS on high-definition stereo devices such as the Apple Vision Pro.

Demos and Datasets

Demos on the Apple Vision Pro showcase a 3D digital human agent interacting with users through an ASR-LLM-TTS pipeline. Facial expressions and gestures are dynamically controlled by an Audio2BS model, allowing natural responses with synchronized speech, expressions, and movements.

Our dataset, TalkBody4D, contains eight multi-view image sequences captured with 59 well-calibrated RGB cameras at 20 fps, with a resolution of 3000×4000 and lengths ranging from 800 to 1000 frames. To request the dataset, please visit our HuggingFace repository, complete the required login information, and submit the corresponding request form. We will perform dual verification of your identity and grant access permissions on HuggingFace after validation.
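A quick back-of-envelope calculation from the stated capture specs (59 cameras, 20 fps, 3000×4000 resolution, 800–1000 frames per sequence) conveys the scale of one TalkBody4D sequence. The uncompressed-size figure assumes 8-bit RGB and is purely illustrative; the released data will of course be compressed.

```python
# Capture parameters as stated for TalkBody4D.
cameras = 59
fps = 20
width, height = 3000, 4000

def sequence_stats(frames):
    images = cameras * frames                    # total images across all views
    seconds = frames / fps                       # clip duration per view
    raw_gb = images * width * height * 3 / 1e9   # uncompressed 8-bit RGB, decimal GB
    return images, seconds, raw_gb

# A 1000-frame sequence: 59,000 images, 50 s of footage, ~2.1 TB uncompressed.
images, seconds, raw_gb = sequence_stats(1000)
```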

Key Features of TaoAvatar

  • Photorealistic 3D Avatars

    TaoAvatar creates realistic, topology-consistent 3D full-body avatars from multi-view sequences, fully controllable in pose, gesture, and expression.

  • Real-Time Rendering

    Delivers high-quality, real-time rendering with low storage needs, compatible with mobile and AR devices like the Apple Vision Pro.

  • Advanced 3D Gaussian Splatting

    Utilizes 3D Gaussian Splatting for detailed avatar creation, overcoming challenges in facial expression and body movement control.

  • Efficient Mobile Performance

    Employs a lightweight MLP-based network and blend shapes to ensure efficient performance on mobile devices.

  • Interactive Demos

    Showcases interactive 3D digital human agents on devices like the Apple Vision Pro, with natural responses through synchronized speech and movements.

  • Comprehensive Dataset

    Includes the TalkBody4D dataset with multi-view image sequences for evaluating and building animatable human body avatars.

Demos on Apple Vision Pro

1. AI Agent on the Vision Pro

We deployed a 3D digital human agent on the Apple Vision Pro, which interacts with users through an ASR-LLM-TTS pipeline. Facial expressions and gestures are dynamically controlled by an Audio2BS model, allowing the agent to respond naturally with synchronized speech, expressions, and movements.
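The driving loop described above can be sketched as a simple chain of components. Every function name below is a placeholder standing in for a real model (speech recognizer, language model, speech synthesizer, Audio2BS), not TaoAvatar's actual API.

```python
# Skeleton of an ASR -> LLM -> TTS -> Audio2BS driving loop.
def run_agent_turn(user_audio, asr, llm, tts, audio2bs):
    text_in = asr(user_audio)            # speech recognition
    reply = llm(text_in)                 # language-model response
    reply_audio = tts(reply)             # synthesized speech
    blendshapes = audio2bs(reply_audio)  # per-frame expression coefficients
    return reply, reply_audio, blendshapes

# Stub components, for illustration only.
reply, audio, bs = run_agent_turn(
    user_audio=b"...",
    asr=lambda a: "hello",
    llm=lambda t: f"echo: {t}",
    tts=lambda t: b"wav-bytes",
    audio2bs=lambda a: [[0.0] * 52],     # e.g. one frame of 52 blendshape weights
)
```

The blendshape stream produced by the last stage is what keeps the avatar's expressions and lip motion synchronized with the synthesized speech.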

2. TaoAvatars on the Vision Pro

TaoAvatar achieves state-of-the-art rendering quality while running in real-time across various devices, maintaining 90 FPS on high-definition stereo devices such as the Apple Vision Pro.

3. Relighting on the Vision Pro

Our method can recover high-quality surface normals, which can be used for real-time image-based relighting.

Pros and Cons

Pros

  • Photorealistic avatars
  • Fully controllable
  • High-quality rendering
  • Device compatibility
  • 90 FPS on stereo devices like the Apple Vision Pro

Cons

  • Fine-grained control of expressions and body motion remains challenging in this domain
  • Distillation into a lightweight network can sacrifice fine detail, requiring blend-shape compensation
  • Multi-view capture and training are resource-intensive

How to Use TaoAvatar?

Step 1: Create a Parametric Template

Begin by creating a personalized clothed human parametric template to bind Gaussians for appearance representation.

Step 2: Pre-train StyleUnet Network

Pre-train a StyleUnet-based network to manage complex pose-dependent non-rigid deformations and capture high-frequency appearance details.

Step 3: Distill into MLP-based Network

Use a distillation technique to bake non-rigid deformations into a lightweight MLP-based network for efficient processing.
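The distillation step can be pictured as regressing a small "student" onto the outputs of the expensive "teacher" deformation network. The toy example below uses a one-dimensional pose, a nonlinear teacher function, and a linear student fit by gradient descent; it is a minimal sketch of the idea, not the paper's actual networks or training setup.

```python
import math, random

random.seed(0)

# "Teacher": an expensive pose-dependent deformation (stand-in for StyleUnet).
def teacher(pose):
    return math.sin(pose) + 0.1 * pose

# "Student": a tiny model w*pose + b, fit to imitate the teacher (distillation).
w, b, lr = 0.0, 0.0, 0.01
for step in range(2000):
    pose = random.uniform(-1.0, 1.0)
    pred = w * pose + b
    err = pred - teacher(pose)       # regress student onto teacher outputs
    w -= lr * err * pose
    b -= lr * err

# After training, the student approximates the teacher on in-range poses.
gap = abs((w * 0.5 + b) - teacher(0.5))
```

At inference time only the cheap student runs, which is what makes real-time mobile rendering feasible; residual detail the student cannot capture is handled by the blend shapes of the next step.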

Step 4: Develop Blend Shapes

Create blend shapes to compensate for details and enhance the realism of the avatar's expressions and movements.
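Blend-shape compensation adds a weighted sum of delta shapes on top of a base geometry. The sketch below uses two tiny 2D "shapes" for illustration; real avatars use thousands of vertices and many shapes, but the arithmetic is the same.

```python
# Blend-shape compensation: base geometry plus weighted per-vertex deltas.
base = [[0.0, 0.0], [1.0, 0.0]]
deltas = [
    [[0.0, 0.2], [0.0, 0.0]],   # shape 0: raise vertex 0
    [[0.0, 0.0], [0.0, 0.1]],   # shape 1: raise vertex 1
]

def apply_blendshapes(base, deltas, weights):
    out = [v[:] for v in base]
    for w, delta in zip(weights, deltas):
        for vi, d in enumerate(delta):
            for ci, dc in enumerate(d):
                out[vi][ci] += w * dc
    return out

posed = apply_blendshapes(base, deltas, weights=[0.5, 1.0])
```

With all weights at zero the result is exactly the base shape, so the correction only activates where the lightweight network falls short.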

Step 5: Deploy on Devices

Deploy TaoAvatar on various devices, ensuring real-time performance and high-fidelity rendering, especially on platforms like the Apple Vision Pro.

TaoAvatar Case Showcase

The showcase presents a range of high-fidelity full-body avatar synthesis results generated by our method, highlighting its versatility and accuracy.

TaoAvatar FAQs