Sam Nhut Nguyen

Contact Me

Audio-Driven Video Generation · Talking Face Synthesis · Lip Sync & Dubbing · Diffusion Transformers · Flow Matching

I'm a Senior Research Engineer at Pipio AI specializing in video and audio generation, from audio-driven talking face synthesis to large-scale video dubbing pipelines. Previously, at Captions AI, I led development of Lipdub (a video dubbing engine combining Landmark VAEs, Diffusion Transformers, and flow matching) and contributed to Mirage (a 10B-parameter audio-to-video generation model). I also co-founded dizim, a virtual presenter platform (Top 10 at Techfest 2022).

๐Ÿ“ Ho Chi Minh City, Vietnam


What I've been working on.

Lip-sync, video dubbing, and talking head generation, from research to production at scale.

Projects

8+ years building production generative AI systems for lip-sync, text-to-speech, and virtual presenters serving hundreds of thousands of users.

EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos NEW
John Flynn, Wolfgang Paier, Dimitar Dinev, Sam Nhut Nguyen, Hayk Poghosyan, Manuel Toribio, Sandipan Banerjee, Guy Gafni
Pipio AI · Senior Research Engineer
arXiv 2026
project page · arXiv

EditYourself is a diffusion-based video editing model for talking heads, enabling transcript-driven lip-syncing, insertion, removal and retiming of speech while preserving identity and visual fidelity.

Mirage Studio
Captions AI · Senior Member of Technical Staff

AI-powered video creation studio enabling users to generate cinematic-quality videos from text prompts with advanced motion and style control.

Lipdub
Captions AI · Senior Member of Technical Staff

Video dubbing engine that translates and lip-syncs videos into 28+ languages. Led development of the core lip-synchronization pipeline.

AI Creator (3D Avatar)
Captions AI · Senior Member of Technical Staff

The world's first 3D avatar designed for content creation. Generates photorealistic talking-head videos from text with natural lip-sync, head movement, and emotional expression in 30+ languages.

AI Twin
Captions AI · Senior Member of Technical Staff

Digital clone technology that creates a virtual version of the user from a short recording. Generates talking-head videos from text in 29 languages with AI voice cloning and natural expressions.

Ausynclab AI
Ausynclab · Technical Advisor

Voice cloning technology achieving best local Vietnamese voice quality. Reached 100k+ users within months of launch.

Dizim AI
dizim · Co-Founder & Head of AI

Virtual presenter platform serving 200k+ users. Top 10 at Techfest 2022. Generates AI-driven talking head videos from text and slides.

Onlinica / OnliCV
Onlinica · Lead ML

First AI-powered online education platform in Vietnam. OnliCV connects your professional network. Led an 8-person ML team building the core AI features.

VisibilityPro
DMSpro · AI Engineer

Computer vision for retail automation. AI-powered shelf monitoring and product recognition that reduced manual review by 70%. Silver Winner SAP SME SEEDx 2020.

Face Detection & e-KYC
EyeQ Tech · AI Engineer

Facial recognition system for e-KYC and access control across Banking, Retail, and Security sectors. Real-time face detection and verification pipeline.

Japanese OCR Engine
Bravesoft VN · AI Engineer

OCR engine specialized for complex Japanese Kanji/Kana character recognition. Built for document digitization workflows serving Japanese enterprise clients.

Open Source

My open-source work focuses on flow matching, audio/video generation, and lip synchronization. It builds on my thesis research on talking face generation with GANs.

Articles

Interactive deep-dives into the concepts and architectures behind modern AI systems.

🎬
EditYourself: How It Works
Interactive Flow Matching Realtime Parallel Audio-to-Video Image-to-Video Text-to-Video

An interactive guide to EditYourself's audio-driven talking head pipeline: selective masking, audio conditioning, sliding window denoising, and multi-scale refinement.

Read article →
🎙️
Mirage: Seeing Voices
Interactive Flow Matching Audio-to-Video

How Mirage generates complete A-roll video from audio: asymmetric self-attention, learned RoPE, flow matching, and the data pipeline behind it.

Read article →
👄
LipDub: How It Works
Interactive Landmarks GAN Audio-to-Video Identity

An interactive guide to LipDub's two-stage talking face pipeline: first a Landmark VAE and Diffusion Transformer with flow matching for audio-driven landmark generation, then photo-realistic rendering with SPADE alignment and AdaIN audio injection.

Read article →
🧑‍🎤
AI Creator: 3D Talking Avatar
Interactive NeRF 3D Avatar Audio-to-Video Blendshapes Relighting

How to build a 3D talking avatar from a single video: audio-visual lip sync, blendshape expressions, head pose optimization, hash-grid NeRF rendering, portrait restoration, and relighting from reference images.

Read article →
🔄
Transformer + RoPE: Full Pipeline
Interactive Transformer

Step-by-step walkthrough of the full Transformer pipeline with Rotary Position Embeddings, from raw text to output, with actual numbers and visualizations.

Read article →
📉
Optimizer Evolution: SGD to Muon
Interactive Optimization SGD Adam Muon

Why each optimizer was invented, from SGD's zig-zagging to Adam's adaptive moments to Muon's orthogonalization. Interactive training simulations show how each optimizer navigates a loss surface differently.

Read article →

Experience

💼 Work
  • Pipio AI - Senior Research Engineer (2025-Present): Multi-modal video synthesis, visual dialog editing, lip-synchronization
  • Captions AI - Senior Member of Technical Staff (2022-2025): Lipdub development, contributed to Mirage
  • dizim - Head of AI / Co-Founder (2021-2022): Top 10 Techfest 2022, virtual presenter platform serving 200k+ users
  • Onlinica - Lead Machine Learning (2022-2023): Led 8-person team for AI-powered educational platform
  • Ausynclab - Technical Advisor: Advised on voice cloning technology achieving the best local Vietnamese voice quality; the product reached 100k+ users within months of launch
  • DMSpro - AI Engineer (2019-2020): Computer vision for retail automation, reduced manual review by 70%
  • EyeQ Tech - AI Engineer (2019): Facial recognition for e-KYC and access control across Banking, Retail, and Security sectors
  • Bravesoft Vietnam - AI Engineer (2018): Japanese OCR engine for complex Kanji/Kana character recognition
🎓 Education
  • BSc in Mathematics and Computer Science - VNU-HCMUS University of Science (2015-2019)
  • Thesis: "Talking Face via Audio Driven" - Score: 9.8/10
๐Ÿ† Honors
  • Semi-Final Techfest 2021 - Top 20 Startups
  • Silver Winner - SAP SME SEEDx Development Challenge 2020
๐Ÿ› ๏ธ Skills Large Scale Training, GANs, VAEs, Flow Matching, Diffusion, LLM, Kubernetes, Azure, PyTorch

Impact

Engineering contributions that directly supported fundraising, product launches, and business milestones.

2025-Present
Pipio AI: Building Next Round (In Progress)
$6.75M raised to date · 4,000+ companies · 100K+ videos

Joined as Senior Research Engineer, contributing to research on EditYourself and multi-modal video synthesis. Building core product demos and technical vision to support the next fundraising milestone.

Role: Senior Research Engineer
Jul 2024
Captions AI: Series C
$60M raised · $500M valuation

Built core demos and technical presentations showcasing Lipdub and Mirage capabilities for the fundraising process. The round was led by Index Ventures, with Kleiner Perkins, a16z, and Sequoia returning.

Role: Senior Member of Technical Staff
Jun 2023
Captions AI: Series B
$25M raised · $250M valuation

Developed product demos and pitch materials for the Lipdub video dubbing engine. The round was led by Kleiner Perkins, with a16z and Sequoia participating.

Role: Senior Member of Technical Staff
2022
VTC Academy / Onlinica: Fundraise
$20M raised

Led the ML team building AI-powered e-learning features, including virtual lecturers and voice cloning, for the Onlinica platform, supporting the $20M fundraise for VTC Academy's digital education ecosystem.

Role: Lead Machine Learning
2022
Dizim AI: Techfest 2022 & Pre-Seed
Top 10 Techfest · $110K from Antler

Co-founded and built the core AI engine, demos, and pitch video for Techfest Vietnam 2022 (Top 10 nationally). Secured pre-seed funding from Antler, a global early-stage VC.

Role: Co-Founder & Head of AI
2024
Ausynclab AI: Microsoft for Startups
Up to $150K in Azure credits

Advised on voice cloning technology and product development. Ausynclab was selected into the Microsoft for Startups program and reached 100K+ users within months of launch.

Role: Technical Advisor