Sam Nhut Nguyen


Creator-Studio Foundation Models · Talking Face Synthesis · Relighting · 3D Scene Synthesis · Lip Sync & Dubbing · Diffusion Transformers · Flow Matching

I'm a Research Scientist at Riverside (formerly Pipio AI), building a foundation model that unifies relighting, talking-face synthesis, 3D scene synthesis, lip-sync, and dubbing for the creator studio. Previously, at Captions AI, I led Lipdub and contributed to Mirage, their 10B-parameter audio-to-video model. I also co-founded dizim (Top 10, Techfest 2022) and built Ausynclab, an AI audio editing app that grew to 150k+ users, as a side project.

📍 Ho Chi Minh City, Vietnam

⚡ Currently: building creator-studio foundation models @ Riverside · open to research collaborations


What I've been working on.

Building a foundation model for the creator studio — relighting, talking-face synthesis, 3D scene synthesis, lip-sync, and dubbing, in one video stack.

Projects

8+ years building production generative AI systems for lip-sync, text-to-speech, and virtual presenters serving hundreds of thousands of users.

EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos
John Flynn, Wolfgang Paier, Dimitar Dinev, Sam Nhut Nguyen, Hayk Poghosyan, Manuel Toribio, Sandipan Banerjee, Guy Gafni
Riverside (formerly Pipio AI) · Research Scientist
arXiv 2026
project page · arXiv

EditYourself is a diffusion-based video editing model for talking heads, enabling transcript-driven lip-syncing, insertion, removal and retiming of speech while preserving identity and visual fidelity.

Mirage Studio
Captions AI · Senior Member of Technical Staff

AI-powered video creation studio enabling users to generate cinematic-quality videos from text prompts with advanced motion and style control.

Lipdub
Captions AI · Senior Member of Technical Staff

Video dubbing engine that translates and lip-syncs videos into 28+ languages. Led development of the core lip-synchronization pipeline.

AI Creator (3D Avatar)
Captions AI · Senior Member of Technical Staff

The world's first 3D avatar designed for content creation. Generates photorealistic talking-head videos from text with natural lip-sync, head movement, and emotional expression in 30+ languages.

AI Twin
Captions AI · Senior Member of Technical Staff

Digital clone technology that creates a virtual version of the user from a short recording. Generates talking-head videos from text in 29 languages with AI voice cloning and natural expressions.

Ausynclab AI
Ausynclab · Founder (side project)

AI-powered audio editing app — a side project I founded for fun. Built on voice cloning technology with best-in-class Vietnamese voice quality; reached 150k+ users within months of launch.

Dizim AI
dizim · Co-Founder & Head of AI

Virtual presenter platform serving 200k+ users. Top 10 at Techfest 2022. Generates AI-driven talking head videos from text and slides.

Onlinica / OnliCV
Onlinica · Lead ML

First AI-powered online education platform in Vietnam. OnliCV connects your professional network. Led 8-person ML team building the core AI features.

VisibilityPro
DMSpro · AI Engineer

Computer vision for retail automation. AI-powered shelf monitoring and product recognition that reduced manual review by 70%. Silver Winner SAP SME SEEDx 2020.

Face Detection & e-KYC
EyeQ Tech · AI Engineer

Facial recognition system for e-KYC and access control across Banking, Retail, and Security sectors. Real-time face detection and verification pipeline.

Japanese OCR Engine
Bravesoft VN · AI Engineer

OCR engine specialized for complex Japanese Kanji/Kana character recognition. Built for document digitization workflows serving Japanese enterprise clients.

Open Source

My open-source work focuses on flow matching, audio/video generation, and lip synchronization. It builds on my thesis research on talking face generation with GANs.

Articles

Interactive deep-dives into the concepts and architectures behind modern AI systems.

🎬
EditYourself — How It Works
Interactive · Flow Matching · Realtime · Parallel · Audio-to-Video · Image-to-Video · Text-to-Video

An interactive guide to EditYourself's audio-driven talking head pipeline — selective masking, audio conditioning, sliding window denoising, and multi-scale refinement.

Read article →
🎙️
Mirage — Seeing Voices
Interactive · Flow Matching · Audio-to-Video

How Mirage generates complete A-roll video from audio — asymmetric self-attention, learned RoPE, flow matching, and the data pipeline behind it.

Read article →
👄
LipDub — How It Works
Interactive · Landmarks · GAN · Audio-to-Video · Identity

An interactive guide to LipDub's two-stage talking face pipeline — a Landmark VAE and Diffusion Transformer with flow matching for audio-driven landmark generation, then photo-realistic rendering with SPADE alignment and AdaIN audio injection.

Read article →
🧑‍🎤
AI Creator — 3D Talking Avatar
Interactive · NeRF · 3D Avatar · Audio-to-Video · Blendshapes · Relighting

How to build a 3D talking avatar from a single video — audio-visual lip sync, blendshape expressions, head pose optimization, hash-grid NeRF rendering, portrait restoration, and relighting from reference images.

Read article →
🔄
Transformer + RoPE: Full Pipeline
Interactive · Transformer

Step-by-step walkthrough of the full Transformer pipeline with Rotary Position Embeddings — from raw text to output, with actual numbers and visualizations.

Read article →
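
As a rough illustration of the idea behind rotary position embeddings (a minimal sketch, not code from the article), the snippet below rotates toy query and key vectors in plain Python. The interleaved pairing of dimensions, the base of 10000, and the helper names `rope` and `dot` are assumptions made for illustration; real implementations differ in how they pair dimensions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by the angle pos * theta.

    Hypothetical helper: theta for the pair starting at index i is base ** (-i / d),
    which equals base ** (-2k / d) for pair number k = i / 2.
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y)
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (made-up numbers)
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# Same relative offset (3 positions) at two different absolute positions
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 13), rope(k, 10))

# The two attention scores agree up to floating-point error
print(abs(score_a - score_b) < 1e-9)
```

Both scores use the same offset between query and key positions, so they match. This dependence on relative rather than absolute position is the property that makes rotary embeddings a natural fit for attention.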
📉
Optimizer Evolution — SGD to Muon
Interactive Optimization SGD Adam Muon

Why each optimizer was invented — from SGD's zig-zagging to Adam's adaptive moments to Muon's orthogonalization. Interactive training simulations show how each optimizer navigates a loss surface differently.

Read article →
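
The three behaviours named above can be sketched on toy problems (an illustrative sketch, not the article's simulations). The quadratic loss, the learning rates, and all function names here are assumptions. The orthogonalization uses the classic cubic Newton-Schulz iteration on a fixed 2x2 matrix as a stand-in; Muon, as publicly described, applies a tuned variant of this idea to the momentum of matrix-shaped parameters.

```python
import math

def grad(p):
    """Gradient of f(x, y) = 0.5 * (x**2 + 50 * y**2), an ill-conditioned bowl."""
    return [p[0], 50.0 * p[1]]

def sgd_step(p, lr=0.035):
    g = grad(p)
    return [p[0] - lr * g[0], p[1] - lr * g[1]]

def adam_step(p, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    g = grad(p)
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]        # first moment
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]   # second moment
    m_hat = [mi / (1 - b1 ** t) for mi in m]                     # bias correction
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    p = [pi - lr * mh / (math.sqrt(vh) + eps)
         for pi, mh, vh in zip(p, m_hat, v_hat)]
    return p, m, v

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def orthogonalize(g, iters=30):
    """Cubic Newton-Schulz: X <- 1.5 X - 0.5 X X^T X, after Frobenius scaling."""
    norm = math.sqrt(sum(x * x for row in g for x in row))
    x = [[val / norm for val in row] for row in g]
    for _ in range(iters):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x

# 1) SGD zig-zag: the steep y direction overshoots and flips sign every step.
p = [10.0, 1.0]
flips = 0
for _ in range(20):
    nxt = sgd_step(p)
    flips += nxt[1] * p[1] < 0
    p = nxt

# 2) Adam: the first step has roughly the same size (lr) in both coordinates,
#    even though the two gradient components differ by a factor of 5.
start = [10.0, 1.0]
after, m, v = adam_step(start, [0.0, 0.0], [0.0, 0.0], t=1)
adam_moves = [start[0] - after[0], start[1] - after[1]]

# 3) Orthogonalization: every singular value of the update is pushed toward 1.
update = orthogonalize([[4.0, 0.0], [3.0, 0.5]])
gram = matmul(update, transpose(update))  # close to the identity matrix

print(flips, adam_moves, gram)
```

The learning rate of 0.035 is chosen so that the steep direction is multiplied by a negative factor on every step, which is what produces the zig-zag. Adam removes the gradient scale per coordinate, and orthogonalization removes it per singular direction of a matrix update.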

Experience

💼 Work
  • Riverside - Research Scientist (2026-Present): Building a foundation model for the creator studio — relighting, talking-face synthesis, 3D scene synthesis, lip-sync, and dubbing unified into one video stack. Joined with the Pipio AI team as it transitioned to Riverside.
  • Pipio AI - Senior Research Engineer (2025-2026): Multi-modal video synthesis, visual dialog editing, lip-synchronization
  • Captions AI - Senior Member of Technical Staff (2022-2025): Led Lipdub development; contributed to Mirage
  • dizim - Head of AI / Co-Founder (2021-2022): Top 10 Techfest 2022, virtual presenter platform serving 200k+ users
  • Onlinica - Lead Machine Learning (2022-2023): Led 8-person team for AI-powered educational platform
  • Ausynclab - Founder (side project): AI audio editing app built on voice cloning technology with best-in-class Vietnamese voice quality; reached 150k+ users within months of launch
  • DMSpro - AI Engineer (2019-2020): Computer vision for retail automation, reduced manual review by 70%
  • EyeQ Tech - AI Engineer (2019): Facial recognition for e-KYC and access control across Banking, Retail, and Security sectors
  • Bravesoft Vietnam - AI Engineer (2018): Japanese OCR engine for complex Kanji/Kana character recognition
🎓 Education
  • BSc in Mathematics and Computer Science - VNU-HCMUS University of Science (2015-2019)
  • Thesis: "Talking Face via Audio Driven" - Score: 9.8/10
🏆 Honors
  • Semi-Final Techfest 2021 - Top 20 Startups
  • Silver Winner - SAP SME SEEDx Development Challenge 2020
🛠️ Skills
  • Large Scale Training, GANs, VAEs, Flow Matching, Diffusion, LLM, Kubernetes, Azure, PyTorch

Impact

Engineering contributions that directly supported fundraising, product launches, and business milestones.

2026 — Present
Riverside: Scaling Generative Video (In Progress)
Joined with the Pipio AI team as it transitioned to Riverside

Building a foundation model for the creator studio — one system spanning relighting, talking-face synthesis, 3D scene synthesis, lip-sync, and dubbing, so creators can do everything for video in one place.

Role: Research Scientist
2025 — 2026
Pipio AI: Building Next Round
$6.75M raised to date · 4,000+ companies · 100K+ videos

Joined as Senior Research Engineer, contributing to research on EditYourself and multi-modal video synthesis. Built core product demos and technical vision to support the next fundraising milestone before the team transitioned to Riverside.

Role: Senior Research Engineer
Jul 2024
Captions AI: Series C
$60M raised · $500M valuation

Built core demos and technical presentations showcasing Lipdub and Mirage capabilities for the fundraising process. The round was led by Index Ventures, with Kleiner Perkins, a16z, and Sequoia returning.

Role: Senior Member of Technical Staff
Jun 2023
Captions AI: Series B
$25M raised · $250M valuation

Developed product demos and pitch materials for the Lipdub video dubbing engine. The round was led by Kleiner Perkins, with a16z and Sequoia participating.

Role: Senior Member of Technical Staff
2022
VTC Academy / Onlinica: Fundraise
$20M raised

Led ML team building AI-powered e-learning features including virtual lecturers and voice cloning for the Onlinica platform, supporting the $20M fundraise for VTC Academy's digital education ecosystem.

Role: Lead Machine Learning
2022
Dizim AI: Techfest 2022 & Pre-Seed
Top 10 Techfest · $110K from Antler

Co-founded and built the core AI engine, demos, and pitch video for Techfest Vietnam 2022 (Top 10 nationally). Secured pre-seed funding from Antler, a global early-stage VC.

Role: Co-Founder & Head of AI
2024
Ausynclab AI: Microsoft for Startups
Up to $150K in Azure credits

Founded Ausynclab as a side project: an AI audio editing app powered by voice cloning. It was selected for the Microsoft for Startups program and reached 150K+ users within months of launch.

Role: Founder (side project)