Dr. Shuai Wang is currently a Tenure-Track Associate Professor at the School of Intelligence Science and Technology, Nanjing University. He earned his B.E. degree from Northwestern Polytechnical University in 2014 under the supervision of Prof. Lei Xie, and his Ph.D. degree from Shanghai Jiao Tong University in 2020 under the supervision of Prof. Kai Yu and Prof. Yanmin Qian. Prior to joining Nanjing University, he served as a research scientist in Prof. Haizhou Li’s team at the Shenzhen Research Institute of Big Data, the Chinese University of Hong Kong (Shenzhen), where he still holds an adjunct position. Additionally, he spent 2.5 years as a senior research scientist at Lightspeed & Quantum Studios, Tencent, where he led the speech group in R&D of speech technologies customized for games.

His research interests include speaker modeling, target speaker processing, speech synthesis, voice conversion, and music generation. He has published more than 60 papers at top-tier speech conferences and journals.

🔥 Openings

I will have several openings for graduate students (2026 Fall) and will update the details ASAP. I am also currently looking for research assistants; please feel free to drop me an email with your CV if you are interested in the following topics:

Speaker Modeling
Target Speaker Processing
Speech Generation
Music Generation
Brain-inspired speech processing

Note that research assistants can choose to work either at Nanjing University @ Suzhou or at the Chinese University of Hong Kong (Shenzhen), jointly supervised with Prof. Haizhou Li.

🎓 Recruitment Information

🎵 Welcome demo generated with SongBloom, the model from our paper

🔥 2026 Fall Graduate Student Recruitment

Research Areas:

Speaker Modeling
Target Speaker Processing
Speech Generation
Music Generation
Brain-inspired Speech Processing

Requirements:

Background in Computer Science, Electronic Engineering, or a related field
Strong interest in speech processing and machine learning
Good programming skills (Python/C++)
Good English reading and writing skills

Contact:

Email: Click to show email
Please include your CV, transcripts, and a research interest statement

Special Notice for NJU Intelligence Science and Technology Students:

Sophomore and junior students are welcome to join the group as interns
NJU Intelligence Science and Technology students can drop by Room 536, Nanyong Building West, for a face-to-face discussion
Interns can participate in real research projects

📚 Teaching

Intelligent Speech Technology, 2025 Fall

👨‍🎓 Students

Ph.D. students jointly supervised with Prof. Haizhou Li

Chenyu Yang, CUHK-Shenzhen, Music Generation, Intern at Tencent AILab (Rhino-Bird Talent Program)
Zhijun Liu, CUHK-Shenzhen, Speech Synthesis, Intern at NetEase and ByteDance (TopSeed)
Sho Inoue, CUHK-Shenzhen, Speech Synthesis, Intern at NetEase and Meta FAIR
Qibing Bai, CUHK-Shenzhen, Accent Conversion, Intern at Tencent TEA-Lab
Wenxuan Wu, CUHK, Target Speech Extraction
Wupeng Wang, NUS, Speech Separation

Past students

Junjie Li, currently a Ph.D. student at The Hong Kong Polytechnic University

📝 Publications

Please check my Google Scholar for the latest publications.

🪜 Open-Source Projects

WeSpeaker: A comprehensive speaker embedding learning toolkit, supporting industrial-scale speaker embedding learning.
WeSep: The first open-source target speaker extraction toolkit [Demo]
DiffRhythm: Diffusion-based Rhythmic Music Generation, Fast, Fast, Fast!
SongBloom: Autoregressive Diffusion-based Music Generation, High-Quality, High-Fidelity, High-Diversity!
Real-T: A real-world, conversation-centric benchmark for Target Speaker Extraction (TSE)
MSU-Bench: A multi-tier, multi-speaker, multi-lingual, multi-scenario, and multi-task benchmark for evaluating large speech language models.

🎖 Honors and Awards

2024 Best Paper Award, ISCSLP 2024
2024 Best Student Paper Award, ISCSLP 2024
2019 VoxSRC 2019: Ranked 1st in both tracks
2019 DIHARD 2019: Ranked 1st in all 4 tracks
2018 IEEE Ganesh N. Ramaswamy Memorial Student Grant Award

🌅 Services

I serve as a regular reviewer for multiple conferences and journals, including

ICASSP, Interspeech, ASRU, SLT, T-ASLP, Computer Speech & Language, Speech Communication;
ICML, NeurIPS, AAAI, ACM MM.

I serve as the Special Session Chair of APSIPA 2025, the Operation Chair of the ICASSP 2025 Suzhou Satellite Event, and the Publication Chair of SLT 2024.

💬 Invited Talks

2024.09, Speaker Representation Learning: Theories, Applications and Practice at Brno University of Technology. [video]
2025.08, One Embedding Doesn’t Fit All: Rethinking Speaker Modeling for Various Speech Applications, at MLC Workshop, Interspeech 2025. [slides]
2025.08, The Real-T Dataset, at Interspeech 2025. [slides]
2025.10, Deep Speaker Representation Learning, Tutorial, at NCMMSC 2025. [slides in Chinese]
2025.10, Deep Speaker Representation Learning, Tutorial, at APSIPA 2025. [slides in English]