Dr. Shuai Wang is currently a Tenure-Track Associate Professor at the School of Intelligence Science and Technology, Nanjing University. He earned his B.E. degree from Northwestern Polytechnical University in 2014 under the supervision of Prof. Lei Xie, and his Ph.D. degree from Shanghai Jiao Tong University in 2020 under the supervision of Prof. Kai Yu and Prof. Yanmin Qian. Prior to joining Nanjing University, he served as a research scientist on Prof. Haizhou Li's team at the Shenzhen Research Institute of Big Data, Chinese University of Hong Kong (Shenzhen), where he still holds an adjunct position. He also spent 2.5 years as a senior research scientist at Lightspeed & Quantum Studios, Tencent, where he led the speech group in the R&D of speech technologies tailored for games.
His research interests include speaker modeling, target speaker processing, speech synthesis, voice conversion, and music generation. He has published more than 60 papers in top-tier speech conferences and journals.
🔥 Openings
I will have several openings for graduate students (2027 Fall); details will be updated soon. I am also looking for research assistants. Please feel free to drop me an email with your CV if you are interested in any of the following topics:
- Speaker Modeling
- Target Speaker Processing
- Speech, Audio, and Music Understanding
- Speech, Audio, and Music Generation
- Brain-Inspired Speech Processing
- End-to-End Speech Large Language Models
🎓 招生信息 / Recruitment Information
🔥 2027级硕士、博士研究生招生 / 2027 Fall Graduate Student Recruitment (Master & Ph.D.)
招生方向 / Research Areas:
- 说话人建模 / Speaker Modeling
- 目标说话人处理 / Target Speaker Processing
- 语音、音频、音乐理解 / Speech, Audio, and Music Understanding
- 语音、音频、音乐生成 / Speech, Audio, and Music Generation
- 类脑语音处理 / Brain-inspired Speech Processing
- 端到端语音大模型 / End-to-End Speech Large Language Models
申请要求 / Requirements:
- 计算机科学、电子工程或相关专业背景 / Background in Computer Science, Electronic Engineering, or related fields
- 对语音处理、机器学习有浓厚兴趣 / Strong interest in speech processing and machine learning
- 良好的编程能力(Python/C++) / Good programming skills (Python/C++)
- 英语读写能力良好 / Good English reading and writing skills
联系方式 / Contact:
- 邮箱 / Email: 点击显示邮箱 / Click to show email
- 请附上简历、成绩单和研究兴趣陈述 / Please include CV, transcripts, and research interest statement
南大智科学生特别说明 / Special Notice for NJU Students:
- 欢迎大二大三学生进组实习 / Welcome sophomore and junior students for internships
- 南大智科学生可到南雍楼西536办公室面聊 / NJU students are welcome to drop by Room 536, Nanyong Building (West Wing) for a face-to-face chat
- 实习期间可参与实际科研项目 / Interns can participate in actual research projects
🔬 Research Assistant (RA) 招聘 / Research Assistant Recruitment:
- 常年招收 / Year-round Recruitment: 欢迎本科生、研究生申请Research Assistant职位 / Undergraduate and graduate students are welcome to apply for Research Assistant positions
- 工作地点选择 / Location Options:
- 南京大学苏州校区 / Nanjing University @ Suzhou
- 深圳河套学院 / Shenzhen Loop Area Institute
- 香港中文大学(深圳) / Chinese University of Hong Kong (Shenzhen)
- 远程工作 / Remote work
- 发展机会 / Opportunities:
- 表现优秀者可推荐大厂实习 / Outstanding performers can be recommended for internships at leading tech companies
- 可推荐到知名高校深造 / Can be recommended for further studies at prestigious universities
- 参与前沿科研项目 / Participate in cutting-edge research projects
- 与Prof. Haizhou Li联合指导 / Jointly supervised with Prof. Haizhou Li
📚 Teaching
👨‍🎓 Students
Ph.D. students jointly supervised with Prof. Haizhou Li
- Chenyu Yang, CUHK-Shenzhen, Music Generation, Intern at Tencent AI Lab (Rhino-Bird Talent Program) and Microsoft Asia.
- Zhijun Liu, CUHK-Shenzhen, Speech Synthesis, Intern at NetEase and ByteDance (TopSeed).
- Sho Inoue, CUHK-Shenzhen, Speech Synthesis, Intern at NetEase and Meta FAIR.
- Qibing Bai, CUHK-Shenzhen, Accent Conversion, Intern at Tencent TEA-Lab
- Wenxuan Wu, CUHK, Target Speech Extraction
- Wupeng Wang, NUS, Speech Separation (graduated, now at Alibaba)
Past students
- Junjie Li, currently Ph.D. student at The Hong Kong Polytechnic University
📝 Publications
Please check my Google Scholar for the latest publications.
🪜 Open-Source Projects
- WeSpeaker: A comprehensive speaker embedding learning toolkit, supporting industrial-scale speaker embedding learning.
- WeSep: The first open-source toolkit for target speaker extraction [Demo]
- DiffRhythm: Diffusion-based Rhythmic Music Generation, Fast, Fast, Fast!
- SongBloom: Autoregressive Diffusion-based Music Generation, High-Quality, High-Fidelity, High-Diversity!
- Real-T: A real-world, conversation-centric benchmark for Target Speaker Extraction (TSE)
- MSU-Bench: A multi-tier, multi-speaker, multi-lingual, multi-scenario, and multi-task benchmark for evaluating large speech language models.
🎖 Honors and Awards
- 2024 Best Paper Award, ISCSLP 2024
- 2024 Best Student Paper Award, ISCSLP 2024
- 2019 VoxSRC 2019: 1st place in both tracks
- 2019 DIHARD 2019: 1st place in all four tracks
- 2018 IEEE Ganesh N. Ramaswamy Memorial Student Grant Award
🌅 Services
I serve as a regular reviewer for multiple conferences and journals, including
- ICASSP, Interspeech, ASRU, SLT, T-ASLP, Computer Speech & Language, Speech Communication;
- ICML, NeurIPS, AAAI, ACM MM.
I serve as the Special Session Chair of APSIPA 2025, the Operation Chair of the ICASSP 2025 Suzhou Satellite Event, and the Publication Chair of SLT 2024.
💬 Invited Talks
- 2024.09, Speaker Representation Learning: Theories, Applications and Practice at Brno University of Technology. [video]
- 2025.08, One Embedding Doesn't Fit All: Rethinking Speaker Modeling for Various Speech Applications, at MLC Workshop, Interspeech 2025. [slides]
- 2025.08, The Real-T Dataset, at Interspeech 2025. [slides]
- 2025.10, Deep Speaker Representation Learning, Tutorial, at NCMMSC 2025. [slides in Chinese]
- 2025.10, Deep Speaker Representation Learning, Tutorial, at APSIPA 2025. [slides in English]