Dr. Shuai Wang is currently a Tenure-Track Associate Professor at the School of Intelligence Science and Technology, Nanjing University. He earned his B.E. degree from Northwestern Polytechnical University in 2014 under the supervision of Prof. Lei Xie, and his Ph.D. degree from Shanghai Jiao Tong University in 2020 under the supervision of Prof. Kai Yu and Prof. Yanmin Qian. Prior to joining Nanjing University, he served as a research scientist on Prof. Haizhou Li's team at the Shenzhen Research Institute of Big Data, Chinese University of Hong Kong (Shenzhen), where he still holds an adjunct position. He also spent 2.5 years as a senior research scientist at Lightspeed & Quantum Studios, Tencent, where he led the speech group in the R&D of speech technologies tailored for games.
His research interests include speaker modeling, target speaker processing, speech synthesis, voice conversion, and music generation. He has published more than 60 papers in top-tier speech conferences and journals.
🔥 Openings
I will have several openings for graduate students (2027 Fall); details will be updated soon. I am also looking for research assistants. Please feel free to drop me an email with your CV if you are interested in any of the following topics:
- Speaker Modeling
- Target Speaker Processing
- Speech, Audio, and Music Understanding
- Speech, Audio, and Music Generation
- Brain-Inspired Speech Processing
- End-to-End Speech Large Language Models
🎓 招生信息 / Recruitment Information
🔥 2027级硕士、博士研究生招生 / 2027 Fall Graduate Student Recruitment (Master & Ph.D.)
招生方向 / Research Areas:
- 说话人建模 / Speaker Modeling
- 目标说话人处理 / Target Speaker Processing
- 语音、音频、音乐理解 / Speech, Audio, and Music Understanding
- 语音、音频、音乐生成 / Speech, Audio, and Music Generation
- 类脑语音处理 / Brain-inspired Speech Processing
- 端到端语音大模型 / End-to-End Speech Large Language Models
申请要求 / Requirements:
- 计算机科学、电子工程或相关专业背景 / Background in Computer Science, Electronic Engineering, or related fields
- 对语音处理、机器学习有浓厚兴趣 / Strong interest in speech processing and machine learning
- 良好的编程能力(Python/C++) / Good programming skills (Python/C++)
- 英语读写能力良好 / Good English reading and writing skills
联系方式 / Contact:
- 邮箱 / Email: 点击显示邮箱 / Click to show email
- 请附上简历、成绩单和研究兴趣陈述 / Please include CV, transcripts, and research interest statement
南大智科学生特别说明 / Special Notice for NJU Students:
- 欢迎大二大三学生进组实习 / Welcome sophomore and junior students for internships
- 南大智科学生可到南雍楼西536办公室面聊 / NJU students are welcome to drop by Room 536, Nanyong Building (West Wing) for a face-to-face chat
- 实习期间可参与实际科研项目 / Interns can participate in actual research projects
🔬 Research Assistant (RA) 招聘 / Research Assistant Recruitment:
- 常年招收 / Year-round Recruitment: 欢迎本科生、研究生申请Research Assistant职位 / Undergraduate and graduate students are welcome to apply for Research Assistant positions
- 工作地点选择 / Location Options:
- 南京大学苏州校区 / Nanjing University @ Suzhou
- 深圳河套学院 / Shenzhen Loop Area Institute
- 香港中文大学(深圳) / Chinese University of Hong Kong (Shenzhen)
- 远程工作 / Remote work
- 发展机会 / Opportunities:
- 表现优秀者可推荐大厂实习 / Outstanding performers can be recommended for internships at leading tech companies
- 可推荐到知名高校深造 / Can be recommended for further studies at prestigious universities
- 参与前沿科研项目 / Participate in cutting-edge research projects
- 与Prof. Haizhou Li联合指导 / Jointly supervised with Prof. Haizhou Li
📚 Teaching
👨‍🎓 Students
Ph.D. students jointly supervised with Prof. Haizhou Li
- Chenyu Yang, CUHK-Shenzhen, Music Generation, Intern at Tencent AI Lab (Rhino-Bird Talent Program) and Microsoft Asia.
- Zhijun Liu, CUHK-Shenzhen, Speech Synthesis, Intern at NetEase and ByteDance (TopSeed).
- Sho Inoue, CUHK-Shenzhen, Speech Synthesis, Intern at NetEase and Meta FAIR.
- Qibing Bai, CUHK-Shenzhen, Accent Conversion, Intern at Tencent TEA-Lab
- Wenxuan Wu, CUHK, Target Speech Extraction
- Wupeng Wang, NUS, Speech Separation (graduated, now at Alibaba)
Past students
- Junjie Li, currently Ph.D. student at The Hong Kong Polytechnic University
📝 Publications
Please check my Google Scholar for the latest publications.
🪜 Open-Source Projects
- WeSpeaker: A comprehensive speaker embedding learning toolkit, supporting industrial-scale speaker embedding learning.
- WeSep: The first open-source toolkit for target speaker extraction [Demo]
- DiffRhythm: Diffusion-based Rhythmic Music Generation, Fast, Fast, Fast!
- SongBloom: Autoregressive Diffusion-based Music Generation, High-Quality, High-Fidelity, High-Diversity!
- Real-T: A real-world, conversation-centric benchmark for Target Speaker Extraction (TSE)
- MSU-Bench: A multi-tier, multi-speaker, multi-lingual, multi-scenario, and multi-task benchmark for evaluating large speech language models.
🎖 Honors and Awards
- 2024 Best Paper Award, ISCSLP 2024
- 2024 Best Student Paper Award, ISCSLP 2024
- 2019 VoxSRC 2019: 1st place in both tracks
- 2019 DIHARD 2019: 1st place in all four tracks
- 2018 IEEE Ganesh N. Ramaswamy Memorial Student Grant Award
🌅 Services
I serve as a regular reviewer for multiple conferences and journals, including
- ICASSP, Interspeech, ASRU, SLT, T-ASLP, Computer Speech & Language, Speech Communication;
- ICML, NeurIPS, AAAI, ACM MM.
I serve as the Special Session Chair of APSIPA 2025, the Operation Chair of the ICASSP 2025 Suzhou Satellite Event, and the Publication Chair of SLT 2024.
💬 Invited Talks
- 2024.09, Speaker Representation Learning: Theories, Applications and Practice at Brno University of Technology. [video]
- 2025.08, One Embedding Doesn't Fit All: Rethinking Speaker Modeling for Various Speech Applications, at MLC Workshop, Interspeech 2025. [slides]
- 2025.08, The Real-T Dataset, at Interspeech 2025. [slides]
- 2025.10, Deep Speaker Representation Learning, Tutorial, at NCMMSC 2025. [slides in Chinese]
- 2025.10, Deep Speaker Representation Learning, Tutorial, at APSIPA 2025. [slides in English]