Fall 2025
Speech is the most natural form of human communication, and intelligent speech technology has given machines the ability to "understand" and "speak". This technology is profoundly transforming the way we live. Examples include the emergence of voice assistants such as Siri, the widespread adoption of smart-home and in-car voice systems, and breakthroughs in multimodal large models like GPT-4o.
This course will systematically explore the core principles and cutting-edge applications of intelligent speech technology. We will begin with the mechanisms of human speech production and the auditory system. We will then delve into the main speech technologies, including speech recognition, voiceprint (speaker) modeling, speech synthesis, voice conversion, and speech separation. Finally, we will examine new developments in speech technology in the era of large language models. The course combines theory with practice, and its hands-on projects help students master both the theoretical foundations and practical application skills.
By the end of this course, students will be able to:
| Week | Date | Topic | Materials | Comments |
|---|---|---|---|---|
| 1 | 2025.8.25 | Course Introduction & Overview of Speech Technology | - | |
| 2 | 2025.9.1 | Overview of Speech Technology (continued) | - | |
| 3 | 2025.9.8 | Fundamentals of Speech Signal Processing | - | |
| 4 | 2025.9.15 | Introduction to Automatic Speech Recognition | - | |
| 5 | 2025.9.22 | Traditional ASR Models (GMM/DNN-HMM) | - | Assignment 1 out |
| 6 | 2025.9.29 | End-to-End ASR Models | - | |
| 7 | 2025.10.6 | - | - | National Holiday |
| 8 | 2025.10.13 | Speaker Modeling (Part 1) | - | |
| 9 | 2025.10.20 | Speaker Modeling (Part 2) | - | |
| 10 | 2025.10.27 | Speech Synthesis (Part 1) | Slides | Invited talk by Jingbei Li, StepAudio |
| 11 | 2025.11.3 | Speech Synthesis (Part 2) | Slides | - |
| 12 | 2025.11.10 | Speech Synthesis (Part 3) | Slides | - |
| 13 | 2025.11.17 | Voice Conversion | Slides | Assignment 2 out |
| 14 | 2025.11.24 | Speech Separation | Slides | - |
| 15 | 2025.12.1 | Self-Supervised Learning for Speech | Slides | - |
| 16 | 2025.12.8 | Speech Processing with Large Language Models | Slides | - |
| 17 | 2025.12.15 | Applications of Speech Processing in Industry | Slides; Invited Talks | - |
| 18 | 2025.12.22 | Final Project Presentation | - | Last Class |
| 20 | 2026.1.5 | - | - | Final Project due |
Recent papers from top conferences (ICASSP, Interspeech, ACL, etc.)