Intelligent Speech Technology

📖 Course Description

Speech is the most natural form of human communication, and intelligent speech technology has given machines the ability to "understand" and "speak". From the emergence of voice assistants such as Siri, to the widespread adoption of smart home and in-car voice systems, to breakthroughs in multimodal large models such as GPT-4o, this technology is profoundly transforming our way of life.

This course systematically explores the core principles and cutting-edge applications of intelligent speech technology. We begin with the human speech production mechanism and auditory system, then delve into established technologies including speech recognition, voiceprint modeling, speech synthesis, voice conversion, and speech separation, and also examine new developments in speech technology in the era of large language models. The course combines theory with practice, and hands-on projects help students master both the theoretical foundations and practical application skills.

🎯 Learning Objectives

By the end of this course, students will be able to:

Understand the fundamental principles of speech signal processing
Master key technologies in speech recognition, synthesis, and enhancement
Apply machine learning techniques to speech processing tasks
Implement practical speech processing applications
Critically evaluate research papers in the field
Understand the latest developments in speech technology

⚡ Prerequisites

📊 Basic knowledge of linear algebra and calculus

🐍 Programming experience in Python

🤖 Familiarity with machine learning concepts

🧠 Basic knowledge of deep learning and its tools (PyTorch, TensorFlow, etc.)

📅 Course Schedule

Week	Date	Topic	Materials	Comments
1	2025.8.25	Course Introduction & Overview of Speech Technology	📄 Slides 🎯 Demos	-
2	2025.9.1	Overview of Speech Technology (continued)	📄 Slides	-
3	2025.9.8	Fundamentals of Speech Signal Processing	📄 Slides	-
4	2025.9.15	Introduction to Automatic Speech Recognition	📄 Slides 🎯 WER Demo	-
5	2025.9.22	Traditional ASR Models (GMM-HMM/DNN-HMM)	📄 Slides	Assignment 1 out
6	2025.9.29	End-to-End ASR Models	📄 Slides	-
7	2025.10.6	-	-	🎉 National Holiday
8	2025.10.13	Speaker Modeling (Part 1)	📄 Slides	-
9	2025.10.20	Speaker Modeling (Part 2)	📄 Slides	-
10	2025.10.27	Speech Synthesis (Part 1)	📄 Slides	🎤 Talk: Jingbei Li (StepAudio)
11	2025.11.3	Speech Synthesis (Part 2)	📄 Slides	-
12	2025.11.10	Speech Synthesis (Part 3)	📄 Slides	-
13	2025.11.17	Voice Conversion	📄 Slides	Assignment 2 out
14	2025.11.24	Speech Separation	📄 Slides	-
15	2025.12.1	Self-Supervised Learning for Speech	📄 Slides	-
16	2025.12.8	Speech Processing with Large Language Models	📄 Slides	-
17	2025.12.15	Applications of Speech Processing in Industry	📄 Slides	🎤 Invited Talks
18	2025.12.22	Final Project Presentation	-	Last Class
20	2026.1.5	-	-	Final Project due

📚 Course Materials

📖 Recommended Textbooks

[1] Jurafsky D, Martin J H. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition with language models. 2025 Edition.
[2] Introduction to Speech Processing (online book, Aalto University). https://speechprocessingbook.aalto.fi/index.html
[3] Huang X, Acero A, Hon H-W. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, 2001.
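[4] Han Jiqing, Zhang Lei, Zheng Tieran. Speech Signal Processing (《语音信号处理》). Tsinghua University Press.
[5] Hong Qingyang, Li Lin. Speech Recognition: Principles and Applications (《语音识别：原理与应用》). Publishing House of Electronics Industry.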

📄 Recommended Readings

Recent papers from top conferences (ICASSP, Interspeech, ACL, etc.)

💻 Software & Tools

Python 3.8+ - Primary programming language
PyTorch/TensorFlow - Deep learning frameworks
Librosa - Audio processing library
WeNet/WeSpeaker/WeSep - Open-source speech processing toolkits
ESPnet/SpeechBrain/Kaldi - Additional speech processing frameworks

📊 Grading Policy

Component	Weight	Notes
Attendance	15%	From Week 4 to Week 18
Homework Assignments	20%	2 assignments
Final Project	65%	1 project

📝 Assignments

Homework 1: TBD

Due: Week 8

Description: TBD

Homework 2: TBD

Due: Week 13

Description: TBD

Final Project: TBD

Due: Week 20

Description: TBD
