Intelligent Speech Technology

Fall 2025

Course Information

Semester: Fall 2025
Credits: 2
Class Hours: 36 (18×2)
Instructor: Shuai Wang
Location: Nanyong Building, Room West 209
Time: Monday 2:00 - 3:50pm

Course Description

Speech is the most natural form of human communication, and intelligent speech technology gives machines the ability to “understand” and “speak”. From the emergence of voice assistants such as Siri, through the widespread adoption of smart home and in-car voice systems, to breakthroughs in multimodal large models like GPT-4o, this technology is profoundly transforming our way of life.

This course systematically explores the core principles and cutting-edge applications of intelligent speech technology. We will begin with the human speech production mechanism and auditory system, then delve into traditional technologies including speech recognition, voiceprint modeling, speech synthesis, voice conversion and speech separation, while also examining new developments in speech technology in the era of large language models. The course combines theory with practice: hands-on projects help students master both the theoretical foundations and practical application skills.

Learning Objectives

By the end of this course, students will be able to:

Understand the fundamental principles of speech signal processing
Master key technologies in speech recognition, synthesis, and enhancement
Apply machine learning techniques to speech processing tasks
Implement practical speech processing applications
Critically evaluate research papers in the field
Understand the latest developments in speech technology

Prerequisites

Basic knowledge of linear algebra and calculus
Programming experience in Python
Familiarity with machine learning concepts
Basic knowledge of deep learning and its tools (PyTorch, TensorFlow, etc.)

Course Schedule

Week	Date	Topic	Materials	Comments
1	2025.8.25	Course Introduction & Overview of Speech Technology	Slides Demos	-
2	2025.9.1	Overview of Speech Technology (continued)	Slides	-
3	2025.9.8	Fundamentals of Speech Signal Processing	Slides	-
4	2025.9.15	Introduction to Automatic Speech Recognition	Slides WER Demo	-
5	2025.9.22	Traditional ASR Models (GMM-HMM and DNN-HMM)	Slides	Assignment 1 out
6	2025.9.29	End-to-End ASR Models	Slides	-
7	2025.10.6	-	-	National Holiday
8	2025.10.13	Speaker Modeling (Part 1)	Slides	-
9	2025.10.20	Speaker Modeling (Part 2)	Slides	-
10	2025.10.27	Speech Synthesis (Part 1)	Slides	Invited talk by Jingbei Li, StepAudio
11	2025.11.3	Speech Synthesis (Part 2)	Slides	-
12	2025.11.10	Speech Synthesis (Part 3)	Slides	-
13	2025.11.17	Voice Conversion	Slides	Assignment 2 out
14	2025.11.24	Speech Separation	Slides	-
15	2025.12.1	Self-Supervised Learning for Speech	Slides	-
16	2025.12.8	Speech Processing with Large Language Models	Slides	-
17	2025.12.15	Applications of Speech Processing in Industry	Slides	Invited Talks
18	2025.12.22	Final Project Presentation	-	Last Class
20	2026.1.5	-	-	Final Project due

Course Materials

[1] Jurafsky D, Martin J H. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition with language models. 2025 Edition.
[2] https://speechprocessingbook.aalto.fi/index.html
[3] Xuedong Huang, Alex Acero, Hsiao-Wuen Hon. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, 2001.
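[4] Han Jiqing, Zhang Lei, Zheng Tieran. Speech Signal Processing (《语音信号处理》, in Chinese). Tsinghua University Press.
[5] Hong Qingyang, Li Lin. Speech Recognition: Principles and Applications (《语音识别：原理与应用》, in Chinese). Publishing House of Electronics Industry.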

Software & Tools

Python 3.8+ - Primary programming language
PyTorch/TensorFlow - Deep learning frameworks
Librosa - Audio processing library
WeNet/WeSpeaker/WeSep - Open-source speech processing toolkits
ESPnet/SpeechBrain/Kaldi - Additional speech processing frameworks

Grading Policy

Attendance: 15% (counted from Week 4 to Week 18)
Homework Assignments: 20% (2 assignments)
Final Project: 65% (1 project)
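
As a rough illustration of how the weights above combine, the following Python sketch computes a weighted total. The 0-100 scale for each component, the single combined homework score, and the names `WEIGHTS` and `final_grade` are assumptions made for this example; the syllabus does not specify them.

```python
# Weights come from the grading policy; the 0-100 scale is an assumption.
WEIGHTS = {"attendance": 0.15, "homework": 0.20, "final_project": 0.65}


def final_grade(scores):
    """Combine component scores (assumed to be on a 0-100 scale) into a weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: full attendance, 80 on homework, 90 on the final project.
example = {"attendance": 100, "homework": 80, "final_project": 90}
print(f"{final_grade(example):.1f}")
```

Because the final project carries 65% of the total, a 10-point change in the project score moves the total by 6.5 points, compared with 2 points for the same change in the homework score.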

Assignments

Homework 1: TBD

Due: Week 8
Description: TBD

Homework 2: TBD

Due: Week 13
Description: TBD

Final Project: TBD

Due: Week 20
Description: TBD

*This syllabus is subject to change. Students will be notified of any modifications.*