Intelligent Speech Technology

Fall 2025

📚 Course Details

Semester: Fall 2025
Credits: 2
Class Hours: 36 (18×2)

👨‍🏫 Instructor & Schedule

Instructor: Shuai Wang
Location: Nanyong Building, Room West 209
Time: Monday, 2:00 - 3:50 pm

📖 Course Description

Speech is the most natural form of human communication, and intelligent speech technology gives machines the ability to "understand" and "speak". From the emergence of voice assistants such as Siri, through the widespread adoption of smart home and in-car voice systems, to recent breakthroughs in multimodal large models like GPT-4o, this technology is profoundly transforming our way of life.

This course systematically explores the core principles and cutting-edge applications of intelligent speech technology. We begin with the mechanisms of human speech production and the auditory system, then delve into established technologies including speech recognition, voiceprint modeling, speech synthesis, voice conversion, and speech separation, while also examining new developments in speech technology in the era of large language models. The course combines theory with practice: hands-on projects help students master both the theoretical foundations and practical application skills.

🎯 Learning Objectives

By the end of this course, students will be able to:

⚑ Prerequisites

📊 Basic knowledge of linear algebra and calculus
🐍 Programming experience in Python
🤖 Familiarity with machine learning concepts
🧠 Basic knowledge of Deep Learning and tools (PyTorch, TensorFlow, etc.)

📅 Course Schedule

Week | Date | Topic | Materials | Comments
1 | 2025.8.25 | Course Introduction & Overview of Speech Technology | - |
2 | 2025.9.1 | Overview of Speech Technology (continued) | - |
3 | 2025.9.8 | Fundamentals of Speech Signal Processing | - |
4 | 2025.9.15 | Introduction to Automatic Speech Recognition | - |
5 | 2025.9.22 | Traditional ASR Models (GMM/DNN-HMM) | | Assignment 1 out
6 | 2025.9.29 | End-to-End ASR Models | - |
7 | 2025.10.6 | - | - | 🎉 National Holiday
8 | 2025.10.13 | Speaker Modeling (Part 1) | - |
9 | 2025.10.20 | Speaker Modeling (Part 2) | - |
10 | 2025.10.27 | Speech Synthesis (Part 1) | | Invited talk by Jingbei Li, StepAudio
11 | 2025.11.3 | Speech Synthesis (Part 2) | - |
12 | 2025.11.10 | Speech Synthesis (Part 3) | - |
13 | 2025.11.17 | Voice Conversion | | Assignment 2 out
14 | 2025.11.24 | Speech Separation | - |
15 | 2025.12.1 | Self-Supervised Learning for Speech | - |
16 | 2025.12.8 | Speech Processing with Large Language Models | - |
17 | 2025.12.15 | Applications of Speech Processing in Industry | - |
18 | 2025.12.22 | Final Project Presentation | - | Last Class
20 | 2026.1.5 | - | - | Final Project due

📚 Course Materials

📖 Recommended Textbooks

📄 Recommended Readings

Recent papers from top conferences (ICASSP, Interspeech, ACL, etc.)

💻 Software & Tools

📊 Grading Policy

15% Attendance (Weeks 4 to 18)
20% Homework Assignments (2 assignments)
65% Final Project (1 project)

📝 Assignments

Homework 1: TBD
Due: Week 8
Description: TBD
Homework 2: TBD
Due: Week 13
Description: TBD
Final Project: TBD
Due: Week 20
Description: TBD