Intelligent Speech Technology

Fall 2025

Course Information

Semester: Fall 2025
Credits: 2
Class Hours: 36 (18×2)
Instructor: Shuai Wang
Location: Nanyong Building, Room West 209
Time: Monday 2:00 - 3:50pm

Course Description

Speech is the most natural form of human communication, and intelligent speech technology gives machines the ability to “understand” and “speak”. From the emergence of voice assistants such as Siri, to the widespread adoption of smart home and in-car voice systems, to breakthroughs in multimodal large models such as GPT-4o, this technology is profoundly transforming the way we live.

This course will systematically explore the core principles and cutting-edge applications of intelligent speech technology. We will begin with human speech production mechanisms and auditory systems, then delve into traditional technologies including speech recognition, voiceprint modeling, speech synthesis, voice conversion and speech separation, while also examining new developments in speech technology in the era of large language models. The course follows a teaching approach that combines theory with practice, helping students master both theoretical foundations and practical application skills through hands-on projects.

Learning Objectives

By the end of this course, students will be able to:

  • Understand the fundamental principles of speech signal processing
  • Master key technologies in speech recognition, synthesis, and enhancement
  • Apply machine learning techniques to speech processing tasks
  • Implement practical speech processing applications
  • Critically evaluate research papers in the field
  • Understand the latest developments in speech technology

Prerequisites

  • Basic knowledge of linear algebra and calculus
  • Programming experience in Python
  • Familiarity with machine learning concepts
  • Basic knowledge of deep learning and its frameworks (PyTorch, TensorFlow, etc.)

Course Schedule

Week | Date | Topic | Materials | Comments
1 | 2025.8.25 | Course Introduction & Overview of Speech Technology | Slides, Demos | -
2 | 2025.9.1 | Overview of Speech Technology (continued) | Slides | -
3 | 2025.9.8 | Fundamentals of Speech Signal Processing | Slides | -
4 | 2025.9.15 | Introduction to Automatic Speech Recognition | Slides, WER Demo | -
5 | 2025.9.22 | Traditional ASR Models (GMM/DNN-HMM) | Slides | Assignment 1 out
6 | 2025.9.29 | End-to-End ASR Models | Slides | -
7 | 2025.10.6 | - | - | National Holiday
8 | 2025.10.13 | Speaker Modeling (Part 1) | Slides | -
9 | 2025.10.20 | Speaker Modeling (Part 2) | Slides | -
10 | 2025.10.27 | Speech Synthesis (Part 1) | Slides | Invited talk by Jingbei Li, StepAudio
11 | 2025.11.3 | Speech Synthesis (Part 2) | Slides | -
12 | 2025.11.10 | Speech Synthesis (Part 3) | Slides | -
13 | 2025.11.17 | Voice Conversion | Slides | Assignment 2 out
14 | 2025.11.24 | Speech Separation | Slides | -
15 | 2025.12.1 | Self-Supervised Learning for Speech | Slides | -
16 | 2025.12.8 | Speech Processing with Large Language Models | Slides | -
17 | 2025.12.15 | Applications of Speech Processing in Industry | Slides | Invited Talks
18 | 2025.12.22 | Final Project Presentation | - | Last Class
20 | 2026.1.5 | - | - | Final Project due

Course Materials

  • [1] Jurafsky D, Martin J H. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition with language models. 2025 Edition.
  • [2] https://speechprocessingbook.aalto.fi/index.html
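  • [3] Xuedong Huang, Alex Acero, Hsiao-Wuen Hon. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, 2001.
  • [4] Han Jiqing, Zhang Lei, Zheng Tieran. Speech Signal Processing (《语音信号处理》). Tsinghua University Press.
  • [5] Hong Qingyang, Li Lin. Speech Recognition: Principles and Applications (《语音识别:原理与应用》). Publishing House of Electronics Industry.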

Recent papers from top conferences (ICASSP, Interspeech, ACL, etc.)

Software & Tools

  • Python 3.8+ - Primary programming language
  • PyTorch/TensorFlow - Deep learning frameworks
  • Librosa - Audio processing library
  • WeNet/WeSpeaker/WeSep - Open-source speech processing toolkits
  • ESPnet/SpeechBrain/Kaldi - Additional speech processing frameworks

Grading Policy

  • Attendance: 15% (counted from Week 4 to Week 18)
  • Homework Assignments: 20% (2 assignments)
  • Final Project: 65% (1 project)
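
As an illustration of how these weights combine, the following is a minimal sketch only. It assumes each component is scored on a 0-100 scale and that the two homework assignments are already averaged into a single homework score; neither assumption is stated in this syllabus, and the function and variable names are hypothetical.

```python
# Weights taken from the grading policy above.
# The 0-100 scale for each component is an assumption, not course policy.
WEIGHTS = {"attendance": 0.15, "homework": 0.20, "final_project": 0.65}


def final_score(scores):
    """Return the weighted sum of component scores (each assumed to be 0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: full attendance, 80 on homework, 90 on the final project.
example = {"attendance": 100, "homework": 80, "final_project": 90}
print(round(final_score(example), 2))
```

With these hypothetical scores the components contribute 15, 16, and 58.5 points respectively, which shows how heavily the final project dominates the overall grade.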

Assignments

Homework 1: TBD

Due: Week 8
Description: TBD

Homework 2: TBD

Due: Week 13
Description: TBD

Final Project: TBD

Due: Week 20
Description: TBD


*This syllabus is subject to change. Students will be notified of any modifications.*