Nanjing University · School of Intelligence Science and Technology · Instructor Homepage

Intelligent Speech Technology

This course introduces core theory and modern practice in speech technology, from signal processing and ASR to speaker modeling, speech synthesis, and large-model-based speech applications.

Fall 2025  |  2 Credits  |  18 Weeks · 36 Contact Hours  |  Mon 2:00 - 3:50 PM

Course Overview

Instructor
Location: Nanyong Building, Room West 209
Meeting Time: Monday, 2:00 - 3:50 PM
Teaching Format: Lectures + demos + project-oriented practice

As one of the most natural forms of human communication, speech is central to the next generation of AI systems. This course provides a structured path from fundamentals to the state of the art, covering both classical and neural approaches. We discuss the practical design choices behind modern systems and how to read research papers critically and implement their ideas.

Learning Objectives

Prerequisites

Weekly Schedule

Week  |  Date  |  Topic  |  Materials  |  Notes
1  |  2025.8.25  |  Course Introduction & Overview of Speech Technology  |  Slides, Demos  |  -
2  |  2025.9.1  |  Overview of Speech Technology (continued)  |  Slides  |  -
3  |  2025.9.8  |  Fundamentals of Speech Signal Processing  |  Slides  |  -
4  |  2025.9.15  |  Introduction to Automatic Speech Recognition  |  Slides, WER Demo  |  -
5  |  2025.9.22  |  Traditional ASR Models (GMM/DNN-HMM)  |  Slides  |  Assignment 1 Released
6  |  2025.9.29  |  End-to-End ASR Models  |  Slides  |  -
7  |  2025.10.6  |  National Holiday  |  -  |  No class meeting
8  |  2025.10.13  |  Speaker Modeling (Part 1)  |  Slides  |  -
9  |  2025.10.20  |  Speaker Modeling (Part 2)  |  Slides  |  -
10  |  2025.10.27  |  Speech Synthesis (Part 1)  |  Slides (to be posted)  |  Guest Talk: Jingbei Li (StepAudio)
11  |  2025.11.3  |  Speech Synthesis (Part 2)  |  Slides (to be posted)  |  -
12  |  2025.11.10  |  Speech Synthesis (Part 3)  |  Slides (to be posted)  |  -
13  |  2025.11.17  |  Voice Conversion  |  Slides (to be posted)  |  Assignment 2 Released
14  |  2025.11.24  |  Speech Separation  |  Slides (to be posted)  |  -
15  |  2025.12.1  |  Self-Supervised Learning for Speech  |  Slides (to be posted)  |  -
16  |  2025.12.8  |  Speech Processing with Large Language Models  |  Slides (to be posted)  |  -
17  |  2025.12.15  |  Applications of Speech Processing in Industry  |  Slides (to be posted)  |  Invited Talks
18  |  2025.12.22  |  Final Project Presentation  |  -  |  Last class
20  |  2026.1.5  |  -  |  -  |  Final Project Due

Course Materials

Recommended Textbooks

Recommended Reading

Recent papers from top conferences and journals: ICASSP, Interspeech, ACL, T-ASLP, and related venues.

Software & Tooling

Assessment

Attendance (15%): Tracked from Week 4 to Week 18
Homework (20%): Two assignments
Final Project (65%): One term project and presentation
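
As a rough illustration of how the three weights combine, the sketch below computes a weighted total. Only the weights (15%, 20%, 65%) come from this page. The 0-100 scale for each component, the function name `final_grade`, and the example scores are assumptions made for illustration, and how the two homework assignments are combined into one homework score is not specified here.

```python
# Weights are taken from the Assessment section above.
# Everything else (0-100 scale, names, example scores) is a hypothetical sketch.
WEIGHTS = {"attendance": 0.15, "homework": 0.20, "final_project": 0.65}


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted total of component scores (each assumed to be 0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: full attendance, 80 on homework, 90 on the final project.
example = {"attendance": 100.0, "homework": 80.0, "final_project": 90.0}
print(round(final_grade(example), 2))
```

For the hypothetical scores above, the total is 0.15 × 100 + 0.20 × 80 + 0.65 × 90 = 89.5, which shows how heavily the final project dominates the overall grade.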

Assignments & Deliverables

Homework 1

Release: Week 5  |  Due: Week 8

Details and submission format will be announced in class and on the course page.

Homework 2

Release: Week 13  |  Due: To be announced

This assignment focuses on advanced topics covered in the second half of the course.

Final Project

Presentation: Week 18  |  Final Submission: Week 20 (2026.1.5)

Students are expected to propose, implement, and evaluate a speech-related project and to document it in a clear technical report.

Course Policies

The syllabus and schedule are subject to refinement as the semester progresses. Any updates to lectures, deadlines, or materials will be announced in advance.