Building with Deep Generative Models: Hands-on Science
GenAI Course Page
Dr. Alexander (Sasha) Apartsin
Multimedia processing refers to the analysis, generation, and manipulation of multiple data modalities, such as audio, images, and video, using computational techniques. It integrates signal processing, machine learning, and computer vision to enable understanding of, and interaction with, complex multimedia content.
Modern deep generative models, trained on large multimodal datasets, can synthesize realistic samples conditioned on chosen attributes. They can also create diverse, labeled datasets that enable training machine-learning systems when annotated data are scarce or unavailable.
The course takes a code-first approach, pairing every concept with extensive hands-on examples that use modern libraries, including PyTorch, librosa, and Hugging Face Transformers. It equips students with a practical toolbox of ideas and techniques for rapidly building GenAI-based models and applications.
The syllabus is designed so that students can begin their projects while still learning the material, enriching them with new concepts as the course progresses. Each team will give several in-class presentations for discussion and feedback.
As standard tasks are increasingly handled by AI and mature libraries, expectations of professional developers shift toward innovation and rapid integration. Accordingly, a key requirement for student projects is to tackle new use cases by generating unique data and training or fine-tuning a task-specific audio or image processing model.
The list below presents the complete set of subjects; the coverage in any individual course instance may vary with the course format, students’ backgrounds, and class dynamics.