Building with Embodied Vision Models: Hands-on Science
EmbVision Course Page
Dr. Alexander (Sasha) Apartsin
Image processing and computer vision address tasks ranging from low-level enhancement, filtering, and feature extraction to high-level object recognition and scene understanding. Modern systems combine traditional techniques with deep learning models trained on large image datasets to achieve robust performance.
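To make the distinction concrete, the sketch below (an illustrative example, not taken from the course materials) applies a low-level Gaussian filter and then extracts classic ORB features with OpenCV; the image path is a placeholder.

import cv2

# Load a sample image in grayscale ("input.jpg" is a placeholder path).
image = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Low-level enhancement: smooth the image with a small Gaussian kernel.
smoothed = cv2.GaussianBlur(image, (5, 5), sigmaX=1.0)

# Classic feature extraction: detect ORB keypoints and compute descriptors.
orb = cv2.ORB_create()
keypoints, descriptors = orb.detectAndCompute(smoothed, None)
print(f"Detected {len(keypoints)} keypoints")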
Embodied AI refers to artificial intelligence systems that both perceive and act within an environment, combining visual understanding with physical or simulated embodiment. Such models integrate perception, reasoning, and motor control to enable agents to learn from and interact meaningfully with their surroundings.
The course takes a code-first approach, pairing every concept with extensive hands-on examples built on modern libraries, including PyTorch, torchvision, OpenCV, and timm. It equips students with a practical toolbox of ideas and tools for rapidly building vision-based models and applications.
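As a taste of that code-first style, here is a minimal, hedged sketch that loads a pretrained backbone with timm and classifies a single image using the matching preprocessing pipeline; the model name and image path are illustrative choices, not course requirements.

import timm
import torch
from PIL import Image

# Load a pretrained classifier ("resnet18" is just an illustrative choice).
model = timm.create_model("resnet18", pretrained=True)
model.eval()

# Build the preprocessing pipeline that matches the pretrained weights.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
batch = transform(image).unsqueeze(0)             # shape: (1, 3, H, W)

with torch.no_grad():
    logits = model(batch)
print("Predicted class index:", logits.argmax(dim=1).item())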
The course syllabus is designed to enable students to begin their projects while learning the material. As the course continues, they will enrich their projects with the concepts they acquire. Each team will give several in-class presentations for discussion and feedback.
As standard tasks are increasingly handled by AI and mature libraries, expectations of professional developers shift toward innovation and rapid integration. Accordingly, a key requirement for student projects is to tackle new use cases by generating unique data and training or fine-tuning task-specific vision models.
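For illustration, a typical starting point for such a project might look like the hedged sketch below: take a pretrained backbone, replace its classification head, and fine-tune it on a small custom dataset. The directory layout, class count, and hyperparameters are assumptions made for the example, not project requirements.

import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets

# Pretrained backbone with a fresh head sized for 5 custom classes (assumed).
model = timm.create_model("resnet18", pretrained=True, num_classes=5)

# Preprocessing and augmentation matching the pretrained weights.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=True)

# Expects a layout like my_dataset/train/<class_name>/*.jpg (placeholder path).
dataset = datasets.ImageFolder("my_dataset/train", transform=transform)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(3):  # a few epochs, only to illustrate the training loop
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")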
The list below presents the complete set of subjects; individual course instances may vary depending on the course format, students’ backgrounds, and class dynamics.