Data-intensive systems are designed to store, process, and analyze large volumes of data that cannot be handled by a single computer, typically using distributed architectures.
Big data algorithms process massive datasets that cannot fit into the memory or storage of a single computer. They rely on distributed computing models such as MapReduce and its successors, which decompose operations such as sorting, aggregation, joins, and matrix multiplication into parallel subtasks executed across multiple machines.
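As a minimal sketch of this pattern, the classic word-count aggregation in PySpark maps each record to key-value pairs and reduces by key in parallel across the cluster; the input and output paths here are illustrative:

```python
from pyspark.sql import SparkSession

# Sketch of a MapReduce-style aggregation in PySpark.
# The HDFS paths are illustrative; any large text dataset works.
spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
sc = spark.sparkContext

counts = (
    sc.textFile("hdfs:///data/corpus/*.txt")   # input is split across workers
      .flatMap(lambda line: line.split())      # map: emit one token per word
      .map(lambda word: (word, 1))             # map: key-value pairs (word, 1)
      .reduceByKey(lambda a, b: a + b)         # reduce: sum counts per key in parallel
)
counts.saveAsTextFile("hdfs:///out/wordcounts")  # each partition writes its own shard
```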
The course takes a code-first approach, pairing each algorithmic concept with hands-on implementations using modern big data libraries and platforms, including PySpark, Spark SQL, and stream-processing frameworks. It equips students with a practical toolbox for designing and implementing scalable algorithms for problems such as large-scale sorting, aggregation, and matrix multiplication on datasets too large for any single machine's memory.
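For example, distributed matrix multiplication, one of the problems named above, can be expressed with Spark's `BlockMatrix`, which partitions each operand into blocks and multiplies them across the cluster. The toy 4x4 matrices below stand in for operands that would not fit on one machine:

```python
from pyspark.sql import SparkSession
from pyspark.mllib.linalg import Matrices
from pyspark.mllib.linalg.distributed import BlockMatrix

spark = SparkSession.builder.appName("blockmatmul-sketch").getOrCreate()
sc = spark.sparkContext

# A 4x4 matrix stored as a 2x2 grid of dense 2x2 blocks (values column-major).
blocks = sc.parallelize([
    ((0, 0), Matrices.dense(2, 2, [1, 2, 3, 4])),
    ((0, 1), Matrices.dense(2, 2, [5, 6, 7, 8])),
    ((1, 0), Matrices.dense(2, 2, [9, 10, 11, 12])),
    ((1, 1), Matrices.dense(2, 2, [13, 14, 15, 16])),
])
A = BlockMatrix(blocks, rowsPerBlock=2, colsPerBlock=2)
B = A.transpose()  # reuse A's blocks as a second operand

# Block-wise multiply: each output block is a sum of per-block
# products, computed in parallel across the cluster.
C = A.multiply(B)
print(C.toLocalMatrix())  # collect locally only for inspecting toy data
```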
The course syllabus is designed to enable students to begin their projects while learning the material. As the course continues, they will enrich their projects with the concepts they acquire. Each team will give several in-class presentations for discussion and feedback.
As everyday tasks are increasingly automated by AI and mature libraries, professional developers are expected to innovate and integrate solutions quickly. Reflecting this shift, course projects emphasize exploring new use cases by thoughtfully combining multiple learned components in original ways.
The list below presents the complete set of subjects; individual course instances may vary depending on the course format, students’ backgrounds, and class dynamics.