2021-2022 Sem II

This course introduces cloud infrastructure. After completing it, students should feel more comfortable building and deploying their own cloud services.

Course Information

  • Prerequisites: COL331 or equivalent.

    Note: The course includes programming assignments and thus expects proficiency with systems programming and debugging.

  • Credits: 3-0-2
  • Slot: AB, Mondays and Thursdays 3:30-4:45pm in MS Teams.
  • TAs:
    • Nutesh Sahu: jcs212242 AT csia.iitd.ac.in
    • Soumen Basu: soumen.basu AT cse.iitd.ac.in
    • Abhisek Panda: csz202445 AT cse.iitd.ac.in
  • TA Office hours: TBD
  • Reading material: There is no textbook for the course. Most lectures will link to more reading material.

Grading criteria

  • 30% labs (programming assignments)
  • 20% project
  • 10% assignments
  • 20% minor exam
  • 20% major exam
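
As a rough illustration of how the weights above combine, here is a minimal sketch in Python. The component scores are made-up values, and the assumption that each component is first scored out of 100 is ours; the actual computation used by the course may differ.

```python
# Weights taken from the grading criteria above (in percent).
WEIGHTS = {
    "labs": 30,
    "project": 20,
    "assignments": 10,
    "minor": 20,
    "major": 20,
}


def weighted_total(scores):
    """Combine per-component scores (each 0-100) into a course total (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one student, each out of 100.
example_scores = {
    "labs": 80,
    "project": 70,
    "assignments": 90,
    "minor": 60,
    "major": 75,
}

total = weighted_total(example_scores)
print(total)  # 74.0
```

With these made-up scores the total is 24 + 14 + 9 + 12 + 15 = 74.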

Supporting systems

  • Lectures will be held in the course Teams channel.
  • Assignments will be regularly released on Gradescope.
  • Labs are to be done on Baadal. You will need VPN access to the IITD network!
  • Discussions should be done on Piazza.


Thanks to Robert T. Morris (MIT) and Mythili Vutukuru (IITB); parts of this course have been inspired by courses they have made available.


Audit criteria

30% or more marks.


We will employ various methods to catch cheating. Cheating in a lab or assignment will result in a zero for that lab or assignment.

Late policy

  • To help you cope with unexpected emergencies, you can hand in your lab solutions late, but the total amount of lateness summed over all the lab deadlines must not exceed 72 hours. You can divide up your 72 hours among the labs however you like; you don’t have to ask or tell us. Late hours can be used only for labs.
  • Assignments cannot be submitted late. One assignment in the course can be skipped without penalty.
  • COVID addendum: If you are affected by an illness, including COVID-19, you can submit up to 1 assignment and up to 1 lab late by 1 week each by emailing Soumen. Please attach proof of illness to the email. This can be used only once in the semester and does not affect the rest of the late policy. In other words, in addition to the one assignment submitted a week late, another assignment can still be skipped without penalty. Similarly, the 3-day (72-hour) late allowance can still be used for the other two labs.
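
To make the 72-hour lab budget concrete, here is a minimal sketch in Python of how late hours add up across labs. The 23:59 deadline times and the submission times are hypothetical, and counting fractional hours is our assumption; the policy above does not say how partial hours are rounded.

```python
from datetime import datetime

# Total late hours allowed across all labs, from the late policy above.
LATE_BUDGET_HOURS = 72


def late_hours(deadline, submitted):
    """Hours past the deadline, or 0 if the submission was on time."""
    hours = (submitted - deadline).total_seconds() / 3600
    return max(0.0, hours)


def remaining_budget(submissions):
    """Late hours still available after a list of (deadline, submitted) pairs."""
    used = sum(late_hours(deadline, submitted) for deadline, submitted in submissions)
    return LATE_BUDGET_HOURS - used


# Hypothetical example: deadline times of day are assumed, not from the course page.
submissions = [
    # Lab 1: submitted exactly 24 hours late.
    (datetime(2022, 1, 23, 23, 59), datetime(2022, 1, 24, 23, 59)),
    # Lab 2: submitted exactly 36 hours late.
    (datetime(2022, 2, 6, 23, 59), datetime(2022, 2, 8, 11, 59)),
]

print(remaining_budget(submissions))  # 12.0
```

In this example 24 + 36 = 60 hours are used, so at most 12 late hours remain for Lab 3.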

Tentative topics

  • Virtualization: containers, orchestration, hypervisors
  • Recoverability: journaling, snapshotting
  • Fault tolerance: state transfer, replicated state machines
  • Consistency and availability: PACELC theorem
  • Storage Scalability: sharding, consistent hashing
  • Cloud programming: dataflow computation, pub-sub, locking, transactions
  • Light coverage of other topics: cloud economics, public cloud offerings, security

While discussing these topics, we plan to study popular cloud offerings: containers such as Docker, orchestration in Kubernetes (k8s), key-value stores such as Redis, coordination services such as ZooKeeper, SQL/NoSQL databases, distributed file systems such as HDFS, the pub-sub system Kafka, and dataflow computation in Spark.

Disclaimer: Actual course contents may differ slightly depending on student interest. Reach out to the instructor as soon as possible if there is a particular interest in a topic.

Tentative Schedule

Week 1
  • Mon 3 Jan - LEC 1: Introduction.
  • Thu 6 Jan - LEC 2: What is scalability? Task DAGs. Reading: Ch.5 of Introduction to Parallel Computing.

Week 2
  • Mon 10 Jan - LEC 3: Fault-tolerant embarrassingly parallel programs.
  • Thu 13 Jan - LEC 4: Work pool model. Introduce Lab 1. Reading: Celery. Optional: Celery at Instagram.

Week 3
  • Mon 17 Jan - LEC 5: Struggles with Distributed shared memory. Reading: DSM survey.
  • Thu 20 Jan - LEC 6: Resilient Distributed Datasets.
  • Sun 23 Jan - Lab 1 DUE

Week 4
  • Mon 24 Jan - LEC 7: Streaming computation as mini-batches. Reading: Spark streaming.
  • Thu 27 Jan - LEC 8: Real-time stateful streaming (Flink). Introduce Lab 2. Reading: Lightweight Asynchronous Snapshots. Redis streams.

Week 5
  • Mon 31 Jan - LEC 9: Large-scale ML.
  • Thu 3 Feb - LEC 10: Google file system.
  • Sun 6 Feb - Lab 2 DUE

Week 6
  • Mon 7 Feb - LEC 11: Revisit cycles in real-time stateful streaming. Introduce projects. Reading: Lightweight Asynchronous Snapshots.
  • Thu 10 Feb - LEC 12: Amazon Dynamo: Decentralization. Reading: Dynamo, Gossip protocol in Cassandra.

Week 7
  • Mon 14 Feb
  • Thu 17 Feb
  • Sun 20 Feb - Project proposal DUE

Week 8
  • Mon 21 Feb - LEC 13: Amazon Dynamo: Eventual consistency. Introduce Lab 3. Reading: Dynamo, CRDT.
  • Thu 24 Feb - LEC 14: Replicated state machines, leader election in Raft.

Week 9
  • Mon 28 Feb - Semester break
  • Thu 3 Mar - LEC 15: Other safety properties in Raft. Linearizability.
  • Sun 6 Mar - Lab 3 DUE

Week 10
  • Mon 7 Mar - LEC 16: Improve read throughput, give up on linearizability of reads. Reading: ZooKeeper.
  • Thu 10 Mar - LEC 17: Distributed transactions. Serializability, 2-phase commit.

Week 11
  • Mon 14 Mar - LEC 18: OS background for virtualization. Reading: OS book.
  • Thu 17 Mar - LEC 19: Popek-Goldberg theorem. CPU/memory paravirtualization in Xen.

Week 12
  • Mon 21 Mar - Instructor unwell (viral illness). Makeup class on 2 Apr.
  • Thu 24 Mar - LEC 20: I/O virtualization. Reading: Parts of VMware paper.
  • Sun 27 Mar - Project DUE

Week 13
  • Mon 28 Mar - Project presentations. Self-study: Containers: Lec 11.
  • Thu 31 Mar - Project presentations. Self-study: Containers: Lec 11.
  • 2 Apr (makeup class) - LEC 21: Hardware assisted virtualization. Reading: KVM, Nested paging.

Encouraging student comments after the course

I found this course to be very thought provoking and knowledgeable. I feel it has given me a wide variety of knowledge in the systems domain. All the topics covered in class were presented in a very connected manner which really helped grasp the bigger picture and also with the retention of knowledge. Overall, I quite liked the course and contents covered here.

Thanks for organising such a great course where we learn many new concepts which are never taught anywhere we just see the implementations of these concepts in real world. I had never seen such transparency in rubrics, grading in any course. I enjoyed and learned at the same time in entire duration of course.

It was a very good course. Had a lot of fun learning about distributed systems.