This course is an intensive survey of faults and fault-mitigation techniques in modern computing systems. Topics include fault mechanics, modeling, diagnosis, intra-core robustness, communication resilience, and current research in the field. The course focuses on hardware fault tolerance; software fault tolerance is addressed briefly. Research is emphasized through paper reading and discussion and through student projects.