CS 6210
Advanced Operating Systems

Fall 2005

UPDATE: The final exam section below now has more information.

Course Description

CS 6210 (Advanced Operating Systems) is a graduate-level course that covers in detail many advanced topics in operating system design and implementation. It starts with topics such as operating system structuring, multithreading, and synchronization, and then moves on to systems issues in parallel and distributed computing systems. There is no textbook for this course. Rather, we will read and discuss a number of important published research papers. For each paper that is covered in class, students are expected to gain a solid understanding of the problem addressed by the paper and the solution proposed by the authors. Some papers will be assigned for self study. Students must read the self study papers carefully because understanding their content may be essential for the papers covered in class. Some papers are marked reference only. These papers cover topics that extend or supplement the material in papers covered in class. Students will be expected to have some understanding of the results in these papers but will not be tested on them.

Prerequisites

Grading

10% class participation
35% projects
25% midterm
30% final

Note that a passing grade is required in each of the above components in order to pass the class.

Additional Material

Greg Eisenhauer's presentation on "Enabling Scalable Performance" - 09/15/2004
Patrick Widener's presentation on "Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling"
Patrick Widener's presentation on "Cluster Based Scalable Network Services"
Mohammad Mansour's presentation on J2EE

Projects

This course is project intensive and will have a sequence of four projects. Strong programming skills are absolutely essential for completing these projects. Students can either do the assigned projects or define a project that fits more closely with their individual research goals. Check here for updates!
For more information on the special projects, please follow this link.
IMPORTANT: The writeup for students doing special projects is due on Tuesday, Sept 21.

For this semester, you will have a quota allocated in the directory [/net/hc280/class/cs6210/~coc_account]. Use it however you like, but remember to copy anything you would like to keep before the end of the semester.

Project 1 Writeup: Date of Submission - 09/10/2004
Project 2 Writeup: Date of Submission - 10/15/2004
Project 3 Writeup: Date of Submission - 11/22/2004

Syllabus

Assume that one paper will be covered in each class period, starting with the SPIN paper. The instructor will confirm the "next paper" at the end of each class period -- if he forgets, please remind him to clarify.

Papers or references that are not available online will be handed out by the instructor, or will be available outside the instructor's door. Optional supplementary reference texts include the following:

Basics

  1. Course overview and assumptions, which include basics of operating system structure, micro-kernels, user- and kernel-level threads, synchronization, deadlock detection and avoidance. Refer to Operating System Concepts, Silberschatz and Galvin, and Multithreaded Programming with Pthreads, Chapter 4 (handout).

OS Structures

  1. Brian Bershad et al., "Extensibility, Safety and Performance in the SPIN Operating System", Proceedings of the 15th ACM Symposium on Operating System Principles, December 1995.
  2. Dawson R. Engler, Frans Kaashoek and James O'Toole, "Exokernel: An Operating System Architecture for Application-Level Resource Management", Proceedings of the 15th ACM Symposium on Operating System Principles, ACM, December 1995.
  3. J. Liedtke, "On Micro-Kernel Construction", Proceedings of the 15th ACM Symposium on Operating System Principles, ACM, December 1995.
  4. Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield, "Xen and the Art of Virtualization", SOSP 2003.

Shared Memory Systems

  1. Anderson, T.E., "The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors", IEEE Transactions on Parallel and Distributed Systems, 1, 1, pgs. 6-16, January 1990. (self study)
  2. Mellor-Crummey, J. M. and Scott, M., "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors", ACM Transactions on Computer Systems, Feb. 1991.
  3. M.S. Squillante and E.D. Lazowska, "Using Processor-Cache Affinity Information in Shared Memory Multiprocessor Scheduling", IEEE Transactions on Parallel and Distributed Systems, Feb. 1993, pgs. 131-143.
  4. Ben Gamsa, Orran Krieger, Jonathan Appavoo, and Michael Stumm, "Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System", 1999 Symposium on Operating Systems Design and Implementation.
  5. Jonathan Appavoo, Marc Auslander, Dilma Da Silva, Orran Krieger, et al., "Enabling Scalable Performance for General Purpose Workloads on Shared Memory Multiprocessors", IBM Technical Report, 2003.

From Parallel to Distributed Systems: Communication Mechanisms

  1. Basics on message passing and communication protocols. Refer to Operating System Concepts, Silberschatz and Galvin. Also refer to the web pages of the CoC networking courses.
  2. Birrell and Nelson, "Implementing Remote Procedure Calls", ACM Transactions on Computer Systems, 2, 1, pgs. 39-59, February 1984. (self study) Also refer to Operating System Concepts, Silberschatz and Galvin.
  3. Schroeder, M., and Burrows, M., "Performance of the Firefly RPC", Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, pgs. 83-90, December 1989. (mostly self study)
  4. B. N. Bershad, T. E. Anderson, E. D. Lazowska, and H. M. Levy, "Lightweight Remote Procedure Call", ACM Transactions on Computer Systems, 8(1):37-55, Feb. 1990.
  5. User-level RPC (self-study).

High Performance Communications

  1. C.A. Thekkath and H.M. Levy, "Limits to Low-Latency Communications on High-Speed Networks", ACM Transactions on Computer Systems, May 1993.
  2. Pakin, Karamcheti, and Chien, "Fast Messages (FM): Efficient, Portable Communication for Workstation Clusters and Massively-Parallel Processors", IEEE Concurrency, vol. 5, no. 2, April-June 1997, pp. 60-73. (self study)
  3. Marcel-Catalin Rosu, Karsten Schwan, and Richard Fujimoto, "Supporting Parallel Applications on Clusters of Workstations", Cluster Computing, Baltzer Science Publishers, May 1998. (reference only)
  4. Hutchinson N.C., Peterson, L.L., "The x-Kernel: An Architecture for Implementing Network Protocols", IEEE Transactions on Software Engineering, 17, 1, pgs. 64-76, January 1991.
  5. David Wetherall, "Active Networks: Vision and Reality: Lessons from a Capsule-based System", 17th ACM Symposium on Operating System Principles, OS Review, Volume 33, Number 5, Dec. 1999. (PPT Spring 04)
  6. Liu, Kreitz, van Renesse, Hickey, Hayden, Birman, Constable, "Building Reliable High Performance Communication Systems from Components", 17th ACM Symposium on Operating System Principles, OS Review, Volume 33, Number 5, Dec. 1999.

Midterm Exam

The midterm is on October 8. The midterm will cover material that has been discussed through the last week before the test; thus, papers from the week of the midterm are not included. The midterm will cover papers marked "self-study", but it will not explicitly ask anything about materials marked "reference only".

Here are some example midterms:

Distributed Systems: Concepts

  1. Ricart, G. and Agrawala, A.K., "An Optimal Algorithm for Mutual Exclusion in Computer Networks", Communications of the ACM, 24, 1, pgs. 9-17, January 1981.
  2. Lamport, L., "Time, Clocks, and the Ordering of Events in a Distributed System", Communications of the ACM, 21, 7, pgs. 558-565, July 1978.

Distributed Systems: File Systems and Distributed Shared Memory

  1. SUN NFS, Locus, and Sprite - from Operating System Concepts, Silberschatz and Galvin. (self study)
  2. Nelson, M.N., Welch, B.B., Ousterhout, J.K., "Caching in the Sprite Network File System", ACM Transactions on Computer Systems, 6, 1, pgs. 134-154, February 1988. (self study)
  3. Anderson, T. et al., "Serverless Network File Systems", ACM Transactions on Computer Systems, February 1996.
  4. M. Satyanarayanan, "Integrating Security in Large Scale Distributed Systems", ACM TOCS, Aug. 1989.
  5. Feeley, Morgan, Pighin, Karlin, Levy, Thekkath, "Implementing Global Memory Management in a Workstation Cluster", Fifteenth ACM Symposium on Operating System Principles, Dec. 1995.
  6. C. Amza, A. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu and W. Zwaenepoel, "TreadMarks: Shared Memory Computing on Networks of Workstations", IEEE Computer, February 1996. (skipped)

Multimedia, Real-Time, and Web Services

  1. D. James Gemmell, Harrick M. Vin, Dilip D. Kandlur, P. Venkat Rangan, and Lawrence A. Rowe, "Multimedia Storage Servers: A Tutorial", IEEE Computer, May 1995. (reference only)
  2. Shahabi, Zimmermann, Fu, and Yao, "Yima: A Second-Generation Continuous Media Server", IEEE Computer Magazine, June 2002.
  3. Bolosky, Fitzgerald, and Douceur, "Distributed Schedule Management in the Tiger Video Fileserver", Proceedings of the 16th ACM Symposium on Operating Systems Principles, Oct. 1997. (skipped)
  4. Michael B. Jones, Daniela Rosu and Marcel Rosu, "CPU Reservations and Time Constraints: Efficient, Predictable Scheduling of Independent Activities", Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP '97), St. Malo, France, Oct. 1997.
  5. Saito, Bershad, Levy, "Manageability, Availability, and Performance in Porcupine: A Highly Scalable Cluster-based Mail Service", 17th ACM Symposium on Operating System Principles, OS Review, Volume 33, Number 5, Dec. 1999.
  6. Armando Fox, Steven Gribble, Yatin Chawathe, Eric Brewer, and Paul Gauthier, "Cluster-based Scalable Network Services", Sixteenth ACM Symposium on Operating System Principles, Oct. 1997.

Distributed Systems: Failures, Consistency and Recovery

  1. Walker et al., "The LOCUS Distributed Operating System", Proceedings of the Ninth ACM Symposium on Operating Systems Principles, pgs. 49-70, December 1983. (self study)
  2. R. Haskin et al., "Recovery Management in QuickSilver", ACM Transactions on Computer Systems, February 1988.
  3. Satyanarayanan, M., et al., "Lightweight Recoverable Virtual Memory", Proceedings of the Fourteenth ACM Symposium on Operating System Principles, pgs. 146-160, December 1993.
  4. David E. Lowell and Peter M. Chen, "Free Transactions With Rio Vista", Proceedings of the Sixteenth ACM Symposium on Operating System Principles, October 1997. (first two sections only)
  5. J. N. Gray, P. McJones, M. W. Blasgen, R. A. Lorie, T. G. Price, G. R. Putzolu, and I. L. Traiger, "The Recovery Manager of a Data Management System", ACM Computing Surveys, Vol. 13, No. 2, June 1981, pp. 223-242. (slides from Gregory Eisenhauer's presentation)

Protection, Object-based Systems and Object Technologies

  1. Linden, T.A., "Operating System Structures to Support Security and Reliable Software", Computing Surveys, 8, 4, pgs. 409-445, 1976. Also refer to Operating System Concepts, Silberschatz and Galvin, the chapter on protection. (reference only)
  2. Saltzer, J.H., "Protection and the Control of Information Sharing in Multics", Communications of the ACM, 17, 7, 1974. (reference only)
  3. Cohen, E., and Jefferson, D., "Protection in the HYDRA Operating System", Proceedings of the Fifth ACM Symposium on Operating System Principles, pgs. 141-160, 1975. (handout)
  4. Mitchell, J. G., et al., "An Overview of the Spring System", Proceedings of Compcon, Feb. 1994.
  5. Hamilton, G., Powell, M.L., and Mitchell, J.G., "Subcontract: A Flexible Base for Distributed Programming", Proceedings of the Fourteenth ACM SOSP, pgs. 69-79, December 1993. (PPT Fall03)
  6. Wollrath, A., Riggs, R., and Waldo, J., "A Distributed Object Model for the Java System", Usenix Conference on Object Oriented Technologies and Systems, May 1996. (slides from Ada Gavrilovska's presentation)
  7. Jason Maassen, Rob van Nieuwpoort, Ronald Veldema, Henri Bal, Thilo Kielmann, Ceriel Jacobs, Rutger Hofman, "Efficient Java RMI for Parallel Programming", Vrije Universiteit Amsterdam, Faculty of Sciences, March 2000.
  8. Govindaraju, M., Slominski, A., Choppella, V., Bramley, R., Gannon, D., "Requirements for and evaluation of RMI protocols for scientific computing", Conference on High Performance Networking and Computing, Proceedings of the 2000 Conference on Supercomputing, Dallas, Texas, USA.
  9. Aldrich, Dooley, et al., "Providing Easier Access to Remote Objects in Client-Server Systems", 31st Hawaii International Conference on System Sciences, January 1998.
  10. Curbera, F., Duftler, M., Khalaf, R., Nagy, W., Mukhi, N., Weerawarana, S., "Unraveling the Web services web: an introduction to SOAP, WSDL, and UDDI", IEEE Internet Computing, Volume 6, Issue 2, March-April 2002, pgs. 86-93.

Final Exam

The final exam is on Friday, December 10, from 12:30-14:20. Note that the exam is starting late within its time slot; thus, only two hours are available to take the exam.

There is also an early final exam on Wednesday, December 8, from 11:00-1:00. If you have discussed taking the exam early with Dr. Schwan, then meet him at his office (CCB 261) at 11:00 that day.

The exam will cover in detail only papers that have been discussed since the midterm, i.e., the Active Networking paper and later. Papers from before the midterm are covered in terms of the concepts they contain, not in detail.

Example finals: