CS598 Course Outline

  1. Introduction
    1. Course logistics/schedule/projects
    2. Overview of course goals and topics covered
  2. Example of the Issues
    1. Matrix-vector multiply
    2. Matrix-matrix multiply
  3. Review of Computer Architecture for Performance Understanding
    1. Memory Hierarchy
    2. Instruction Execution
    3. Parallel Processing
  4. Realistic Performance Models
    1. Applied to examples
    2. Use of upper and lower bounds on performance
  5. Parallel Computer Architecture for Performance Understanding
    1. Change to memory hierarchy and model; memory consistency
    2. Latency
    3. Concurrency and Little's Law
    4. Interconnect
    5. Complexity Models
  6. Adapting Algorithms to Architecture
    1. Changing order of evaluation (and hence access)
  7. Adapting Data Structures to Architecture
    1. Padding
    2. Alignment
    3. Recursive mappings
  8. Interlude: Performance Measurement
    1. Measurement techniques and hazards
    2. Statistical techniques
    3. Reproducibility
  9. Algorithmic Strategies
    1. The Cache Oblivious approach
  10. Algorithmic Strategies, cont'd
    1. Autotuning and families of algorithms
  11. Latency-Tolerant Algorithms
    1. Limits on scalability
    2. Solution 1: Adding hardware
    3. Solution 2: Removing synchronization
    4. Solution 3: Changing the problem
  12. Programming Language Support for Algorithms
    1. Historical overview
    2. Partitioned Global Address Space languages
    3. The DARPA High Productivity Computing Systems languages
  13. Programming Models for SMPs
    1. OpenMP
    2. Threads
  14. Evaluations of Some Programming Models
  15. Special Purpose Languages
    1. Domain-Specific languages
    2. Performance Transformation languages
  16. Multicore issues for Algorithms
    1. High local bandwidth
    2. Low (relative) off-chip bandwidth
    3. Threads for latency hiding
  17. Challenges at Extreme Scale
    1. Concurrency
    2. Power