CS598 Course Outline
- Introduction
- Course logistics/schedule/projects
- Overview of course goals and topics covered
- Example of the Issues
- Matrix-vector multiply
- Matrix-matrix multiply
- Review of Computer Architecture for Performance Understanding
- Memory Hierarchy
- Instruction Execution
- Parallel Processing
- Realistic Performance Models
- Applied to examples
- Use of upper and lower bounds on performance
- Parallel Computer Architecture for Performance Understanding
- Changes to the memory hierarchy and model; memory consistency
- Latency
- Concurrency and Little's Law
- Interconnect
- Complexity Models
- Adapting Algorithms to Architecture
- Changing order of evaluation (and hence access)
- Adapting Data Structures to Architecture
- Padding
- Alignment
- Recursive mappings
- Interlude: Performance Measurement
- Measurement techniques and hazards
- Statistical techniques
- Reproducibility
- Algorithmic Strategies
- The Cache Oblivious approach
- Algorithmic Strategies (cont'd)
- Autotuning and families of algorithms
- Latency Tolerant Algorithms
- Limits on scalability
- Solution 1: Adding hardware
- Solution 2: Removing synchronization
- Solution 3: Changing the problem
- Programming Language Support for Algorithms
- Historical overview
- Partitioned Global Address Space languages
- The DARPA High Productivity Computing Systems languages
- Programming Models for SMPs
- OpenMP
- Threads
- Evaluations of Some Programming Models
- Special Purpose Languages
- Domain Specific languages
- Performance Transformation languages
- Multicore issues for Algorithms
- High local bandwidth
- Low (relative) off-chip bandwidth
- Threads for latency hiding
- Challenges at Extreme Scale
- Concurrency
- Power