Speculative Data Distribution in Shared Memory Multiprocessors

This work explores the use of speculation at the directories of a cache-coherent non-uniform memory access multiprocessor architecture to improve performance by forwarding data to their destinations before requests are sent. It improves on previous consumer prediction techniques, showing how to construct a predictor that can handle a tradeoff between accuracy and coverage. The dissertation then explores the correct time to perform consumer prediction, and shows how a directory protocol…

Author: Leventhal, Sean

Source: University of Maryland

Contents

1 Introduction
1.1 Motivation
1.2 Summary of Work
1.2.1 Consumer Prediction
1.2.2 Migratory Prediction
1.2.3 Timing Prediction
1.2.4 Combined Coherence Prediction
1.3 Research Contributions
2 Background
2.1 Multiprocessor Design
2.2 Memory Sharing Patterns
2.2.1 Migratory Data
2.2.2 Widely Shared Data
2.3 Multiprocessor Memory Coherence
2.4 Coherence Prediction
2.4.1 Inward Coherence Prediction
2.4.2 Outward Coherence Prediction
2.4.3 Administrative Coherence Prediction
2.5 Coverage of Previous Work
2.5.1 Division of Sharing Patterns
2.5.2 Multi-Read To One Writer Sharing
2.5.3 Migratory Data
3 Experimental Methodology and Terminology
3.1 Measuring the Performance of Coherence Prediction
3.1.1 Measuring Prediction Quality
3.1.2 Measuring System Behavior
3.1.3 Predictor Naming Conventions
3.2 Experimental Methodology
3.3 Directory Protocol Implementation
4 Consumer Prediction
4.1 Theoretical Consumer Prediction Behavior
4.1.1 Consumer Predictor Architecture
4.1.1.1 Perceptron Consumer Prediction Design
4.1.1.2 Consumer Predictor Training
4.1.2 Sensitivity and PVP of Consumer Prediction Schemes
4.1.3 Performance of Predictors Acting on Restricted Information
4.2 Structure of A Consumer Predictor
4.2.1 Supporting A Consumer Predictor in A Directory Protocol
4.2.1.1 Coherence and Training Issues
4.2.1.2 Implementation Requirements
4.2.2 Supporting Program Counter Based Techniques
4.2.3 Directory and Cache Changes to Support Consumer Prediction
4.3 Behavior of Consumer Prediction
4.3.1 Measuring the Behavior of a Consumer Predictor
4.3.2 Varying the Predictor Design
4.3.2.1 Runtime Results Across Varying Predictor Designs
4.3.2.2 Message Transmission Results Across Varying Predictor Designs
4.3.2.3 Bandwidth Usage Results Across Varying Predictor Designs
4.3.2.4 Miss Rates Across Varying Predictor Designs
4.3.2.5 Response Time Results Across Varying Predictor Designs
4.3.3 Tying Processor Number Into the Indexing Scheme
4.3.3.1 Writer-Mixing
4.3.3.2 Address-Mixing
4.3.4 Global Consumer Prediction
4.3.5 Variance of Network Latency
4.3.5.1 Runtime Results
4.3.5.2 Effect on Message Transmission
4.3.5.3 Bandwidth Usage
4.3.5.4 Directory Response Time
4.3.6 Relationship Between Benchmark and Predictor Performance
4.4 Tuning the Perceptron
4.4.1 Varying the Perceptron Threshold
4.4.2 Training a Perceptron With Non-Existent Data
4.5 Applying Confidence Estimation to Consumer Prediction
4.5.1 Supporting A Confidence Estimator
4.5.2 Performance of Confidence Estimated Address-Indexed Consumer Prediction
4.6 Summary
5 Migratory Prediction
5.1 Migratory Consumer Predictor Architecture
5.1.1 Predictor Architecture
5.1.2 Protocol Modification
5.2 Address Based Migratory Consumer Predictors
5.2.1 Effects of History Depth on Migratory Prediction
5.2.2 Preventing Lock-In in Migratory Prediction
5.2.2.1 State Additions
5.2.2.2 History Table Modification
5.2.2.3 History Table Corruption
5.2.2.4 Not-Migratory Counters
5.2.3 Results of Lock-In Prevention Mechanisms
5.3 Instruction Based Migratory Consumer Predictors
5.3.1 Naive Instruction-Based Migratory Prediction
5.3.2 Second-Look Instruction-Based Migratory Prediction
5.3.3 Forward-Ahead Second-Look Instruction-Based Migratory Prediction
5.3.4 Second-Look Instruction-Based Migratory Prediction
5.3.4.1 Interprocessor Message Effects of Instruction Based Prediction
5.3.4.2 Bandwidth Effects of Instruction Based Migratory Prediction
5.3.4.3 Directory Latency of Instruction Based Migratory Prediction
5.3.4.4 Miss Rate of Stores With Migratory Prediction in Use
5.3.4.5 Miss Rate of Instruction Based Migratory Prediction
5.4 Indexing a Migratory Predictor With Mixed Information
5.4.1 Address-Mixed Migratory Prediction
5.4.2 CPU-Mixed Migratory Prediction
5.5 Summary
6 Timing Prediction
6.1 Self Invalidation
6.1.1 Changes to the Directory And Cache to Support Self Invalidation
6.1.2 Training The System
6.1.2.1 Confidence Estimator Training
6.1.2.2 External Training Information
6.1.3 Summary of Training Variation
6.2 Indexing Other Predictors with Last Touch Signatures
6.2.1 Consumer Prediction
6.2.2 Migratory Prediction
6.3 Summary
7 Combining Coherence Prediction Mechanisms
7.1 Potential Benefits of Combining Migratory Prediction With Consumer Prediction
7.2 Implementing a Joint Consumer/Migratory Predictor
7.3 Address-Indexed Joint Consumer/Migratory Prediction
7.3.1 Instruction-Indexed Joint Consumer/Migratory Prediction
7.4 Summary
8 Conclusions
8.1 Consumer Prediction
8.2 Migratory Prediction
8.3 Timing Prediction
8.4 Combined Prediction
8.5 Future Work
Bibliography
