BayesPerf: Minimizing Performance Monitoring Errors Using Bayesian Statistics

Subho S. Banerjee, Saurabh Jha, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer

ASPLOS 2021



Abstract

Hardware performance counters (HPCs) that measure low-level architectural and microarchitectural events provide dynamic contextual information about the state of the system. However, HPC measurements are error-prone due to nondeterminism (e.g., undercounting due to event multiplexing, or OS interrupt-handling behaviors). In this paper, we present BayesPerf, a system for quantifying uncertainty in HPC measurements by using a domain-driven Bayesian model that captures microarchitectural relationships between HPCs to jointly infer their values as probability distributions. We provide the design and implementation of an accelerator that allows for low-latency and low-power inference of the BayesPerf model for x86 and ppc64 CPUs. BayesPerf reduces the average error in HPC measurements from 40.1% to 7.6% when events are being multiplexed. The value of BayesPerf in real-time decision-making is illustrated with a simple example of scheduling of PCIe transfers.
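As a rough illustration of the multiplexing error mentioned above, the sketch below shows the linear extrapolation commonly applied to a multiplexed counter: the raw count is scaled by the ratio of the time the event was enabled to the time it was actually scheduled on hardware (Linux perf_event reports both times). This is not BayesPerf's inference method. The function names, the four-slice workload, and its event rates are hypothetical and chosen only to show how the extrapolation goes wrong when the event rate changes while the counter is descheduled.

```python
def scale_multiplexed_count(raw_count: int, time_enabled_ns: int, time_running_ns: int) -> float:
    """Linearly extrapolate a count that was observed for only part of the window."""
    if time_running_ns <= 0:
        raise ValueError("counter was never scheduled on hardware")
    return raw_count * time_enabled_ns / time_running_ns


def relative_error(estimate: float, truth: float) -> float:
    """Absolute error of the estimate as a fraction of the true value."""
    return abs(estimate - truth) / truth


# Hypothetical workload: events per 1 ms slice; the rate jumps halfway through.
events_per_slice = [100, 100, 900, 900]
true_total = sum(events_per_slice)

# Assumption: the counter is scheduled only during the first two slices.
observed = sum(events_per_slice[:2])
estimate = scale_multiplexed_count(observed, time_enabled_ns=4_000_000, time_running_ns=2_000_000)
error = relative_error(estimate, true_total)

print(f"true={true_total} estimate={estimate} relative_error={error:.0%}")
```

The extrapolation is exact only when the event rate is constant over the whole window. Here the phase change is never observed, so the scaled estimate badly undercounts, which is the kind of error the abstract attributes to multiplexing.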

Citation

@inproceedings{Banerjee2021,
  author = {Banerjee, Subho S. and Jha, Saurabh and Kalbarczyk, Zbigniew and Iyer, Ravishankar K.},
  title = {BayesPerf: Minimizing Performance Monitoring Errors Using Bayesian Statistics},
  year = {2021},
  isbn = {9781450383172},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3445814.3446739},
  doi = {10.1145/3445814.3446739},
  booktitle = {Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems},
  pages = {832--844},
  numpages = {13},
  keywords = {Error Correction, Performance Counter, Accelerator, Sampling Errors, Error Detection, Probabilistic Graphical Model},
  location = {Virtual, USA},
  series = {ASPLOS 2021}
} 

  • Last updated 04/25/2021