Addressing Power Efficiency Challenges in AI Hardware Through Verification

International Journal of Sustainability and Innovation in Engineering (IJSIE)
2024

https://www.doi.org/10.56830/IJSIE202401

Author

Vikas Nagaraj

Abstract

AI accelerators already run within constrained energy and thermal budgets, and small inefficiencies are amplified across an entire fleet, increasing both costs and emissions. This work reframes power efficiency as a checkable requirement rather than a post-silicon afterthought. It specifies power intent in IEEE 1801 (UPF), encodes protocol correctness properties as SystemVerilog/PSL assertions, and quantifies progress with power-state, transition, and cross coverage (DVFS × workload phase × thermal bin). A reproducible dataset schema aligns timestamps with microarchitectural counters, voltage, frequency, temperature, and power, measured on real workloads (ResNet, BERT, and attention/GEMM) in simulation, emulation, and instrumented silicon. Telemetry is synchronised using triggers and PTP/NTP, rails are calibrated, and error budgets are reported. Continuous integration gates merges on quantitative thresholds (e.g., a >2% regression in p95 energy per inference), and dashboards automatically bisect to the offending change. Experiments show that a hybrid analytical-plus-ML estimator achieves 3.8-6.1% MAPE at millisecond latency, that emulation delivers 30-60x the throughput of simulation, and that verification-driven fixes yield mid-single-digit percentage energy reductions. Case studies include preventing standby leakage by restoring isolation, smoothing a DVFS table to eliminate 10-15 ms oscillations, and fixing compiler schedules associated with incorrect L2 miss models and increased DRAM traffic. The result is a practical end-to-end pipeline (UPF, ABV/formal, emulation/FPGA, calibrated measurement rigs, and CI) that makes watts a first-class test metric and delivers lasting efficiency improvements in GPU/NPU/ASIC accelerators. The scope covers training and inference across 14 nm to 5 nm nodes, with adherence to rigorous safety, ethics, and licensing practices.
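
The merge gate mentioned in the abstract (a >2% regression in p95 energy per inference) can be illustrated with a minimal sketch. This is not the article's implementation: the function names `p95` and `energy_gate`, the use of a nearest-rank percentile, and the millijoule sample lists are all assumptions made for illustration.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def energy_gate(baseline_mj, candidate_mj, threshold=0.02):
    """Compare p95 energy per inference between two runs.

    Hypothetical helper: returns (passed, regression), where regression
    is the relative change of the candidate p95 against the baseline p95
    and passed is False when it exceeds the threshold (2% by default).
    """
    base = p95(baseline_mj)
    cand = p95(candidate_mj)
    regression = (cand - base) / base
    return regression <= threshold, regression


if __name__ == "__main__":
    # Synthetic per-inference energy samples in millijoules (made up).
    baseline = [float(i) for i in range(1, 101)]
    candidate = [x * 1.03 for x in baseline]  # uniformly 3% worse
    passed, regression = energy_gate(baseline, candidate)
    print(f"passed={passed} regression={regression:.3%}")
```

In a CI job, a failing gate would block the merge; gating on p95 rather than the mean is what keeps tail-energy regressions from being hidden by an unchanged average.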

Keywords:

Power-aware verification, Unified Power Format (UPF), Dynamic Voltage and Frequency Scaling (DVFS), AI accelerators, Power-state coverage.  
