SUSTAINABLE COMPUTING IN THE AI ERA: ENERGY PROFILING AND RECONFIGURABLE HIGH-PERFORMANCE COMPUTING

LEON VEGA, LUIS GERARDO
  • doctoral thesis

Abstract
The surge of Artificial Intelligence (AI) and Deep Learning (DL) workloads has transformed high-performance computing (HPC), increasing demands for both computational power and energy efficiency. This dissertation addresses two key challenges in sustainable computing: energy accounting in next-generation supercomputers and the design of energy-efficient AI accelerators based on Field Programmable Gate Arrays (FPGAs). First, a comprehensive methodology for fine-grained, process-level energy consumption estimation is proposed. A novel tool, EfiMon, is introduced to monitor system-wide and process-specific energy metrics without requiring execution isolation, enabling accurate energy profiling on CPUs and GPUs. New analytical models are developed, demonstrating sub-2% relative error for CPU-based measurements and under 10% error for GPU-based measurements, providing valuable insights into energy usage in shared-resource environments. Second, this work presents the design and evaluation of a Flexible Accelerator Library (FAL), which enables the automatic generation of parameterised FPGA-based AI accelerators. The library supports customising operand size, numerical precision, approximate arithmetic injection, and accelerator reuse. Experimental validation explores standard, Strassen, and Winograd matrix multiplication approaches, assessing trade-offs among resource consumption, performance, and error resilience. Furthermore, approximate computing techniques are incorporated to reduce FPGA resource usage with minimal impact on model accuracy. For MobileNet v2, resource usage was reduced by approximately 20%, accompanied by an accuracy improvement of 16.6% attributed to beneficial numerical perturbations in the softmax layer. For LeNet-5, the corresponding figures were 18.93% and 9.6%.
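To make the process-level energy-accounting idea concrete, the following is a minimal illustrative sketch, not the dissertation's actual EfiMon model: it assumes a package-level energy reading (e.g. from RAPL-style counters) is split into a static share divided evenly among processes and a dynamic share attributed in proportion to each process's active CPU time. The function name, the `static_fraction` parameter, and the even static split are all assumptions for illustration.

```python
# Hypothetical process-level energy attribution sketch (not EfiMon's model).
# Assumption: dynamic energy scales with a process's share of active CPU time;
# static/idle energy is split evenly among the monitored processes.

def attribute_energy(package_energy_j, static_fraction, proc_cpu_times):
    """Split measured package energy (Joules) among processes.

    package_energy_j: total energy over the sampling window (e.g. a RAPL delta).
    static_fraction: assumed fraction of energy that is static/idle (0..1).
    proc_cpu_times: dict mapping pid -> active CPU seconds in the window.
    """
    total_cpu = sum(proc_cpu_times.values())
    dynamic = package_energy_j * (1.0 - static_fraction)
    static_share = package_energy_j * static_fraction / max(len(proc_cpu_times), 1)
    return {
        pid: static_share + (dynamic * t / total_cpu if total_cpu > 0 else 0.0)
        for pid, t in proc_cpu_times.items()
    }

# Example: 50 J over the window, 20% assumed static, two processes.
shares = attribute_energy(50.0, 0.2, {"worker_a": 3.0, "worker_b": 1.0})
```

By construction the per-process shares sum back to the measured package energy, which is the basic consistency property any attribution scheme of this kind must satisfy; the thesis's models additionally cover GPUs and shared-resource effects.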
The thesis extends its impact by exploring FPGA acceleration for Large Language Models (LLMs), proposing architectures optimised for LLM inference at the edge, and discussing pathways for future AI computing architectures that prioritise energy efficiency, scalability, and reconfigurability. The proposed accelerators achieve speedups ranging from 1.37x to 10.98x over two AMD EPYC 7H12 CPUs with 64 cores each, while remaining 1.66x slower than an NVIDIA Tesla V100. Lastly, this work highlights the use of heterogeneous systems equipped with CPUs, GPUs, ASICs, and reconfigurable devices to address high-arithmetic-intensity (compute-bound) tasks, and the integration of compute units into memory modules to address low-arithmetic-intensity (memory-bound) workloads, enabling systems that can evolve and adapt to the rapidly changing requirements of AI.
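The compute-bound versus memory-bound distinction above follows the standard roofline reasoning: a workload is compute-bound when its arithmetic intensity (FLOPs per byte of memory traffic) exceeds the machine balance (peak FLOP/s divided by memory bandwidth). The sketch below illustrates this with generic textbook numbers, not measurements from the thesis; the naive traffic model for matrix multiplication (read two operands, write one result, no cache reuse accounting) is an assumption for illustration.

```python
# Roofline-style classification of workloads by arithmetic intensity.
# Numbers are generic illustrations, not results from the dissertation.

def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

def is_compute_bound(ai, peak_flops, mem_bandwidth_bytes):
    """Roofline test: compute-bound when arithmetic intensity exceeds
    the machine balance (peak FLOP/s / memory bandwidth in B/s)."""
    return ai > peak_flops / mem_bandwidth_bytes

n = 1024
# Dense NxN fp32 matmul: ~2*N^3 FLOPs over ~3*N^2 * 4 bytes (naive traffic model)
ai_gemm = arithmetic_intensity(2 * n**3, 3 * n**2 * 4)   # ~N/6 FLOPs/byte: high
# Elementwise fp32 vector add: N FLOPs over 3*N * 4 bytes
ai_axpy = arithmetic_intensity(n, 3 * n * 4)             # ~0.08 FLOPs/byte: low
```

Under this model, dense matrix multiplication sits far to the right of the roofline knee and benefits from compute throughput (GPUs, ASICs, FPGA accelerators), while elementwise operations are bandwidth-limited, which motivates the processing-in-memory direction the thesis discusses.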
Archive
https://hdl.handle.net/11368/3124122
https://ricerca.unityfvg.it/handle/11368/3124122
Rights
open access
FVG URL
https://arts.units.it/bitstream/11368/3124122/2/PhD_Thesis-3.pdf
Subjects
  • approximate computing

  • hardware acceleration

  • fpga

  • machine learning

  • reconfigurable computing

  • Settore INF/01 - Info...
