With the increasing demand for computational capability at low power, graphics accelerators today not only dominate the gaming market but also equip the majority of the most powerful supercomputers worldwide. However, enabling complex scientific applications on these accelerated systems has not kept pace with the technological shift:
it has proved particularly challenging, requiring significant effort to write code in specific and usually non-portable languages.
Refactoring applications with large code bases is labor-intensive and error-prone, and the result is usually not portable. Among the several attempts to develop directive-based languages for porting applications to heterogeneous systems, OpenACC has become one of the most promising paradigms. This work is an attempt to port a production-ready, single-relaxation-time, multi-component Lattice-Boltzmann Method (LBM) application to a world-class multi-GPU distributed hybrid system, the Marconi-100 hosted at CINECA, using OpenACC. It includes a detailed performance analysis of the application, together with a comparison against previous results obtained on high-end computing platforms based on Intel x86-64 processors.