The density matrix renormalization group (DMRG) algorithm, a numerical technique
that has been successfully used to investigate the low-energy properties of
one-dimensional (1D) strongly correlated quantum systems, has recently emerged as
an effective tool for studying two-dimensional (2D) systems as well. At the core of
DMRG is a general decimation procedure that allows the systematic truncation of the
Hilbert space, leaving only the most relevant basis states. However, studying 2D systems
requires more degrees of freedom and greater computational resources. To address
this computational roadblock, we develop a massively parallel implementation of
the DMRG algorithm that targets a large number of basis states. It relies on parallel
linear algebra libraries that distribute the generation and diagonalization of large sparse
matrices, as these remain the most time-consuming steps in DMRG. We tailor
our code for efficient performance on two sections of CINECA Marconi, a
Tier-0 class supercomputing infrastructure, and evaluate its performance and scalability
on up to thousands of processors. From the performance analysis we identify some
limitations in scalability and suggest possible ways to rectify them.
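
As an illustration of the decimation procedure mentioned above, the following is a minimal serial sketch, not the parallel implementation described in this work. It assumes the superblock ground state is available as a dense array of coefficients indexed by system and environment basis states, builds the reduced density matrix of the system block, and keeps the m eigenvectors with the largest eigenvalues. The function name `truncate_block`, the dense NumPy representation, and the random test state are assumptions made only for this example.

```python
import numpy as np

def truncate_block(psi, m):
    """Keep the m most relevant system-block states (illustrative sketch).

    psi : (dim_sys, dim_env) array of superblock wavefunction coefficients.
    m   : number of basis states to retain.
    Returns the (dim_sys, m) truncated basis and the discarded weight.
    """
    psi = psi / np.linalg.norm(psi)           # normalize the state
    rho = psi @ psi.conj().T                  # reduced density matrix of the system block
    evals, evecs = np.linalg.eigh(rho)        # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:m]       # indices of the m largest eigenvalues
    kept = evecs[:, order]                    # columns span the truncated basis
    truncation_error = 1.0 - evals[order].sum()
    return kept, truncation_error

# Hypothetical input: a random state on an 8 x 8 superblock.
rng = np.random.default_rng(0)
psi = rng.standard_normal((8, 8))
basis, err = truncate_block(psi, 4)
full_basis, full_err = truncate_block(psi, 8)
```

Operators of the system block would then be projected onto the retained basis as `basis.conj().T @ op @ basis`. The discarded weight `err` gives a rough measure of the accuracy lost in the truncation, and it vanishes when all states are kept.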