In this work we propose a new FPGA-based architecture that speeds up the Lucy-Richardson algorithm (LRA) for space-variant image deconvolution. The architecture exploits the ability to distribute data among different memory blocks in the FPGA, so that the algorithm execution is split into several channels operating in parallel. Since the LRA is implemented as an iterative, space-variant convolution, the strategies adopted in this paper can be applied to other similar image processing algorithms.
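To make the parallel-channel idea concrete, the following is a minimal software sketch in Python, not the FPGA architecture proposed here. It assumes a hypothetical 3×3 space-variant PSF (`psf_at`) and partitions the image into horizontal row bands that stand in for separate memory blocks. Python threads (`ThreadPoolExecutor`) are used only to show that the bands can be processed independently, not to obtain a speedup. Both the space-variant convolution H and its adjoint Hᵀ are written in gather form, so each band reads only a halo of R rows around its own rows. The update is the standard Richardson-Lucy (EM) step u ← u · Hᵀ(d / Hu) / Hᵀ1.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

R = 1  # kernel radius (assumption): every local PSF is 3x3


def psf_at(y, x, shape):
    """Hypothetical space-variant PSF: blur spread grows with the column index."""
    s = 0.2 + 0.6 * x / max(shape[1] - 1, 1)  # off-centre weight in [0.2, 0.8]
    k = np.array([[s * s, s, s * s], [s, 1.0, s], [s * s, s, s * s]])
    return k / k.sum()  # each local PSF sums to 1


def forward_band(u, r0, r1):
    """Rows r0..r1-1 of H u: (Hu)(y,x) = sum_ij k_{y,x}(i,j) u(y+i, x+j), zero outside."""
    H, W = u.shape
    out = np.zeros((r1 - r0, W))
    for y in range(r0, r1):
        for x in range(W):
            k = psf_at(y, x, u.shape)
            acc = 0.0
            for i in range(-R, R + 1):
                for j in range(-R, R + 1):
                    yy, xx = y + i, x + j
                    if 0 <= yy < H and 0 <= xx < W:
                        acc += k[i + R, j + R] * u[yy, xx]
            out[y - r0, x] = acc
    return out


def adjoint_band(v, r0, r1):
    """Rows r0..r1-1 of H^T v, also as a gather: sum_ij k_{y-i,x-j}(i,j) v(y-i, x-j)."""
    H, W = v.shape
    out = np.zeros((r1 - r0, W))
    for y in range(r0, r1):
        for x in range(W):
            acc = 0.0
            for i in range(-R, R + 1):
                for j in range(-R, R + 1):
                    yy, xx = y - i, x - j
                    if 0 <= yy < H and 0 <= xx < W:
                        acc += psf_at(yy, xx, v.shape)[i + R, j + R] * v[yy, xx]
            out[y - r0, x] = acc
    return out


def apply_parallel(op, img, n_channels):
    """Split the rows into n_channels bands, process each band independently, restack."""
    bounds = np.linspace(0, img.shape[0], n_channels + 1).astype(int)
    with ThreadPoolExecutor(max_workers=n_channels) as pool:
        parts = list(pool.map(lambda b: op(img, b[0], b[1]), zip(bounds[:-1], bounds[1:])))
    return np.vstack(parts)


def richardson_lucy(d, n_iter=20, n_channels=4, eps=1e-12):
    """Space-variant Richardson-Lucy: u <- u * H^T(d / H u) / H^T 1."""
    d = np.asarray(d, dtype=float)
    norm = apply_parallel(adjoint_band, np.ones_like(d), n_channels)  # H^T 1
    u = np.full_like(d, d.mean())  # flat, strictly positive initial estimate
    for _ in range(n_iter):
        blurred = apply_parallel(forward_band, u, n_channels)
        ratio = d / np.maximum(blurred, eps)
        u = u * apply_parallel(adjoint_band, ratio, n_channels) / np.maximum(norm, eps)
    return u
```

Because of the normalization by Hᵀ1, each iteration preserves the weighted flux sum(Hᵀ1 · u) = sum(d), which is a convenient sanity check. Running the operators with one channel or several should give identical results, since each band depends only on its own rows plus the halo.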