This thesis presents developments toward new data services within the EU NFFA-EUROPE
project. The work originates from the need to rationalize and organize large scientific datasets following a FAIR approach. The activity builds on results obtained in previous MHPC work and tackles some of the issues concerning the FAIR principles that arise from the increase in size and variety of the original datasets. More specifically, the overall goal of the thesis is to set up well-organized data services to manage all the SEM images coming from different sources and partners within the NFFA-EUROPE project. The specific goals of this thesis are the following:
• Create a Python application to collect and enrich metadata for SEM images coming from different sources.
• Develop a massively parallel processing approach to reduce the time needed to collect metadata on a large number of images.
• Plan and develop an easy-to-set-up, portable computational ecosystem, based on Kubernetes and Spark, to accomplish the above goals, with the idea of deploying it easily on different computational infrastructures.
• Measure the performance of the massive data processing on different computational infrastructures.