Pre-stack depth migration (PSDM) is a computationally intensive algorithm widely used in seismic
imaging to accurately position subsurface reflectors. Its high computational cost, largely dominated by
deeply nested loops and irregular memory access patterns, makes performance optimization essential
for large-scale seismic processing. This thesis investigates how the performance of a PSDM kernel
can be improved through enhanced vectorization and pure OpenMP-based parallelization.
Initially, the code was analyzed to identify opportunities for improving compiler auto-vectorization
within the most computationally demanding loops. Loop restructuring, reduction of data
dependencies, and improved memory access patterns were applied to increase vectorization efficiency
and better utilize modern CPU vector units. However, because loop bounds vary across iterations,
SIMD optimization alone cannot balance the workload and limits overall scalability. In such
cases, combining outer-loop thread parallelism with inner-loop compiler auto-vectorization typically
provides better control over both workload distribution and vectorization efficiency.
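
As a minimal sketch of this combination (not the thesis kernel itself), the C fragment below
distributes an outer loop over threads with OpenMP and leaves a unit-stride inner loop, whose bound
differs per outer iteration, to the vectorizer. The function name `accumulate_rows`, the `len` array
of per-row bounds, and the `schedule(dynamic, 4)` clause are illustrative assumptions; the actual
PSDM loops and scheduling choices are not specified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: for each row i, sum the first len[i] samples of a
 * row-major input array. len[i] varies per row, which mimics the irregular
 * inner-loop bounds described above. */
void accumulate_rows(const float *in, const int *len, int n_rows,
                     int row_stride, float *out)
{
    assert(n_rows >= 0 && row_stride >= 0);

    /* Outer loop: thread-level parallelism. A dynamic schedule hands out
     * small chunks of rows so threads that get short rows pick up more work. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < n_rows; ++i) {
        const float *row = in + (size_t)i * (size_t)row_stride;
        float sum = 0.0f;

        /* Inner loop: contiguous accesses and a scalar reduction, which the
         * compiler can map to SIMD instructions. */
        #pragma omp simd reduction(+:sum)
        for (int j = 0; j < len[i]; ++j)
            sum += row[j];

        out[i] = sum;
    }
}
```

Each outer iteration writes only its own `out[i]`, so the threads need no synchronization. If the
code is compiled without OpenMP support, the pragmas are ignored and the function runs serially.
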
Building on this observation, the optimized OpenMP implementation parallelizes higher-level loops
while preserving the previously improved vectorized kernels. Special attention was given to workload
distribution, scheduling strategies, and memory access patterns in order to minimize thread contention
and maximize processor utilization. To evaluate the effectiveness of the proposed approach, both
strong-scaling and weak-scaling studies were conducted to analyze how performance changes with the
number of processing cores and with the problem size.
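
The quantities usually reported in such studies can be sketched as follows, using the standard
textbook definitions rather than anything specific to this work. The function names are hypothetical,
and `t1` and `tp` stand for measured wall-clock times on one core and on `p` cores, respectively.

```c
#include <assert.h>

/* Strong scaling: fixed total problem size, p cores.
 * Speedup S(p) = T(1) / T(p). */
double strong_speedup(double t1, double tp)
{
    assert(tp > 0.0);
    return t1 / tp;
}

/* Strong-scaling parallel efficiency E(p) = S(p) / p; 1.0 is ideal. */
double strong_efficiency(double t1, double tp, int p)
{
    assert(tp > 0.0 && p > 0);
    return t1 / (tp * (double)p);
}

/* Weak scaling: problem size grows in proportion to p, so ideally the run
 * time stays constant. Efficiency E_w(p) = T(1) / T(p); 1.0 is ideal. */
double weak_efficiency(double t1, double tp)
{
    assert(tp > 0.0);
    return t1 / tp;
}
```

For example, a run that takes 8 s on one core and 2 s on four cores has a strong-scaling speedup of 4
and an efficiency of 1.0 under these definitions.
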
In addition to the performance analysis, the energy consumption of the OpenMP-parallelized
implementation was monitored to assess the trade-off between performance gains and energy
efficiency on multi-core systems. Experimental
results demonstrate that the optimized implementation achieves significant performance improvements
over the baseline version while maintaining favorable energy characteristics. Overall, the findings
show that combining improved vectorization with scalable thread-level parallelism, guided by
careful performance analysis, can substantially accelerate PSDM workloads on modern
high-performance computing architectures.