The transformer architecture and its attention-based modules have become quite popular recently and are used for solving most computer vision tasks. However, there have been attempts to explore whether other modules can perform equally well with lower computational costs. In this paper, we introduce a nonlinear convolution structure composed of learnable polynomial and Fourier features, which allows better spectral representation with fewer parameters. The solution we propose is in principle feasible for many CNN application fields, and we present its theoretical motivation. Next, to demonstrate the performance of our architecture, and we exploit it for a paradigmatic task: image translation in driving-related scenarios such as deraining, dehazing, dark-to-bright, and night-to-day transformations. We use specific benchmark datasets for each task and standard quality parameters. The results show that our network provides acceptable or better performances when compared to transformer-based architectures, with a major reduction in the network size due to the use of such a nonlinear convolution block.