Lower-dimensional representations of data play an essential role in modern machine learning. This thesis studies the geometric structure underlying data from the perspective of real algebraic geometry, with the goal of developing testing procedures for a generalized manifold hypothesis, here called the geometric hypothesis. The dissertation is based on the following two works, listed in chronological order of development.
- Testing Variety Hypothesis (joint with A.~Lerario, P.~Roos Hoefgeest, and M.~Scolamiero). This work studies the geometric hypothesis for real algebraic varieties of bounded degree, allowing singularities and stratified structures. We show that the testing problem can be reduced to a semialgebraic decision problem and derive explicit sample complexity bounds.
- Testing Algebraic Complete Intersections. This work introduces an effective testing procedure for regression by real algebraic complete intersections with controlled geometry. The method yields explicit bounds on sample complexity together with arithmetic complexity estimates for the algorithmic implementation. (Currently in preparation.)
In the thesis, the two works are presented in the reverse order, reflecting a conceptual progression from the regular to the singular setting.