This paper proposes a research program to design a model for the
recognition, analysis, and classification of video artworks and their
documentation based on their semiotic aspects and audiovisual
content. Focusing on a corpus of art cinema, video art, and performance
art, the theoretical framework brings together semiotics,
film studies, visual studies, and performance studies with
computer vision and artificial intelligence. The aim
is to analyze the performance aspect of these works in order to interpret
the contextual references and cultural constructs recorded in artistic contexts,
thereby contributing to the classification and analysis of video artworks with
complex semiotic characteristics.
The conceptual framework rests on the simultaneous use
of a set of technologies, including pose estimation, facial recognition, object
recognition, motion analysis, audio analysis, and natural language processing,
both to improve recognition accuracy and to create a large set of labeled
audiovisual data. In addition, the authors propose a prototype application
to explore the primary challenges of such a research project.
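
As a purely illustrative sketch of how the outputs of several analyzers might be combined into labeled audiovisual data, the following Python fragment groups time-stamped labels from different modalities into fixed-length segments. The `Annotation` schema, the `merge_by_segment` function, the segment length, and the score threshold are all assumptions introduced here for illustration; they are not taken from the proposed prototype, and the analyzers themselves (pose estimation, audio analysis, and so on) are assumed to run upstream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One label emitted by one analyzer for a time span (hypothetical schema)."""
    modality: str   # e.g. "pose", "face", "object", "motion", "audio", "text"
    label: str
    start: float    # seconds from the start of the video
    end: float      # seconds from the start of the video
    score: float    # analyzer confidence in [0, 1]


def merge_by_segment(annotations, segment_length=5.0, min_score=0.5):
    """Group annotations from all modalities into fixed-length time segments.

    Returns {segment_index: {modality: sorted list of labels}}.
    An annotation is assigned to every segment it overlaps.
    """
    segments = defaultdict(lambda: defaultdict(set))
    for ann in annotations:
        if ann.score < min_score or ann.end <= ann.start:
            continue  # drop low-confidence or empty spans
        first = int(ann.start // segment_length)
        # Subtract a tiny epsilon so a span ending exactly on a segment
        # boundary does not spill into the next segment.
        last = int((ann.end - 1e-9) // segment_length)
        for idx in range(first, last + 1):
            segments[idx][ann.modality].add(ann.label)
    return {
        idx: {mod: sorted(labels) for mod, labels in mods.items()}
        for idx, mods in sorted(segments.items())
    }


if __name__ == "__main__":
    # Made-up annotations, standing in for real analyzer output.
    example = [
        Annotation("pose", "arms_raised", 1.0, 4.0, 0.9),
        Annotation("audio", "speech", 3.0, 7.0, 0.8),
        Annotation("object", "mirror", 6.0, 6.5, 0.3),  # below threshold
    ]
    for index, labels in merge_by_segment(example).items():
        print(index, labels)
```

Fixed-length segments are only one possible design choice; shot boundaries or performer actions could serve as segment units instead, depending on how the corpus is annotated.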