The goal of this paper is to provide an overview of text representation methods, focusing on embeddings for texts of different lengths and, specifically, on works that go beyond word embeddings. Analyzing pieces of text can be more challenging than analyzing single words, because several additional factors come into play. For this reason, representations of longer pieces of text can be obtained with different strategies that leverage information beyond what is used for single words. A text is defined by its components and by how these are combined, and both aspects should be taken into account when integrating information into a single document embedding. In addition, multimodal approaches are described to show how information of different natures (aural, visual and knowledge-based) can be fused to obtain enriched representations. This survey is intended to help readers navigate the methods proposed in the literature and understand which strategies are most suitable for specific needs.