Different unsupervised learning methods have been applied to single-cell RNA sequencing
datasets, aiming to unveil similarities and correlations between cells and groups of cells.
This thesis presents a novel theoretical framework, based on information entropy, for selecting the most informative genes. Achieving this goal required studying data clustering methods that not only output a meaningful partition of the high-dimensional space of the cell dataset but also have well-defined clustering parameters. In addition, the thesis focuses on methods with low run-time complexity and good scalability. As a result, a group of marker genes was found that preserves the clustering structure in which they are embedded; their biological relevance is still under investigation.