Background:
The drop in Whole Exome Sequencing (WES) and Whole Genome Sequencing
(WGS) prices has started a race toward the generation of denser and more accurate
maps of the human genome. Yet even with the contribution of large projects such as
UK10K (The UK10K Consortium, 2015), the resources currently available for Genome
Wide Association Studies (GWAS), e.g. UKB (Sudlow et al., 2015) and GIANT
(Speliotes et al., 2010), still outdo those available for whole-genome rare variant
analyses in terms of sample size and power to detect associations. GWAS is still the
most widely used tool for discovering correlations between genotypes and phenotypes,
partly thanks to the development of imputation algorithms, which infer missing
genotypes in a sample using a scaffold of known haplotypes (Marchini and Howie, 2010). The
release of the 1000 Genomes Project data (1000 Genomes Project Consortium et al.,
2012) allowed the creation of a reference panel, based on Next Generation Sequencing
data, that comprises populations of different ancestries (Howie et al., 2011). This
initial resource proved extremely valuable for the scientific community and has
recently been updated (Sudmant et al., 2015). Moreover, it has been shown how useful
it is to include WGS data from the population under study in a reference panel for
imputation (Sidore et al., 2015). To date, the race for the ‘best panel’ is still open, and
many collaborations based on data sharing are arising to provide a ‘state of the art’
resource (McCarthy et al., 2016).
Research aims:
With this work we aim to create a resource that can be used to improve
imputation quality and increase statistical power in the Italian Network of Genetic
Isolates (INGI) cohorts and that, at the same time, provides data giving us
better insight into the structure and peculiar characteristics of our cohorts compared
with outbred populations.
Methods:
We generated low-coverage WGS data for ∼1000 samples belonging to three different
INGI cohorts: Carlantino (CARL), Friuli Venezia Giulia (FVG) and Val Borbera (VBI).
After characterizing these data, we describe the generation of a reference panel
for imputation that includes both the INGI and the 1000 Genomes Project phase 3
data.