Preprint, Working Paper. Year: 2014

A statistical approach for inferring the three-dimensional structure of the genome

Nelle Varoquaux, Ferhat Ay, William Stafford Noble, Jean-Philippe Vert

Abstract

Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate three-dimensional models of how chromosomes fold and fit into the nucleus. Many existing inference methods rely upon multidimensional scaling (MDS), in which the pairwise distances of the inferred model are optimized to resemble pairwise distances derived directly from the contact counts. These approaches, however, often optimize a heuristic objective function and require strong assumptions about the biophysics of DNA to transform interaction frequencies into spatial distances, which can lead to incorrect structure reconstruction. We propose a novel approach to infer a consensus three-dimensional structure of a genome from Hi-C data. The method incorporates a statistical model of the contact counts, assuming that the counts between two loci follow a Poisson distribution whose intensity decreases with the physical distance between the loci. The method can automatically adjust the transfer function relating the spatial distance to the Poisson intensity and infer a genome structure that best explains the observed data. On a wide range of simulated datasets, we compare two variants of our Poisson method, with or without optimization of the transfer function, to four different MDS-based algorithms: two metric MDS methods using different stress functions, a nonmetric version of MDS, and ChromSDE, a recently described, advanced MDS method. We demonstrate that the Poisson models reconstruct better structures than all MDS-based methods, particularly at low coverage and high resolution, and we highlight the importance of optimizing the transfer function.
On publicly available Hi-C data from mouse embryonic stem cells, we show that the Poisson methods lead to more reproducible structures than MDS-based methods, both when the data are generated using different restriction enzymes and when structures are reconstructed at different resolutions.
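
The contrast between the two families of objectives can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the transfer function, so the power law `lambda_ij = beta * d_ij ** alpha` (with `alpha < 0`), the default parameter values, and all function names below are hypothetical choices made for the example.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distances between all rows of X (shape: n_loci x 3)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def poisson_neg_log_likelihood(X, counts, alpha=-3.0, beta=1.0):
    """Poisson negative log-likelihood of the contact counts, up to a constant.

    ASSUMPTION (illustration only): the intensity follows a power law,
    lambda_ij = beta * d_ij ** alpha with alpha < 0, so that expected
    counts decrease as the distance between loci i and j grows.
    """
    d = pairwise_distances(X)
    i, j = np.triu_indices(len(X), k=1)  # each pair of loci once
    lam = beta * d[i, j] ** alpha
    c = counts[i, j]
    # log(c!) does not depend on X, so it is dropped.
    return float(np.sum(lam - c * np.log(lam)))


def metric_mds_stress(X, counts, alpha=-3.0, beta=1.0):
    """Metric-MDS-style stress, shown for contrast.

    Counts are first converted to target distances by inverting the same
    hypothetical transfer function; pairs with zero counts have no finite
    target distance and are skipped.
    """
    d = pairwise_distances(X)
    i, j = np.triu_indices(len(X), k=1)
    c = counts[i, j]
    observed = c > 0
    target = (c[observed] / beta) ** (1.0 / alpha)
    return float(np.sum((d[i, j][observed] - target) ** 2))
```

Either function can be handed to a generic optimizer over the flattened coordinates `X`. In this sketch, optimizing the transfer function would amount to treating `alpha` and `beta` as additional free variables in the Poisson objective, whereas the MDS-style stress needs them fixed in advance to turn counts into distances.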
Main file
techreport.pdf (2.73 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-00937182, version 1 (28-01-2014)

Identifiers

  • HAL Id: hal-00937182, version 1

Cite

Nelle Varoquaux, Ferhat Ay, William Stafford Noble, Jean-Philippe Vert. A statistical approach for inferring the three-dimensional structure of the genome. 2014. ⟨hal-00937182⟩