Preprint, Working paper

A statistical approach for inferring the three-dimensional structure of the genome

Abstract: Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate three-dimensional models of how chromosomes fold and fit into the nucleus. Many existing inference methods rely upon multidimensional scaling (MDS), in which the pairwise distances of the inferred model are optimized to resemble pairwise distances derived directly from the contact counts. These approaches, however, often optimize a heuristic objective function and require strong assumptions about the biophysics of DNA to transform interaction frequencies into spatial distances, thereby leading to incorrect structure reconstruction. We propose a novel approach to infer a consensus three-dimensional structure of a genome from Hi-C data. The method incorporates a statistical model of the contact counts, assuming that the counts between two loci follow a Poisson distribution whose intensity decreases with the physical distance between the loci. The method can automatically adjust the transfer function relating the spatial distance to the Poisson intensity and infer a genome structure that best explains the observed data. We compare two variants of our Poisson method, with or without optimization of the transfer function, to four different MDS-based algorithms (two metric MDS methods using different stress functions, a nonmetric version of MDS, and ChromSDE, a recently described, advanced MDS method) on a wide range of simulated datasets. We demonstrate that the Poisson models reconstruct better structures than all MDS-based methods, particularly at low coverage and high resolution, and we highlight the importance of optimizing the transfer function.
On publicly available Hi-C data from mouse embryonic stem cells, we show that the Poisson methods lead to more reproducible structures than MDS-based methods when we use data generated using different restriction enzymes, and when we reconstruct structures at different resolutions.
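
The Poisson model described in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the transfer function, so the code below assumes a power law, intensity = beta * d**alpha with alpha < 0, purely for illustration; the function names, the default values of `alpha` and `beta`, and the random-walk toy structure are all hypothetical and are not taken from the report. The sketch only evaluates the negative log-likelihood of a candidate structure; it does not perform the optimization.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distances between all rows of X (shape n x 3)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def poisson_intensity(X, alpha=-3.0, beta=1.0):
    """Expected contact counts for the pairs i < j.

    Assumed transfer function (not given in the abstract):
    intensity_ij = beta * d_ij ** alpha, which decreases with distance
    when alpha < 0.
    """
    iu = np.triu_indices(X.shape[0], k=1)
    d = pairwise_distances(X)[iu]
    return beta * d ** alpha


def negative_log_likelihood(X, counts, alpha=-3.0, beta=1.0):
    """Poisson negative log-likelihood, up to the constant log(c_ij!) term.

    `counts` is a 1-D array holding one contact count per pair i < j,
    in the same order as the output of `poisson_intensity`.
    """
    lam = poisson_intensity(X, alpha, beta)
    return float(np.sum(lam - counts * np.log(lam)))


# Toy example: a random-walk "chromosome" and counts simulated from it.
rng = np.random.default_rng(0)
X_true = np.cumsum(rng.normal(size=(20, 3)), axis=0)
counts = rng.poisson(poisson_intensity(X_true))
X_guess = X_true + rng.normal(scale=0.5, size=X_true.shape)

print("NLL at the generating structure:", negative_log_likelihood(X_true, counts))
print("NLL at a perturbed structure:   ", negative_log_likelihood(X_guess, counts))
```

Under these assumptions, inferring a structure amounts to minimizing this quantity over the coordinates `X`, and the variant with an optimized transfer function would additionally minimize over the transfer-function parameters (here `alpha` and `beta`).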

Cited literature: 26 references

https://hal-mines-paristech.archives-ouvertes.fr/hal-00937182
Contributor: Jean-Philippe Vert
Submitted on: Tuesday, 28 January 2014 - 00:44:14
Last modified on: Thursday, 24 September 2020 - 17:06:02
Long-term archiving on: Sunday, 9 April 2017 - 00:53:18

File

techreport.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00937182, version 1

Citation

Nelle Varoquaux, Ferhat Ay, William Noble, Jean-Philippe Vert. A statistical approach for inferring the three-dimensional structure of the genome. 2014. ⟨hal-00937182⟩
