Documentation for published script (mapDamage2.0) in Bioinformatics – University of Copenhagen

The idea behind the summary approach in mapDamage2.0. See H. Jónsson et al. 2013 for more details.

mapDamage2.0: Fast approximate Bayesian estimates of ancient DNA damage parameters

Jónsson H., Ginolhac A., Schubert M., Johnson P. and Orlando L. Bioinformatics (2013), doi: 10.1093/bioinformatics/btt193. First published online: April 23, 2013.

Researchers from the Orlando group have released an updated version of their software, mapDamage, two years after the publication of the first version.

Damage reactions affect DNA molecules after death and during the fossilization process, leaving a specific signature in the sequences generated by high-throughput sequencing platforms. mapDamage2.0 exploits this signature within a built-in DNA damage model in order to quantify a series of key DNA damage parameters and to provide information about the average structure of ancient DNA templates. Although the posterior distribution of those parameters is the main output of mapDamage2.0, all features from the previous version are still available. These include the well-known nucleotide misincorporation and fragmentation patterns that can be used to authenticate sequences as truly ancient rather than by-products of modern contamination.

The statistical model of DNA damage implemented in mapDamage2.0 opens up many new applications. For instance, with accurate DNA damage estimates in hand, we can now study the kinetics of post-mortem DNA degradation over time in different environments. We can also identify the substitutions that likely originate from post-mortem degradation and thereby limit their impact in downstream analyses. This improves the analysis of ancient DNA sequences and the quality of ancient genomes, where damage-related artifactual misincorporations often outnumber genuine biological mutations.

mapDamage2.0 is available online, together with its documentation.

The code is written in Python using pysam, resulting in a large performance gain compared to the previous version. As a result, mapDamage2.0 does not require large RAM and CPU capacities and can analyze millions of sequences within minutes, even on laptop computers. Of note, mapDamage2.0 is compatible with any UNIX-like operating system and with all types of DNA libraries, including the most recent ones based on single-strand ligation.

Performance and examples of possible applications are presented in a companion article published in Bioinformatics.

Sample E522. 95% posterior predictive intervals for the substitution frequencies. The solid line is the empirical frequency. The x-axis is the position from the 5′ end of the template. The y-axis is the substitution frequency. See Schuenemann et al. (2011) for more detailed information regarding the original data.

Sample E522. The four upper mini-plots show the base frequency outside and inside the read (the open grey box corresponds to the read). The bottom plots show the position-specific substitutions from the 5′ end (left) and the 3′ end (right). See Schuenemann et al. (2011) for more detailed information regarding the original data.
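To illustrate what a position-specific substitution frequency is, the following is a minimal sketch in plain Python. It is not the mapDamage2.0 implementation: the function name `c_to_t_frequency`, its parameters and the input format (pairs of gap-free, equal-length reference and read strings written 5′ to 3′) are assumptions made for this example only. At each position from the 5′ end it divides the number of reads showing a T where the reference has a C by the number of reads whose reference base is a C.

```python
from collections import Counter


def c_to_t_frequency(pairs, max_pos=25):
    """Empirical C->T frequency per position from the 5' end.

    `pairs` is an iterable of (reference, read) strings of equal length,
    aligned without gaps and written 5' -> 3' (a simplifying assumption).
    """
    c_total = Counter()  # reference C observed at each position
    c_to_t = Counter()   # reference C read as T at each position
    for ref, read in pairs:
        for pos, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if pos >= max_pos:
                break
            if r == "C":
                c_total[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    # Positions without any reference C are reported as 0.0
    return [
        c_to_t[p] / c_total[p] if c_total[p] else 0.0
        for p in range(max_pos)
    ]


if __name__ == "__main__":
    toy_alignments = [("CACG", "TACG"), ("CCGT", "CTGT"), ("CAAC", "CAAC")]
    print(c_to_t_frequency(toy_alignments, max_pos=4))
```

The same counting applied to G→A substitutions, with positions measured from the 3′ end, would give the corresponding right-hand profile.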

Sample E522. The upper two plots are histograms of the read lengths. The lower two plots are the empirical cumulative frequency of C-T and G-A misincorporations, normalized by the first 70 positions. See Schuenemann et al. (2011) for more detailed information regarding the original data.
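As a rough illustration of the cumulative curves described in this caption, the sketch below turns per-position misincorporation counts into an empirical cumulative frequency. It assumes that "normalized by the first 70 positions" means dividing the running total by the total count observed within the first 70 positions; that reading, the function name `normalized_cumulative` and the `window` parameter are assumptions for this example, not a description of the mapDamage2.0 code.

```python
from itertools import accumulate


def normalized_cumulative(counts, window=70):
    """Cumulative misincorporation frequency over the first `window` positions.

    `counts[i]` is the number of misincorporations (e.g. C->T) observed at
    position i. The running sum is divided by the total within the window,
    so the curve ends at 1.0 whenever at least one event was observed.
    """
    head = list(counts[:window])
    total = sum(head)
    if total == 0:
        # No events observed: return a flat curve instead of dividing by zero
        return [0.0] * len(head)
    return [running / total for running in accumulate(head)]


if __name__ == "__main__":
    print(normalized_cumulative([5, 3, 2, 0]))
```

A curve that rises steeply over the first few positions indicates that misincorporations are concentrated near the read ends, which is the pattern expected for damaged ancient templates.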