Introduction

mapDamage is a Perl script that tracks DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

Download

USAGE

Three main commands can be used: map, merge and plot. A detailed description is presented below:

Program inputs for the map command

SAM file headers (lines starting with @SQ) are not used and may be skipped.
Example:
_72_#CTTGTA     0       chr9    23885627        37      35M     *       0       0       ATATTTACATAGGCCGGGTGCTGTGGCTCATGCCT     aaa`_^a^a]`MGZ\RIHKNT[RWSNVX]\SPWWU     XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0	XO:i:0  XG:i:0  MD:Z:35
_73_#CTTGTA     0       chr5    123689144       37      61M     *       0       0       TAATATTTCTGATTAGCACAGGATCTGCCACGTTGCAACTTCACTATATAATTTTGCATCT   abbbbaaaaa^aaaaX`ba`OL\`a]L\aa[GXZJR__W^]V^\`a```_a^`TEDR\]XX
   XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:54G6
_74_#CTTGTA     16      chr13   21801111        37      69M     *       0       0       TACAGAAAAATTGTCCAAAATTTGCAATACTACAGAGTCCTCATATTCCCCCCACCCAGGTTCCCCTGA   _TQ][[\^\^_a^\FS__``_`\VGX^aYV_TT^``a]RY`Z^a_`_QPT]P_WZSYaa_a`XX[Z`aa
   XT:A:U  NM:i:0  X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:69

Outputs from the map command

For data mapped against the human genome (24 chromosomes + mitochondrial), one can expect 177 files in the output folder. As for disk space, at a glance, 1.2 millions of reads generates 236 Mo of output files that can be easily reduced to less than 35 Mo as a gzipped tar archive.

Outputs from the merge command

The 7 file categories are merged by chromosomes in seven single files. An additional first column is created in all files that refers to the chromosome considers, except in hitPerChrom:

Outputs from the plot command

The plot command uses three files, nuclComp.all.txt, nuclCompReverse.all.txt and dnaFrag.all.txt to produce one R scripts and one pdf containing graphics:

Examples

Samtools and R should be installed on your system prior running both examples. Plus, the human genome (chromosomes 1 to 22 + X, Y, M) should be saved somewhere on you hard drive. The 25 fasta headers must be: >chr1, >chr2 ... >chrY, >chrM. If not, you can download it as fasta files at UCSC genome browser (905 MB). Once hg19 chromosomes downloaded, process the following command lines in a terminal: mapDamage on real datasets
Two SAM files are available for download and correspond to mapping outputs from 2 GAIIx lanes on 2 different DNA libraries amplified either with Phusion or High Fidelty Taq DNA polymerase on a paleo-Eskimo hair DNA extract. The latter sequencing data were NOT included in the original article by Rasmussen et al., 2010. Once downloaded, process the following command lines in a terminal: Then, process the following command lines: Once downloaded, process the following command lines in a terminal: For each example, you should obtain two files: one R script and one pdf file:

mapDamage on simulated datasets
In Computational challenges in the analysis of ancient DNA. Prufer et al. (2010) Genome Biol 11(5): R47 The datasets used can be downloaded here (19.0 MB). After unpacking the archive one can obtained six fasta files, from NSim1F.fna to NSim6F.fna all contain 100768 sequences. Since all sequences possess extra nucleotides at 3'-end in upper case: GTCAGACACGCAACAGGGGATAGGCAAGGCACACAGGGGATAGG, this tag was removed. The filtered datasets and the SAM files obtained after mapping against the human genome are available here (30.6 MB)
Then use mapDamage to look at the damage patterns: Repeat the three commands for NSim2F-q25.sam to NSim6F-q25.sam.
For this example, you should obtain six pdf files:

Citation

If you use this script, please cite the following publication: Ginolhac A, Rasmussen M, Gilbert MT, Willerslev E, Orlando L. 2011
mapDamage: testing for damage patterns in ancient DNA sequences. Bioinformatics. 2011 27(15):2153-5

Limitations

The positions used in fragmentation files are not compatible with read length > 99999 nucleotides. This is compatible with all sequencing technologies currently available.

Contact

Please report bugs and suggest possible improvements to Aurelien Ginolhac by email.